Usama Wahab Khan
MVP, MCT, CTO @ Evolution Technologies
Usama Wahab Khan
Father, Data Scientist, Developer/Nerd, Traveler
Twitter : @usamawahabkhan
LinkedIn : Usamawahabkhan
What to use for Data
Azure Synapse Analytics (Module 05)
• When you require an integrated relational and big data store
• When you need to manage data warehouse and analytical workloads
• When you need low cost storage
• When you require the ability to pause and restart the compute
• When you require a solution that can scale elastically
Azure Stream Analytics (Module 06)
• When you require a fully managed event processing engine
• When you require temporal analysis of streaming data
• Support for analyzing IoT streaming data
• Support for analyzing application data through Event Hubs
• Ease of use with a Stream Analytics Query Language
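The "temporal analysis" bullet can be made concrete with a tumbling window, one of the fixed, non-overlapping window types that stream processors commonly offer. The following is only a plain-Python sketch of the idea, not Stream Analytics Query Language; the function name `tumbling_window_counts` and the sample events are hypothetical.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per device in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, device_id) pairs.
    Returns {(window_start, device_id): count}.
    """
    counts = defaultdict(int)
    for ts, device in events:
        # Each timestamp belongs to exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, device)] += 1
    return dict(counts)

# Hypothetical IoT readings: (seconds since start, device id)
events = [(1, "a"), (4, "a"), (9, "b"), (11, "a"), (19, "b")]
result = tumbling_window_counts(events, 10)
print(result)
```

With a 10-second window, device "a" is counted twice in the window starting at 0 and once in the window starting at 10.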
Azure Data Factory (Module 07)
• When you want to orchestrate the batch movement of data
• When you want to connect to a wide range of data platforms
• When you want to transform or enrich the data in movement
• When you want to integrate with SSIS packages
• Enables verbose logging of data processing activities
Azure HDInsight
• When you need a low-cost, high-throughput data store
• When you need to store NoSQL data
• Provides a Hadoop Platform as a Service approach
• Suits acting as a Hadoop, HBase, Storm or Kafka data store
• Eases the deployment and management of clusters
Azure Data Catalog
• When you require documentation of your data stores
• When you require a multi-user approach to documentation
• When you need to annotate data sources with descriptive metadata
• A fully managed cloud service whose users can discover the data sources
• When you require a solution that can help business users understand their data
What to use for Data
Storage Account (Module 02)
• When you need a low-cost, high-throughput data store
• When you need to store NoSQL data
• When you do not need to query the data directly (no ad hoc query support)
• Suits the storage of archive or relatively static data
• Suits acting as an HDInsight Hadoop data store
Data Lake Store (Module 02)
• When you need a low-cost, high-throughput data store
• Unlimited storage for NoSQL data
• When you do not need to query the data directly (no ad hoc query support)
• Suits the storage of archive or relatively static data
• Suits acting as a Databricks, HDInsight and IoT data store
Azure Databricks (Module 03)
• Eases the deployment of a Spark-based cluster
• Enables the fastest processing of Machine Learning solutions
• Enables collaboration between data engineers and data scientists
• Provides tight enterprise security integration with Azure Active Directory
• Integration with other Azure Services and Power BI
Azure Cosmos DB (Module 04)
• Provides global distribution for both structured and unstructured data stores
• Millisecond query response time
• 99.999% availability of data
• Worldwide elastic scale of both the storage and throughput
• Multiple consistency levels to control data integrity with concurrency
Azure SQL Database (Module 05)
• When you require a relational data store
• When you need to manage transactional workloads
• When you need to manage a high volume of inserts and reads
• When you need a service that requires high concurrency
• When you require a solution that can scale elastically
Key component of a big data solution
Data warehousing is a key component of a cloud-based, end-to-end big data solution.
Azure Synapse Analytics
What is Azure Synapse Analytics?
Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and
Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand
or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience
to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs.
Azure Synapse Analytics
Integrated data platform for BI, AI and continuous intelligence
• Platform (Azure Data Lake Storage, Common Data Model, Enterprise Security; optimized for analytics): data lake integrated and Common Data Model aware
• Metastore, Security, Management, Monitoring, Data Integration: integrated platform services for management, security, monitoring, and metastore
• Analytics Runtimes: integrated analytics runtimes available provisioned and serverless on-demand. SQL Analytics offers T-SQL for batch, streaming and interactive processing; Spark handles big data processing with Python, Scala, R and .NET
• Form Factors: provisioned, on-demand
• Languages (SQL, Python, .NET, Java, Scala, R): multiple languages suited to different analytics workloads
• Experience (Synapse Analytics Studio): SaaS developer experiences for code free and code first
• Workloads (Artificial Intelligence / Machine Learning / Internet of Things / Intelligent Apps / Business Intelligence): designed for analytics workloads at any scale
Key Service Capabilities
Types of solution workloads
The modern data warehouse extends the scope of the data warehouse to serve Big Data that’s prepared with techniques beyond relational ETL.
• Modern data warehousing: "We want to integrate all our data, including Big Data, with our data warehouse"
• Advanced analytics: "We’re trying to predict when our customers churn"
• Real-time analytics: "We’re trying to get insights from our devices in real time"
Azure Synapse Link for Azure Cosmos DB
Synapse SQL architecture components
Dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics
Azure Synapse Analytics is an analytics service that brings together enterprise data warehousing and Big Data analytics. Dedicated SQL pool (formerly SQL DW) refers to the enterprise data warehousing features that are available in Azure Synapse Analytics.
Massively Parallel Processing (MPP) concepts
[Diagram: one Control Node above six Compute Nodes, each Compute Node holding its own slice of the data]
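The control node / compute node split can be pictured as a scatter-gather: the control node hands each compute node a slice of the work, each node aggregates only its own slice, and the control node combines the partial results. The following is a toy sketch of that pattern in plain Python, not the Synapse engine; the function names and the use of threads as stand-ins for compute nodes are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_node_sum(rows):
    """Work done on one compute node: aggregate only its local slice."""
    return sum(rows)

def control_node_sum(all_rows, compute_nodes=6):
    """Toy control node: split the rows, fan out, then combine partial results."""
    # Slice i receives every compute_nodes-th row, starting at offset i.
    slices = [all_rows[i::compute_nodes] for i in range(compute_nodes)]
    with ThreadPoolExecutor(max_workers=compute_nodes) as pool:
        partials = list(pool.map(compute_node_sum, slices))
    # Final aggregation happens once, on the control node.
    return sum(partials), partials

total, partials = control_node_sum(list(range(1, 101)))
print(total)          # 5050
print(len(partials))  # 6
```

Six workers are used only to mirror the six compute nodes drawn on the slide.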
Table geometries
Table distribution
• Round-robin tables
• Hash-distributed tables
• Replicated tables
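The three distribution options differ in how rows are assigned to storage. The sketch below simulates that assignment in plain Python, assuming the 60 distributions that a dedicated SQL pool uses. The MD5 hash is only a stand-in (the slides do not describe the engine's actual hash function), and the function names and sample rows are hypothetical.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # assumed: a dedicated SQL pool spreads a table over 60 distributions

def round_robin(rows):
    """Assign rows to distributions in turn, ignoring their content."""
    return [i % DISTRIBUTIONS for i, _ in enumerate(rows)]

def hash_distributed(rows, key):
    """Assign each row by a deterministic hash of its distribution column."""
    def bucket(value):
        digest = hashlib.md5(str(value).encode()).hexdigest()  # stand-in hash
        return int(digest, 16) % DISTRIBUTIONS
    return [bucket(row[key]) for row in rows]

def replicated(rows, compute_nodes):
    """Give every compute node a full copy of a (small) table."""
    return {node: list(rows) for node in range(compute_nodes)}

rows = [{"customer_id": i % 10, "amount": i} for i in range(600)]
rr_counts = Counter(round_robin(rows))
hash_buckets = hash_distributed(rows, "customer_id")
copies = replicated(rows[:5], 3)

print(len(rr_counts))          # number of distributions that received rows
print(len(set(hash_buckets)))  # at most one distribution per distinct customer_id
```

Round-robin spreads rows evenly but scatters related rows. Hashing keeps all rows with the same key in one distribution, which helps joins and aggregations on that key but can skew when a few keys dominate. Replication trades storage for avoiding data movement on small tables.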
How PolyBase works
The MPP engine’s integration method with PolyBase
[Diagram: the MPP DWH Engine (a Control Node and Compute Nodes, each Compute Node with its own DMS) reading in parallel from Azure Blob/Data Lake storage, shown as a Name Node and eight Data Nodes]
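The loading pattern usually paired with PolyBase is: extract source data to text files, land them in storage, import them into staging tables, optionally transform, then insert into production tables. The sketch below walks through the same sequence with Python's built-in `csv` and `sqlite3` modules as local stand-ins. It does not call PolyBase or Synapse, and the table names `stage_orders` and `orders` and the sample data are hypothetical.

```python
import csv
import io
import sqlite3

# Steps 1-2: source data extracted to a delimited text file (held in memory
# here, standing in for a file landed in Azure Blob storage or Data Lake).
extracted = io.StringIO("order_id,amount\n1, 10.50\n2, 3.25\n3,\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage_orders (order_id TEXT, amount TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")

# Step 3: import the raw text into a staging table without changing it.
rows = list(csv.DictReader(extracted))
conn.executemany("INSERT INTO stage_orders VALUES (:order_id, :amount)", rows)

# Steps 4-5: transform (trim, cast, drop empty amounts) and insert into production.
conn.execute("""
    INSERT INTO orders
    SELECT CAST(order_id AS INTEGER), CAST(TRIM(amount) AS REAL)
    FROM stage_orders
    WHERE TRIM(amount) <> ''
""")

loaded = conn.execute(
    "SELECT order_id, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the staging table as untyped text means a malformed source row cannot fail the import step; cleaning happens in a single set-based statement afterwards.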
Perform Azure Synapse Analytics queries
SELECT Query Basics
SELECT <select_list>
[FROM <optional_from_specification>]
[JOIN <optional_join_specification>]
[WHERE <optional_filter_condition>]
[ORDER BY <optional_sort_specification>]
Examples
SELECT *
FROM Products p WHERE p.id = '1';
SELECT p.id, p.manufacturer, p.description
FROM Products p WHERE p.id = '1';
SELECT p.price, p.description, p.productId
FROM Products p ORDER BY p.price ASC
SELECT p.productId
FROM Products p JOIN Shipping s ON s.productId = p.productId  -- Shipping table and join key are assumed for illustration
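The query shapes on this slide can be tried locally before running them against a dedicated SQL pool. The sketch below uses Python's built-in `sqlite3` as a stand-in engine, so it demonstrates only the generic SELECT / WHERE / ORDER BY / JOIN behaviour, not anything specific to Synapse. The `Shipping` table, the column types, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (productId INTEGER, id TEXT, manufacturer TEXT,
                           description TEXT, price REAL);
    CREATE TABLE Shipping (productId INTEGER, carrier TEXT);  -- hypothetical table
    INSERT INTO Products VALUES (1, '1', 'Contoso', 'Widget', 9.99),
                                (2, '2', 'Fabrikam', 'Gadget', 4.50);
    INSERT INTO Shipping VALUES (1, 'Ground');
""")

# Filter on a string literal: note the single quotes.
by_id = conn.execute(
    "SELECT p.id, p.manufacturer, p.description FROM Products p WHERE p.id = '1'"
).fetchall()

# Sort ascending by price.
by_price = conn.execute(
    "SELECT p.price, p.description, p.productId FROM Products p ORDER BY p.price ASC"
).fetchall()

# Inner join: only products that have a shipping row are returned.
joined = conn.execute(
    "SELECT p.productId FROM Products p JOIN Shipping s ON s.productId = p.productId"
).fetchall()

print(by_id)
print(by_price)
print(joined)
```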
Azure Synapse Studio
Data Hub - Studio
Storage Accounts
Databases
Dataset
More…
Develop Hub - Studio
SQL Script
Notebooks
Dataflow
Pipelines
Gallery
More…
Demo
Q & A
Usama Wahab Khan
Twitter : @usamawahabkhan
LinkedIn : Usamawahabkhan
Thank you 

Azure Synapse by Usama Wahab Khan


Editor's Notes

  • #2 Introduce the team (self-introductions). Mention LearnAI team. 3-day airlift, transition from pure Databricks to AML. We will use notebooks to introduce tools and techniques, and then return to one use case. We have three kinds of session: (1) presentation style, (2) demos (with small exercises), (3) hands-on labs. The last day is a hackathon (with two use cases). Check people’s skills: experience with Databricks, Jupyter notebooks, VS Code, deep learning. Who has heard of AMLCompute? Who has used it? Who has used CI/CD and git version control?
  • #5 Instructor notes. Note: This is a build slide covering multiple topics by design. Stress to the students that it is an overview of the data platform capability, so multiple topics share a single slide for brevity. The important points on this slide are the five bullet points next to each technology. HDInsight is not covered in this course because other courses cover it. Azure Data Catalog is not in this course because it is not part of the exam objectives and is an easy technology to learn.
  • #6 Instructor notes. Note: This is a build slide covering multiple topics by design. Stress to the students that it is an overview of the data platform capability, so multiple topics share a single slide for brevity. The important points on this slide are the five bullet points next to each technology.
  • #12 Instructor notes This is a build slide, first of all explain the three types of Data Engineering workloads that are observed with enterprises. Click to bring the animation in, and explain that the data warehousing capabilities in Azure Synapse Analytics fit very well into the Modern Data Warehouse solution workload.
  • #16 Instructor notes. If you are familiar with the underlying architecture of Azure SQL Data Warehouse, this remains unchanged. It is also a build slide: Click 1 shows an arrow going to the control node, representing a query that is processed by the control node. Click 2 shows the query being broken down and processed on the compute nodes, which return the results to the control node and then to the requesting client. More information can be found at: https://docs.microsoft.com/en-gb/azure/sql-data-warehouse/massively-parallel-processing-mpp-architecture#sql-analytics-mpp-architecture-components
  • #17 Instructor notes Read the following section on distributions for more information at https://docs.microsoft.com/en-gb/azure/sql-data-warehouse/massively-parallel-processing-mpp-architecture#distributions
  • #18 Instructor notes. The slide shows a view of the MPP architecture, as a reminder of its parallelism capabilities. Focus on the process: 1. Extract the source data into text files. 2. Load the data into Azure Blob storage, Hadoop, or Azure Data Lake Store. 3. Import the data into SQL Data Warehouse staging tables using PolyBase. 4. Transform the data (optional). 5. Insert the data into production tables.
  • #19 Instructor notes Use this screenshot to go through some of the query functionality of Azure Synapse Analytics. Alternatively, you could open a tool and run through some query examples.