In the PowerPoint presentation about Azure Synapse, we begin by introducing Azure Synapse as an integrated analytics service, emphasizing its role in unifying big data and data warehousing. Key features such as unlimited information processing, querying of both relational and non-relational data, and integration with AI and BI capabilities are highlighted. The presentation delves into the architecture of Azure Synapse, illustrating how it interconnects with Azure Data Lake, Power BI, and Azure Machine Learning. We explore its robust data integration capabilities, including Azure Synapse Pipelines for efficient ETL processes. The discussion then moves to its prowess in analytics and big data processing, supporting various languages like T-SQL, Python, and Scala. The integration of Azure Synapse with AI and machine learning is underscored, showcasing its application in predictive analytics. Security features form a crucial part of the talk, emphasizing data protection and compliance aspects. Real-world use cases demonstrate Azure Synapse's practical applications in business settings. A comparative analysis with other data platforms highlights Synapse's unique benefits. The presentation concludes with guidance on getting started with Azure Synapse, followed by a summary, inviting audience questions and providing contact information for further engagement.
2. About me
• Udaiappa Ramachandran (Udai)
• CTO/CSO-Akumina, Inc.
• Microsoft Azure MVP
• Cloud Expert
• Microsoft Azure, Amazon Web Services, and Google
• New Hampshire Cloud User Group (http://www.meetup.com/nashuaug)
• https://udai.io
3. Agenda
• Quick review on Azure Data Factory, Azure Databricks
• Azure Synapse Analytics
• Aggregating data from multiple data sources
• Exploring processed data
• Azure Synapse Security
• Demo…Demo…Demo…
4. Azure Data Factory
• Easy to use
• Wide range of connectors and features (90+)
• Powerful data integration capabilities (ingestion and transformation)
• GUI – Pipelines, data flows, power query
5. Azure Databricks
• Powerful data processing capabilities
• Machine learning and real-time analytics capabilities
• Managed service
• Notebooks
• Steeper learning curve
• Can be more expensive
7. Azure Synapse Analytics - Components
• Data Warehouse
• SQL Pool
• Dedicated
• Serverless
• Spark Pool
• Python, SQL and C#
• Big Data Engine
• Serverless Engine
• Data Flows
• Ecosystem: Power BI + Azure Machine Learning
8. What is Azure Synapse Analytics?
Source: https://learn.microsoft.com/en-us/azure/synapse-analytics/overview-what-is
9. Azure Synapse Analytics - Capabilities
• Unified analytics platform
• Serverless and dedicated options
• Enterprise data warehouse
• Data lake exploration
• Code-free hybrid data integration
• Deeply integrated Apache Spark and SQL engines
• Cloud-native HTAP
• Choice of language (T-SQL, Python, Scala, SparkSQL, and .NET)
• Integrated AI and BI
• Data Security
10. Synapse Analytics – SQL Pools
• Serverless SQL
• Query data from ADLS Gen2 directly
• Using T-SQL to query CSV, Parquet, JSON, etc.
• No infrastructure needed
• Stand-alone PolyBase service
• Pay-per-query model
• No charges for metadata queries (e.g., select * from sys.objects)
• When to use?
• Quick ad-hoc queries
• Logical data warehouse
• Transform data in lake
• Dedicated SQL
• Provisioned resource: set up infrastructure in advance
• Massively Parallel Processing (MPP) Engine
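As a minimal sketch of the serverless bullets above, the query below reads Parquet files in ADLS Gen2 directly with T-SQL through OPENROWSET. The storage account, container, and folder path are placeholders assumed for illustration; they do not come from the talk.

```sql
-- Ad-hoc query over Parquet files in the lake from a serverless SQL pool.
-- <storage-account>, <container>, and the 'sales' folder are hypothetical placeholders.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales_rows;
```

Because the serverless pool is billed by data processed, selecting only the columns you need from a columnar format such as Parquet is generally a reasonable way to keep the cost of this kind of ad-hoc query down.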
14. Data Explorer Pool
• Unified experience
• Real-time insights
• Scalability
• Security
• High performance
• Real-time ingestion
• Time series analysis
• Machine learning
15. Data Explorer Pool
Source: https://learn.microsoft.com/en-us/azure/synapse-analytics/data-explorer/data-explorer-overview
16. When to use Azure Synapse Analytics?
• Large-scale data warehousing
• Advanced analytics
• Data exploration and discovery
• Real time analytics
• Data integration
• Integrated analytics
17. Synapse Analytics Vs. Synapse Private Hub
Feature | Azure Synapse Analytics | Azure Synapse Analytics Private Hub
Access | Public access over the internet | Private access over a private connection
Security | Data is encrypted at rest and in transit | Data never leaves your network
Compliance | Complies with a variety of data regulations | Can be used to comply with stricter data privacy regulations
Use cases | General-purpose data analytics | Secure access to Azure Synapse Analytics from an on-premises network or another virtual network
18. Azure Synapse – Use Case
• Propose a solution for ABC company to build real-time analytics using various data sources such as Cosmos DB, Log Analytics, and SharePoint List Items. How can we achieve this?
19. Demo
• Create Azure Synapse
• Walkthrough Azure Synapse properties
• Create Pools
• Run Samples
• Link Cosmos DB
• Create External table
• Data Explorer --Add Table and export data / Data explorer ingest data
• PowerBI
20. Azure Synapse – Use Case
• Aggregation
• Azure Cosmos DB – Synapse Link, then external view
• Azure Log Analytics Workspace – Continuous export, then Parquet transformer using Spark, and then external table
• SharePoint Lists – Continuous export, then Parquet transformer using Spark, and then external table
• Presentation
• PowerBI – Direct Access
• HTML controls – DW Queries
• Cost
• SQL Server – Serverless/Dedicated
• Spark Nodes
• https://azure.com/e/6233ac854ace4eddb06d15b8b056df21
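The "external table" step in the aggregation list above could look roughly like the following on a serverless SQL pool. This is a sketch under stated assumptions: every object name, the storage URL, the folder layout, and the column list are hypothetical, and it assumes the caller's Azure AD identity already has read access to the storage account.

```sql
-- Run in a user database on the serverless SQL pool (not in master).
-- All names and the storage URL below are hypothetical placeholders.
CREATE EXTERNAL DATA SOURCE LakeExport
WITH (LOCATION = 'https://<storage-account>.dfs.core.windows.net/<container>');

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- Column names and types must match the Parquet files written by the Spark job.
CREATE EXTERNAL TABLE dbo.LogEvents (
    TimeGenerated DATETIME2,
    Category      VARCHAR(100),
    Message       VARCHAR(4000)
)
WITH (
    LOCATION    = 'log-analytics/parquet/*.parquet',
    DATA_SOURCE = LakeExport,
    FILE_FORMAT = ParquetFormat
);

-- Power BI or a DW query can then read the table like any other.
SELECT TOP 10 * FROM dbo.LogEvents;
```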
23. Security on Azure Synapse
• Encryption of data at rest using TDE (Transparent Data Encryption)
• In-transit (in motion) encryption using TLS
• Key Management
• Customer Managed
• Bring your own key (BYOK)
• Must be enabled when creating the Azure Synapse workspace
• TDE Protector (key to encrypt DEK)
• Data Masking – Dynamic and Static
• Row-Level and Column-Level Security
25. Thanks for your time and trust!
New Hampshire CLOUD .NET User Group
Editor's Notes
Azure SQL Data Warehouse – a cloud-based enterprise data warehouse (EDW) that uses massively parallel processing (MPP) to run complex queries across petabytes of data quickly.
Descriptive analytics, which answers the question “What is happening in my business?”. This question is typically answered by building a data warehouse in which historical data is persisted in relational tables for multidimensional modeling and reporting.
Diagnostic analytics, which deals with answering the question “Why is it happening?”. This may involve exploring information that already exists in a data warehouse, but typically involves a wider search of your data estate to find more data to support this type of analysis.
Predictive analytics, which enables you to answer the question “What is likely to happen in the future based on previous trends and patterns?”
Prescriptive analytics, which enables autonomous decision making based on real-time or near real-time analysis of data, using predictive analytics.
Data Warehouse: The already popular Azure Data Warehouse technology for storing and managing data for analysis and decision making, now through SQL pools.
Big Data engine: With Spark pools, engineers can run scalable analytics and Big Data processing using Spark languages.
Serverless engine: Query Data Lakes directly using SQL statements in a simple way.
Data flows: Develop ETL flows that read from or write to your Data Warehouse or Data Lake, using the same engine as Azure Data Factory.
Azure Data Lake Storage+Azure SQL Data Warehouse+Azure Analytics=Azure Synapse Analytics
Quick ad-hoc queries – before you decide how to proceed
Logical data warehouse – abstraction layer on top of raw data
Transform data in lake – consume it directly using Power BI
The number of compute nodes ranges from 1 to 60, and is determined by the service level for Synapse SQL.
Spark notebooks – combine code, text, Markdown, and data visualization
YARN (Yet Another Resource Negotiator)
https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-machine-learning-mllib-notebook
Double encryption on top of Microsoft-managed keys
TDE using Azure Key Vault: Get/Wrap/Unwrap DEK
key length 2048 or 3072
supported formats for imported keys: .pfx, .byok, .backup
back up your keys before using them
create a new backup when changes are made to the key
Dynamic data masking
mask data from non-privileged users
ability to specify how much is revealed
configured on specific database fields
can be used alongside encryption, auditing, row-level security, etc.
can be enabled via the Azure portal or T-SQL statements
types of data masking
full xxxx
partial uxxx@xxx.com
random salary=10000; FUNCTION='random(1,8)'; masked=6
custom string e.g., name=Udai; FUNCTION='partial(1,"XXXX",1)'; masked=UXXXXi
create user testuser without login
grant select on sales.customer to testuser
execute as user='testuser'
select....
revert
go
grant unmask to testuser
revoke unmask from testuser
select c.name,tbl.name as table_name,c.is_masked,c.masking_function from sys.masked_columns as c
join sys.tables as tbl
on c.[object_id]=tbl.[object_id]
where is_masked=1
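The mask types listed in these notes can be declared on columns roughly as follows. This is a minimal sketch: the table, user, and sample values are hypothetical, and the syntax is the one documented for SQL Server and Azure SQL, assumed here to apply unchanged to a dedicated SQL pool.

```sql
-- Hypothetical demo table; names and sample values are illustrative only.
CREATE TABLE dbo.CustomerDemo (
    Name   VARCHAR(50)  MASKED WITH (FUNCTION = 'partial(1,"XXXX",1)'), -- custom string mask
    Email  VARCHAR(100) MASKED WITH (FUNCTION = 'email()'),             -- email mask
    Salary INT          MASKED WITH (FUNCTION = 'random(1, 8)'),        -- random number in range
    Notes  VARCHAR(200) MASKED WITH (FUNCTION = 'default()')            -- full mask
);

INSERT INTO dbo.CustomerDemo (Name, Email, Salary, Notes)
VALUES ('Udai', 'udai@example.com', 10000, 'internal');

-- A user without UNMASK sees masked values only.
CREATE USER maskdemo WITHOUT LOGIN;
GRANT SELECT ON dbo.CustomerDemo TO maskdemo;

EXECUTE AS USER = 'maskdemo';
SELECT Name, Email, Salary, Notes FROM dbo.CustomerDemo;
-- Name is expected to display as UXXXXi (first and last character kept).
REVERT;

-- Granting UNMASK lets the same user see the stored values.
GRANT UNMASK TO maskdemo;
```

Masking changes only what is displayed to the querying user; the stored data is unchanged, which is why it is usually combined with encryption and row-level security rather than used alone.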
how row-level security works
not permission based but predicate based
security policy
security predicate is an inline table-valued function (iTVF)
filter predicate
creating RLS
create table, insert rows, create users, create a schema (create schema), create security predicate (create function), create security policy
RLS best practices
create a separate schema for the security predicate function
alter any security policy permission is required
drop components in the following order: security policy, table, function, schemas
avoid excessive table joins in the predicate function
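The creation steps listed above can be sketched as a single script. All object and user names are hypothetical, and `go` is the client batch separator already used in these notes (`create schema` and `create function` each need to start their own batch).

```sql
-- Hypothetical objects throughout; follows the creation order from the notes.
CREATE TABLE dbo.Orders (OrderId INT, SalesRep VARCHAR(50), Amount INT);
INSERT INTO dbo.Orders VALUES (1, 'rep1', 100);
INSERT INTO dbo.Orders VALUES (2, 'rep2', 250);

CREATE USER rep1 WITHOUT LOGIN;
CREATE USER rep2 WITHOUT LOGIN;
GRANT SELECT ON dbo.Orders TO rep1;
GRANT SELECT ON dbo.Orders TO rep2;
GO
-- Separate schema for the predicate function, per the best practices above.
CREATE SCHEMA rls;
GO
-- Security predicate: an inline table-valued function that returns a row
-- only when the row's SalesRep matches the current database user.
CREATE FUNCTION rls.fn_orders_predicate(@SalesRep AS VARCHAR(50))
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS allowed
           WHERE @SalesRep = USER_NAME();
GO
-- Security policy that binds the predicate to the table as a filter predicate.
CREATE SECURITY POLICY rls.OrdersFilter
    ADD FILTER PREDICATE rls.fn_orders_predicate(SalesRep) ON dbo.Orders
    WITH (STATE = ON);
GO
EXECUTE AS USER = 'rep1';
SELECT * FROM dbo.Orders;   -- only the row with SalesRep = 'rep1' should be visible
REVERT;
```

Both users hold the same SELECT permission on the table; the difference in what they see comes entirely from the predicate, which is the sense in which RLS is predicate based rather than permission based.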
CLS
control access to specific columns
based on user's context
grant access to SQL users and Azure AD users
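Column-level security is expressed with an ordinary GRANT that lists columns. A minimal sketch, with a hypothetical table and user:

```sql
-- Hypothetical table and user; only the listed columns become readable.
CREATE TABLE dbo.Membership (
    MemberId  INT,
    FirstName VARCHAR(100),
    Email     VARCHAR(100),
    SSN       CHAR(9)
);

CREATE USER analyst WITHOUT LOGIN;
GRANT SELECT ON dbo.Membership (MemberId, FirstName, Email) TO analyst;

EXECUTE AS USER = 'analyst';
SELECT MemberId, FirstName, Email FROM dbo.Membership;  -- allowed
-- SELECT * FROM dbo.Membership;  -- expected to fail: no permission on SSN
REVERT;
```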