SlideShare a Scribd company logo
Thanks to Nishant Thacker
Azure Data Platform End-to-End
Implementa ModernData PlatformArchitecture
<your name>
<your role>
<your email>
© Microsoft Corporation
Begin with the end in mind
© Microsoft Corporation
• We will understand Cloud and Big Data concepts and technologies used to solve the most common
advanced analytics problems
• We will understand the role of Microsoft Azure data services in a modern data platform architecture
• We will look at individual Azure Data Services and use them to implement a modern data platform
reference architecture
• We will have a ARM template of a data platform that will enable us to solve most of our data
challenges
Course Objectives
© Microsoft Corporation
Important Reminder
• The modern data platform architecture proposed in this course aims to help with your technology
decisions when architecting data solutions in Azure.
• The Azure services covered in this course are only a subset of a much larger family of data
services. Some real-world data scenarios may require the use of services not included in this
course.
• This course does not replace in-depth training on each Azure service covered today.
• Some concepts presented in this course can be quite complex and you may need to seek for more
information from different sources.
© Microsoft Corporation
Modern Data Platform Reference Architecture
Semi-Structured
V=Volume
csv, logs, json, xml
(loosely-typed)
Non-structured
V=Variety
images, video, audio, free text
(no structure)
Stream
V=Velocity
IoT devices, sensors, gadgets
(loosely-typed)
Load and Ingest
Store
Process
Serve
Data Factory
Event Hubs Stream Analytics
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium
Databricks
Azure Data Lake Gen2 CosmosDB
Power BI Premium
Application
Analytics
Analytics
λ Lambda Architecture
Hot Path
Real Time
Analytics
Cold Path
History and
Trend Analysis
Stream Datasets
and Real-time
Dashboards
Enterprise-grade
semantic model
Integrate big data
scenarios with
traditional data
warehouse
Cognitive Services
Azure ML
Business User
Fast load
data with
Polybase/
ParquetDirect
Build and Score
ML models
Scheduled / event-
triggered data ingestion
© Microsoft Corporation
Lab Guide
Azure Data Platform
Resource Group
ADPDesktop
Virtual Machine
ADPVirtualNetwork
Virtual Network
MDWResources
Storage Account
ADPLogicApp
Logic App
SynapseDataLakesuffix
Azure Data Lake Storage Gen2
ADPEventHubs-suffix
Event Hubs
SynapseStreamAnalytics-suffix
Event Hubs
SynapseDataFactory-suffix
Azure Data Factory
ADPDatabricks
Azure Databricks
ADPComputerVision
Computer Vision API
ADPCosmosDB-suffix
Azure CosmosDB
PowerBI
Power BI Desktop/
Workspace
Azure Data Platform End2End
Lab Architecture
SynapseDataFactory-suffix
Azure Data Factory
Lab 1: Load Data into Azure Synapse Analytics using Azure Data Factory Pipelines
Lab 2: Transform Big Data using Azure Data Factory Mapping Data Flows
Lab 3: Explore Big Data with Azure Databricks
Lab 4: Add AI to your Big Data pipeline with Cognitive Services
Lab 5: Ingest and Analyse Real-Time Data with Event Hubs and Stream Analytics
1
2
3
4
1
1
2
2
3
4
5 6
1
1
2
3
3
4
Student s
Computer
operationalsql-suffixNYCDataSets
Azure SQL Database
2
synapsesql-suffixSynapseDW
Azure Synapse Analytics
5
4
RDP Connection
or
Azure Bastion
© Microsoft Corporation
The modern data world out there
© Microsoft Corporation
I tried to understand it, but…
Deep Learning
ETL vs ELT
PaaS vs IaaS
Data Visualisation
Data Quality
Master Data
Big Data
Databricks
SMP vs MPP
Data Catalog
Spark
Storm
Data Mart
No-SQL
Cloud vs On-prem
Velocity, Variety and Volume
Semantic Layer
Predictive
Prescriptive
IoT
Streaming
AI
© Microsoft Corporation
The modern data estate
Data warehouses
Data lakes
Operational databases
Data warehouses
Data lakes
Operational databases
Hybrid
Security and performance
Flexibility of choice
Reason over any data, anywhere
Social
LOB Graph IoT
Image
CRM
© Microsoft Corporation
AI built-in | Most secure | Lowest TCO
Data warehouses
Data lakes
Operational databases
Data warehouses
Data lakes
Operational databases
Industry leader 4 years in a row
#1 TPC-H performance
T-SQL query over any data
70% faster
2x the global reach
99.9% SLA
Easiest lift and shift
with no code changes
The Microsoft offering
SQL Server
Hybrid
Azure Data Services
Security and performance
Flexibility of choice
Reason over any data, anywhere
Social
LOB Graph IoT
Image
CRM
© Microsoft Corporation
Valuable collection of architecture principles to help you with your technology choices
Azure Data Architecture Guide
https://aka.ms/adag
© Microsoft Corporation
Collection of reference architectures for most common challenges
Azure Architecture Solutions
https://azure.microsoft.com/en-us/solutions/architecture/
© Microsoft Corporation
Big Data and advanced analytics
Modern Data Platform Solution Scenarios
SQL
Modern data warehousing
“We want to integrate all our data—
including Big Data—with our data
warehouse”
Advanced analytics
“We’re trying to predict when our
customers churn”
Real-time analytics
“We’re trying to get insights from
our devices in real-time”
© Microsoft Corporation
Modern Data Platform Concepts
Part I
© Microsoft Corporation
IaaS vs PaaS vs SaaS
Infrastructure
as a Service (IaaS)
Storage
Servers
Networking
O/S
Middleware
Virtualization
Data
Applications
Runtime
Managed
by
Microsoft
You
scale,
make
resilient
&
manage
Platform
as a Service (PaaS)
Scale,
Resilience
and
management
by
Microsoft
You
manage
Storage
Servers
Networking
O/S
Middleware
Virtualization
Applications
Runtime
Data
On Premises
Physical / Virtual
You
scale,
make
resilient
and
manage
Storage
Servers
Networking
O/S
Middleware
Virtualization
Data
Applications
Runtime
Software
as a Service (SaaS)
Storage
Servers
Networking
O/S
Middleware
Virtualization
Applications
Runtime
Data
Scale,
Resilience
and
management
by
Microsoft
Azure
Virtual Machines
Azure
Cloud Services
© Microsoft Corporation
A data warehouse is a large collection of business data used to help an organization make decisions. Data in the Data
Warehouse has been identified as valuable to specifically defined business cases and is stored in a structured way readily
available for reporting and data analysis.
It is not an Operational Database
Different workload types: transactional (DB) versus analytics (DW)
It is not a Data Lake
These are different concepts, they can co-exist and they compliment each other
It is not a Data Mart
A data mart is a subject-oriented database populated from a subset of the Data Warehouse
What is a Data Warehouse?
© Microsoft Corporation
Modern Data Warehousing
© Microsoft Corporation
The modern data warehouse extends the scope of the data warehouse to serve Big
Data that’s prepared with techniques beyond relational ETL
Modern data warehousing
SQL
Modern data warehousing
“We want to integrate all our data—
including Big Data—with our data
warehouse”
Advanced analytics
“We’re trying to predict when our
customers churn”
Real-time analytics
“We’re trying to get insights from
our devices in real-time”
© Microsoft Corporation
Modern data warehousing pattern
Advanced Analytics
Social
LOB
Graph
IoT
Image
CRM
INGEST STORE PREP MODEL & SERVE
(& store)
Data orchestration
and monitoring
Big data store Transform & Clean Data warehouse
AI
BI + Reporting
© Microsoft Corporation
SQL Server and Azure SQL Database
Data platform continuum
Shared
lower
cost
Dedicated
higher
cost
Higher administration Lower administration
Physical
SQL Server
Physical Machine (raw iron)
IaaS
SQL Server in Azure VM
Virtualized Machines
Virtual
SQL Server Private Cloud
Virtualized Machine + Appliance
PaaS & SaaS
Azure SQL Database
Virtualized Database
SQL Server 2019 big data, analytics, and AI
Managed SQL Server, Spark,
and data lake
Store high volume data in a data lake and access
it easily using either SQL or Spark
Management services, admin portal, and
integrated security make it all easy to manage
SQL
Server
Data virtualization
Combine data from many sources without
moving or replicating it
Scale out compute and caching to boost
performance
T-SQL
Analytics Apps
Open
database
connectivity
NoSQL Relational
databases
HDFS
Complete AI platform
Easily feed integrated data from many sources to
your model training
Ingest and prep data and then train, store, and
operationalize your models all in one system
SQL Server External Tables
Compute pools and data pools
Spark
Scalable, shared storage (HDFS)
External
data sources
Admin portal and management services
Integrated AD-based security
SQL Server
ML Services
Spark &
Spark ML
HDFS
REST API containers
for models
Azure SQL Database deployment option
Azure SQL Database
Database-scoped deployment option with
predictable workload performance
Shared resource model optimized for greater
efficiency of multi-tenant applications
Best for apps that require resource
guarantee at database level
Best for SaaS apps with multiple databases that can share
resources at database level, achieving better cost efficiency
Best for modernization at scale with
low friction and effort
Elastic Pool
Single Managed Instance
Instance-scoped deployment option with high
compatibility with SQL Server and full PaaS benefits
Service
Tiers
© Microsoft Corporation
Azure Data Factory
Azure Data Factory
• Hybrid data integration service for enabling code-free ETL
• Industr
y
leading
data
ingesti
on
• Visual
• No
Code
• Hybrid • Pay
only
for
what
you
use
• Managed SSIS
Productive & trusted hybrid data integration service
that simplifies ETL with any data, from any source, at scale.
SSIS
Package
Pipeline
Azure
Integration Runtime
Command and Control
L E G E N D
Data
Orchestration @ Scale
Trigger Pipeline
Activity Activity
Activity Activity
Activity
Self-hosted
Integration Runtime
Linked
Service
© Microsoft Corporation
No-code data transformation and preparation @ scale
Azure Data Factory Data Flows
Mapping Dataflow
Code free data transformation @scale
Wrangling Dataflow
Code free data preparation @scale
© Microsoft Corporation
Azure Synapse Analytics
Integrated data platform for BI, AI and continuous intelligence
Platform
Azure
Data Lake Storage
Common Data Model
Enterprise Security
Optimized for Analytics
METASTORE
SECURITY
MANAGEMENT
MONITORING
DATA INTEGRATION
Analytics Runtimes
PROVISIONED ON-DEMAND
Form Factors
SQL
Languages
Python .NET Java Scala R
Experience Azure Synapse Analytics Studio
Artificial Intelligence / Machine Learning / Internet of Things
Intelligent Apps / Business Intelligence
Azure Synapse Analytics
© Microsoft Corporation
Azure Synapse Analytics MPP Architecture
Compute
Remote
storage
Cache TempDB
NVMe SSD
Cores Memory
Data
Log
Cache TempDB
NVMe SSD
Cores Memory
Cache TempDB
NVMe SSD
Cores Memory
Snapshot backups
© Microsoft Corporation
Table Distributions
CREATE TABLE [dbo].[FactInternetSales]
(
[ProductKey] int NOT NULL,
[OrderDateKey] int NOT NULL,
[CustomerKey] int NOT NULL,
[PromotionKey] int NOT NULL,
[SalesOrderNumber] nvarchar(20) NOT NULL,
[OrderQuantity] smallint NOT NULL,
[UnitPrice] money NOT NULL,
[SalesAmount] money NOT NULL
)
WITH
(
CLUSTERED COLUMNSTORE INDEX,
DISTRIBUTION = HASH([ProductKey]) |
ROUND ROBIN |
REPLICATED
);
Round-robin distributed
Distributes table rows evenly across all
distributions at random.
Hash distributed
Distributes table rows across the Compute
nodes by using a deterministic hash function to
assign each row to one distribution.
Replicated
Full copy of table accessible on each Compute
node.
© Microsoft Corporation
Data ingestion using external data sources
Polybase
-- Create Azure DataLake Gen2 Storage reference
CREATE EXTERNAL DATA SOURCE AzureStorage with
(
TYPE = HADOOP,
LOCATION='abfss://<container>@<storageaccnt>.blob.core.windows.net',
CREDENTIAL = AzureStorageCredential –- not required if using
managed identity
);
-- Type of format in Hadoop (CSV, RCFILE , ORC, PARQUET).
CREATE EXTERNAL FILE FORMAT TextFileFormat WITH
(
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR ='|', USE_TYPE_DEFAULT =
TRUE)
)
-- LOCATION: path to file or directory that contains data
CREATE EXTERNAL TABLE [dbo].[CarSensor_Data]
(
[SensorKey] int NOT NULL,
[Speed] float NOT NULL,
[YearMeasured] int NOT NULL
)
WITH (LOCATION='/Demo/’, DATA_SOURCE = AzureStorage,
FILE_FORMAT = TextFileFormat
);
Overview
Polybase supports querying files stored in a
Hadoop File System (HDFS), Azure Blob
storage, or Azure Data Lake Store.
To query files, users create three objects:
External data source, external file format,
external table.
© Microsoft Corporation
Lab 1: Load data into Azure Synapse
Analytics using Azure Data Factory
Pipelines
© Microsoft Corporation
Load data into Azure Synapse Analytics using Azure Data Factory Pipelines
Lab 1
Load and Ingest
Store
Process
Serve
Data Factory
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium
Azure Data Lake Gen2
Analytics
Enterprise-grade
semantic model
Business User
Fast load
data with
Polybase/
ParquetDirect
Scheduled / event-
triggered data ingestion
© Microsoft Corporation
Lab Architecture
Lab 1 Azure Data Platform
Resource Group
ADPDesktop
Virtual Machine
ADPVirtualNetwork
Virtual Network
MDWResources
Storage Account
SynapseDataLakesuffix
Azure Data Lake Storage Gen2
PowerBI
Power BI Desktop/
Workspace
SynapseDataFactory-suffix
Azure Data Factory
1
2
3
4
Student s
Computer
operationalsql-suffixNYCDataSets
Azure SQL Database
synapsesql-suffixSynapseDW
Azure Synapse Analytics
RDP Connection
or
Azure Bastion
© Microsoft Corporation
Modern Data Platform Concepts
Part II
The Modern Data Problem
• How to derive value from data:
• What happened historically?
• What is happening now?
• What is going to happen?
Each dimension of data is
constantly expanding
© Microsoft Corporation
It is a central storage repository that holds data coming from many sources in a raw, granular format. It can store structured,
semi-structured, or unstructured data, which means data ingested quickly and can be kept in a more flexible format for
future use cases.
What is a Data Lake?
Characteristics
• Schema-on-read
(ELT)
• Collection of data,
not a platform
• Perfect place for
evolving data
Benefits
• Quickly ingest high
volumes of diverse
data structures
• Enable advanced
analytics and data
exploration
• Scalability and
storage cost
reduction
Best
Practices
• Data Governance
needed to avoid
Data Swamp
• Security
considerations
• Design your Data
Lake
• Metadata
management
© Microsoft Corporation
Answer: both.
Data Warehouse or Data Lake?
Data Warehouse Data Lake
Requirements Relational requirements Diverse data, scalability, low cost
Data Value Data of recognised high value Candidate data of potential value
Data Processing Mostly refined calculated data Mostly detailed source data
Business Entities Known entities, tracked over time Raw material for discovering entities and facts
Data Standards Data conforms to enterprise standards Fidelity to original format and condition
Data Integration Data integration upfront Data prep on demand
Transformation Data transformed, in principle Data repurposed later, as needs arise
Schema Definition Schema-on-write Schema-on-read
Metadata Management Metadata improvement Metadata developed on read
© Microsoft Corporation
Azure Data Lake Storage Gen2
© Microsoft Corporation
High performance HDFS Endpoint to Azure Blob Storage
Azure Data Lake Storage Gen2
Hierarchical file system
Security Performance Scale and cost effectiveness
Common Blob storage foundation
Common SDK, Tools, Control Plane AAD integration, RBAC, Storage
account security
HA/DR support through ZRS and RA-
GRS
Blob API
ADLS Gen2 API
Object Tiering and Lifecycle
Policy Management
Unstructured Object Data
Server Backups, Archive Storage, Semi-
structured Data
File Data
Hadoop Filesystem, File and Folder
Hierarchy, Granular ACLs
© Microsoft Corporation
Lab 2: Transform Big Data using Azure Data
Factory Mapping Data Flows
© Microsoft Corporation
Transform Big Data using Azure Data Factory Mapping Data Flows
Lab 2
Semi-Structured
V=Volume
csv, logs, json, xml
(loosely-typed)
Load and Ingest
Store
Process
Serve
Data Factory
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium
Azure Data Lake Gen2
Analytics
Enterprise-grade
semantic model
Business User
Fast load
data with
Polybase/
ParquetDirect
Scheduled / event-
triggered data ingestion
© Microsoft Corporation
Lab Architecture
Lab 2
Azure Data Platform
Resource Group
ADPDesktop
Virtual Machine
ADPVirtualNetwork
Virtual Network
MDWResources
Storage Account
SynapseDataLakesuffix
Azure Data Lake Storage Gen2
SynapseDataFactory-suffix
Azure Data Factory
PowerBI
Power BI Desktop/
Workspace
SynapseDataFactory-suffix
Azure Data Factory
1
2
3
4
1 2
3
4
Student s
Computer
operationalsql-suffixNYCDataSets
Azure SQL Database
synapsesql-suffixSynapseDW
Azure Synapse Analytics
RDP Connection
or
Azure Bastion
© Microsoft Corporation
Advanced Analytics
© Microsoft Corporation
Advanced analytics goes beyond the traditional business intelligence (BI) and uses mathematical, probabilistic, and statistical
modeling techniques to enable predictive processing and automated decision making.
Advanced analytics
SQL
Modern data warehousing
“We want to integrate all our data—
including Big Data—with our data
warehouse”
Advanced analytics
“We’re trying to predict when our
customers churn”
Real-time analytics
“We’re trying to get insights from
our devices in real-time”
© Microsoft Corporation
Modern Data Platform Concepts
Part III
© Microsoft Corporation
Open Source Apache Projects for Big Data Compute
Hadoop and Spark in Azure
Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises
© Microsoft Corporation
Azure Databricks
© Microsoft Corporation
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Azure Databricks
Best of Databricks Best of Microsoft
Designed in collaboration with the founders of Apache Spark
One-click set up; streamlined workflows
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, ADLS, Azure Storage, Azure Data
Factory, Azure AD, Event Hub, IoT Hub, HDInsight Kafka, SQL DB)
Enterprise-grade Azure security (Active Directory integration, compliance, enterprise -grade SLAs)
© Microsoft Corporation
Azure Databricks
Optimized Databricks Runtime Engine
DATABRICKS I/O SERVERLESS
Collaborative Workspace
Cloud storage
Data warehouses
Hadoop storage
IoT / streaming data
Rest APIs
Machine learning models
BI tools
Data exports
Data warehouses
Azure Databricks
Enhance Productivity
Deploy Production Jobs & Workflows
APACHE SPARK
MULTI-STAGE PIPELINES
DATA ENGINEER
JOB SCHEDULER NOTIFICATION & LOGS
DATA SCIENTIST BUSINESS ANALYST
Build on secure & trusted cloud Scale without limits
© Microsoft Corporation
Notebooks are a popular way to develop, and run, Spark Applications
Azure Databricks Notebooks
Notebooks are not only for authoring Spark applications but can be run/executed
directly on clusters
• Shift+Enter
•
•
Fine grained permissions support so they can be securely shared with colleagues
for collaboration
Notebooks are well-suited for prototyping, rapid development, exploration,
discovery and iterative development
With Azure Databricks notebooks you have a default language but you can mix multiple languages in the same notebook:
%python Allows you to execute python code in a notebook (even if that notebook is not python)
%sql Allows you to execute sql code in a notebook (even if that notebook is not sql).
%r Allows you to execute r code in a notebook (even if that notebook is not r).
%scala Allows you to execute scala code in a notebook (even if that notebook is not scala).
%sh Allows you to execute shell code in your notebook.
%fs Allows you to use Databricks Utilities - dbutils filesystem commands.
%md To include rendered markdown
© Microsoft Corporation
Lab 3: Explore Big Data with Azure
Databricks
© Microsoft Corporation
Explore Big Data with Azure Databricks
Lab 3
Semi-Structured
V=Volume
csv, logs, json, xml
(loosely-typed)
Load and Ingest
Store
Process
Serve
Data Factory
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium
Databricks
Azure Data Lake Gen2
Analytics
Enterprise-grade
semantic model
Integrate big data
scenarios with
traditional data
warehouse
Business User
Fast load
data with
Polybase/
ParquetDirect
Scheduled / event-
triggered data ingestion
© Microsoft Corporation
Lab Architecture
Lab 3
Azure Data Platform
Resource Group
ADPDesktop
Virtual Machine
ADPVirtualNetwork
Virtual Network
MDWResources
Storage Account
SynapseDataLakesuffix
Azure Data Lake Storage Gen2
SynapseDataFactory-suffix
Azure Data Factory
ADPDatabricks
Azure Databricks
ADPComputerVision
Computer Vision API
PowerBI
Power BI Desktop/
Workspace
SynapseDataFactory-suffix
Azure Data Factory
1
2
3
4
1 2 1
3
4
Student s
Computer
operationalsql-suffixNYCDataSets
Azure SQL Database
2
synapsesql-suffixSynapseDW
Azure Synapse Analytics
RDP Connection
or
Azure Bastion
© Microsoft Corporation
Modern Data Platform Concepts
Part IV
© Microsoft Corporation
“The ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent
beings.” – Encyclopedia Britannica
Artificial Intelligence
Machine Learning
Supervised Learning
Regression
Classification
Unsupervised Learning
Cluster Analysis
Application Examples
Weather Forecast
Fraud Detection
Customer Churn
Insurance Premium
© Microsoft Corporation
Term coined in 2009 for a developer meetup – ”Not Only SQL” -> “NoSQL”.
Databases that allow you to store and retrieve data in various structures, formats, and models other than tabular relational
model.
What’s No-SQL?
There’s a time and a place for everything
Sometimes a relational store is the right choice
Sometimes a NoSQL store is the right choice
Sometimes you need more than one store for an app -> polyglot
persistence
Data Structures
Key-Value Databases
Cosmos DB, Redis Cache, Azure Table
Column Family Stores
Cosmos DB, Cassandra, HBase
Graph Databases
Cosmos DB, Neo4j, Gremlin
Document Databases
Cosmos DB, MongoDB
© Microsoft Corporation
Azure AI
Azure Cognitive Services
Azure Bot Service
Azure Search Azure Databricks
Azure Machine Learning
Azure AI Infrastructure
Knowledge mining
AI apps and agents Machine learning
Azure AI
Productive Built for enterprises Trusted
Solution Areas
Language
Speech
Search
Cognitive
Services Lab
Knowledge
Vision
Custom
Vision
Speech API
Bing Spell
Check
Speaker Recognition
Bing Entity Search
Video Indexer
Emotion
Translator Text
Text to
speech
QnA Maker
Bing Search
Bing Speech
Custom Speech
Custom Decision
Translator
Speech
Computer
Vision
Content
Moderator
Bing Autosuggest
Text Analytics
Face
Language
Understanding (LUIS)
Cognitive Services APIs
Bing Statistics
add-in
Bing Visual Search
Bing Custom Search
Unified Speech
Cognitive Services capabilities
Infuse your apps, websites, and bots with human-like intelligence
© Microsoft Corporation
Cosmos DB
MongoDB
Table API
Turnkey global
distribution
Elastic scale out
of storage & throughput
Guaranteed low latency
at the 99th percentile
Comprehensive
SLAs
Five well-defined
consistency models
Azure Cosmos DB
Document
Column-family
Key-value Graph
Core (SQL) API
Resource Model
Account
Database
Database
Database
Database
Database
Container
Database
Database
Item
= Collection Graph Table
© Microsoft Corporation
Lab 4: Add AI to your Big Data Pipeline
with Cognitive Services
© Microsoft Corporation
Add AI to your Big Data Pipeline with Cognitive Services
Lab 4
Semi-Structured
V=Volume
csv, logs, json, xml
(loosely-typed)
Non-structured
V=Variety
images, video, audio, free text
(no structure)
Load and Ingest
Store
Process
Serve
Data Factory
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium
Databricks
Azure Data Lake Gen2 CosmosDB Application
Analytics
Enterprise-grade
semantic model
Integrate big data
scenarios with
traditional data
warehouse
Cognitive Services
Azure ML
Business User
Fast load
data with
Polybase/
ParquetDirect
Build and Score
ML models
Scheduled / event-
triggered data ingestion
© Microsoft Corporation
Lab Architecture
Lab 4
Azure Data Platform
Resource Group
ADPDesktop
Virtual Machine
ADPVirtualNetwork
Virtual Network
MDWResources
Storage Account
SynapseDataLakesuffix
Azure Data Lake Storage Gen2
SynapseDataFactory-suffix
Azure Data Factory
ADPDatabricks
Azure Databricks
ADPComputerVision
Computer Vision API
ADPCosmosDB-suffix
Azure CosmosDB
PowerBI
Power BI Desktop/
Workspace
SynapseDataFactory-suffix
Azure Data Factory
1
2
3
4
1
1
2
2
3
4
5 6
1
3
4
Student s
Computer
operationalsql-suffixNYCDataSets
Azure SQL Database
2
synapsesql-suffixSynapseDW
Azure Synapse Analytics
RDP Connection
or
Azure Bastion
© Microsoft Corporation
Real-time Analytics
© Microsoft Corporation
Deals with streams of data that are captured in real-time and processed with minimal latency to generate real-time (or near-
real-time) reports or automated responses.
Real-time analytics
SQL
Modern data warehousing
“We want to integrate all our
data—including Big Data—with
our data warehouse”
Advanced analytics
“We’re trying to predict when
our customers churn”
Real-time analytics
“We’re trying to get insights
from our devices in real-time”
© Microsoft Corporation
Modern Data Platform Concepts
Part V
© Microsoft Corporation
Streaming Use Cases
CONSUMER ENGAGEMENT
Real-time Pricing
Optimization
• Demand-Elasticity
• Personal Pricing Schemes
• Promotion events
• Multi-channel engagement
Risk and Fraud, Threat
Detection
RISK AND REVENUE
MANAGEMENT
• Real-time anomaly detection
• Card Monitoring and Fraud
Detection
• Risk Aggregation
• Preventive Maintenance
• Smart Grids and Microgrids
• Asset performance as a
Service
• UAV image analysis
Industrial IoT
GRID OPS, ASSET
OPTIMIZATION
• Real-time firewall, network, and
auth log correlation
• Anomaly detection
• Security context, enrichment
• Security Orchestration
Security Intelligence
ACTIONABLE THREAT
INTELLIGENCE
IoT DEVICE
ANALYTICS
SENSOR DATA
• Aggregation of streaming
events
• Predictive Maintenance
• Anomaly Detection
• Right product, promotion, at
right time
• Real time Ad bidding platform
• Personalized Ad Targeting
Next Best and
Personalized Offers
RECOMMENDATION ENGINE CONSUMER ENGAGEMENT
ANALYSIS
Sentiment Analysis
• Demand-Elasticity
• Social Network Analysis
• Promotion events
• Multi-channel Attribution
© Microsoft Corporation
Unlocking Real-time Insights
Time to Insight is Critical
Insights are Perishable
Ask Questions to Data in Motion
© Microsoft Corporation
Scenario Types
Actions by Human Actors
Machine to Machine Interactions
Enriched Data Movement
Automation
Dashboarding
© Microsoft Corporation
Designed to handle Big Data use cases by taking advantage of both batch and stream-processing methods
Lambda (λ) Architecture
1. All data entering the system is dispatched to both the
batch layer and the speed layer for processing.
2. The batch layer has two functions:
I. manage the master dataset (an immutable,
append-only set of raw data)
II. pre-compute the batch views.
3. The serving layer indexes the batch views so that they
can be queried in low-latency, ad-hoc way.
4. The speed layer compensates for the high latency of
updates to the serving layer and deals with recent data
only.
5. Any incoming query can be answered by merging results
from batch views and real-time
© Microsoft Corporation
Event Hubs
© Microsoft Corporation
Big data streaming platform and event ingestion service capable of receiving and processing millions of events per second.
Event Hubs
© Microsoft Corporation
Batch on stream
Event Hubs Capture
Policy based push to your own storage
Uses Avro format
Raises Event Grid events – connect to Functions, ACI, or whatever you like
Does not impact throughput
Offloads batch processing from your real-time stream
© Microsoft Corporation
Stream Analytics
© Microsoft Corporation
Event-processing engine that allows you to examine high volumes of data streaming from devices
Stream Analytics
© Microsoft Corporation
Stream Analytics Job
Users construct and deploy jobs to Azure Stream Analytics
Job definition includes inputs, a query, and output
© Microsoft Corporation
Windowing Concepts
• Operations on the data contained in temporal windows is a common pattern
• Four types of Temporal Windows:
• Sliding
• Tumbling
• Hopping
• Session
• Output at the end of each window
• Windows are fixed length
• Used in a GROUP BY clause
© Microsoft Corporation
Windowing Functions
Sliding Windows and Tumbling Windows
Sliding Windows
Tumbling Windows
© Microsoft Corporation
Windowing Functions
Hopping Windows and Session Windows
Hopping Windows
Session Windows
© Microsoft Corporation
Lab 5: Ingest and Analyse real-time data
with Event Hubs and Stream Analytics
© Microsoft Corporation
Ingest and Analyse real-time data with Event Hubs and Stream Analytics
Lab 5
Semi-Structured
V=Volume
csv, logs, json, xml
(loosely-typed)
Non-structured
V=Variety
images, video, audio, free text
(no structure)
Stream
V=Velocity
IoT devices, sensors, gadgets
(loosely-typed)
Load and Ingest
Store
Process
Serve
Data Factory
Event Hubs Stream Analytics
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium
Databricks
Azure Data Lake Gen2 CosmosDB
Power BI Premium
Application
Analytics
Analytics
λ Lambda Architecture
Hot Path
Real Time
Analytics
Cold Path
History and
Trend Analysis
Stream Datasets
and Real-time
Dashboards
Enterprise-grade
semantic model
Integrate big data
scenarios with
traditional data
warehouse
Cognitive Services
Azure ML
Business User
Fast load
data with
Polybase/
ParquetDirect
Build and Score
ML models
Scheduled / event-
triggered data ingestion
© Microsoft Corporation
Lab Architecture
Lab 5
Azure Data Platform
Resource Group
ADPDesktop
Virtual Machine
ADPVirtualNetwork
Virtual Network
MDWResources
Storage Account
ADPLogicApp
Logic App
SynapseDataLakesuffix
Azure Data Lake Storage Gen2
ADPEventHubs-suffix
Event Hubs
SynapseStreamAnalytics-suffix
Event Hubs
SynapseDataFactory-suffix
Azure Data Factory
ADPDatabricks
Azure Databricks
ADPComputerVision
Computer Vision API
ADPCosmosDB-suffix
Azure CosmosDB
PowerBI
Power BI Desktop/
Workspace
SynapseDataFactory-suffix
Azure Data Factory
1
2
3
4
1
1
2
2
3
4
5 6
1
1
2
3
3
4
Student s
Computer
operationalsql-suffixNYCDataSets
Azure SQL Database
2
synapsesql-suffixSynapseDW
Azure Synapse Analytics
5
4
RDP Connection
or
Azure Bastion
© Microsoft Corporation
It’s all on
© Copyright Microsoft Corporation. All rights reserved.

More Related Content

What's hot

data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptx
TarekHamdi8
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Azure Data Engineering.pptx
Azure Data Engineering.pptxAzure Data Engineering.pptx
Azure Data Engineering.pptx
priyadharshini626440
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
Data Con LA
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Power BI Governance - Access Management, Recommendations and Best Practices
Power BI Governance - Access Management, Recommendations and Best PracticesPower BI Governance - Access Management, Recommendations and Best Practices
Power BI Governance - Access Management, Recommendations and Best Practices
Learning SharePoint
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
Dalibor Wijas
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
Antonios Chatzipavlis
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
Matthew W. Bowers
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
Sergio Zenatti Filho
 
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...
Denodo
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Cambridge Semantics
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
Hal Kalechofsky
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
James Serra
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Implement SQL Server on an Azure VM
Implement SQL Server on an Azure VMImplement SQL Server on an Azure VM
Implement SQL Server on an Azure VM
James Serra
 

What's hot (20)

data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptx
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Azure Data Engineering.pptx
Azure Data Engineering.pptxAzure Data Engineering.pptx
Azure Data Engineering.pptx
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Power BI Governance - Access Management, Recommendations and Best Practices
Power BI Governance - Access Management, Recommendations and Best PracticesPower BI Governance - Access Management, Recommendations and Best Practices
Power BI Governance - Access Management, Recommendations and Best Practices
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Implement SQL Server on an Azure VM
Implement SQL Server on an Azure VMImplement SQL Server on an Azure VM
Implement SQL Server on an Azure VM
 

Similar to Azure Data.pptx

Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
James Serra
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Trivadis
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
Caserta
 
Data Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptxData Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptx
ArunPandiyan890855
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 
SQL Azure the database in the cloud
SQL Azure the database in the cloud SQL Azure the database in the cloud
SQL Azure the database in the cloud
Eduardo Castro
 
Cepta The Future of Data with Power BI
Cepta The Future of Data with Power BICepta The Future of Data with Power BI
Cepta The Future of Data with Power BI
Kellyn Pot'Vin-Gorman
 
Azure fundamental -Introduction
Azure fundamental -IntroductionAzure fundamental -Introduction
Azure fundamental -Introduction
ManishK55
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
Mark Kromer
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
Nilesh Shah
 
Sky High With Azure
Sky High With AzureSky High With Azure
Sky High With Azure
Clint Edmonson
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
James Serra
 
2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure
Marco Parenzan
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure Infrastructures
CCG
 
Modern Business Intelligence and Advanced Analytics
Modern Business Intelligence and Advanced AnalyticsModern Business Intelligence and Advanced Analytics
Modern Business Intelligence and Advanced Analytics
Collective Intelligence Inc.
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
Trivadis
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
Abhimanyu Singhal
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
James Serra
 
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudSQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
Mark Kromer
 

Similar to Azure Data.pptx (20)

Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Data Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptxData Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptx
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
SQL Azure the database in the cloud
SQL Azure the database in the cloud SQL Azure the database in the cloud
SQL Azure the database in the cloud
 
Cepta The Future of Data with Power BI
Cepta The Future of Data with Power BICepta The Future of Data with Power BI
Cepta The Future of Data with Power BI
 
Azure fundamental -Introduction
Azure fundamental -IntroductionAzure fundamental -Introduction
Azure fundamental -Introduction
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
 
Sky High With Azure
Sky High With AzureSky High With Azure
Sky High With Azure
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure Infrastructures
 
Modern Business Intelligence and Advanced Analytics
Modern Business Intelligence and Advanced AnalyticsModern Business Intelligence and Advanced Analytics
Modern Business Intelligence and Advanced Analytics
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudSQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
 

Recently uploaded

哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 

Recently uploaded (20)

哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 

Azure Data.pptx

  • 2. Azure Data Platform End-to-End Implementa ModernData PlatformArchitecture <your name> <your role> <your email>
  • 3. © Microsoft Corporation Begin with the end in mind
  • 4. © Microsoft Corporation • We will understand Cloud and Big Data concepts and technologies used to solve the most common advanced analytics problems • We will understand the role of Microsoft Azure data services in a modern data platform architecture • We will look at individual Azure Data Services and use them to implement a modern data platform reference architecture • We will have a ARM template of a data platform that will enable us to solve most of our data challenges Course Objectives
  • 5. © Microsoft Corporation Important Reminder • The modern data platform architecture proposed in this course aims to help with your technology decisions when architecting data solutions in Azure. • The Azure services covered in this course are only a subset of a much larger family of data services. Some real-world data scenarios may require the use of services not included in this course. • This course does not replace in-depth training on each Azure service covered today. • Some concepts presented in this course can be quite complex and you may need to seek for more information from different sources.
  • 6. © Microsoft Corporation Modern Data Platform Reference Architecture Semi-Structured V=Volume csv, logs, json, xml (loosely-typed) Non-structured V=Variety images, video, audio, free text (no structure) Stream V=Velocity IoT devices, sensors, gadgets (loosely-typed) Load and Ingest Store Process Serve Data Factory Event Hubs Stream Analytics Relational Databases (strongly-typed, structured) Azure Synapse Analytics Power BI Premium Databricks Azure Data Lake Gen2 CosmosDB Power BI Premium Application Analytics Analytics λ Lambda Architecture Hot Path Real Time Analytics Cold Path History and Trend Analysis Stream Datasets and Real-time Dashboards Enterprise-grade semantic model Integrate big data scenarios with traditional data warehouse Cognitive Services Azure ML Business User Fast load data with Polybase/ ParquetDirect Build and Score ML models Scheduled / event- triggered data ingestion
  • 7. © Microsoft Corporation Lab Guide Azure Data Platform Resource Group ADPDesktop Virtual Machine ADPVirtualNetwork Virtual Network MDWResources Storage Account ADPLogicApp Logic App SynapseDataLakesuffix Azure Data Lake Storage Gen2 ADPEventHubs-suffix Event Hubs SynapseStreamAnalytics-suffix Event Hubs SynapseDataFactory-suffix Azure Data Factory ADPDatabricks Azure Databricks ADPComputerVision Computer Vision API ADPCosmosDB-suffix Azure CosmosDB PowerBI Power BI Desktop/ Workspace Azure Data Platform End2End Lab Architecture SynapseDataFactory-suffix Azure Data Factory Lab 1: Load Data into Azure Synapse Analytics using Azure Data Factory Pipelines Lab 2: Transform Big Data using Azure Data Factory Mapping Data Flows Lab 3: Explore Big Data with Azure Databricks Lab 4: Add AI to your Big Data pipeline with Cognitive Services Lab 5: Ingest and Analyse Real-Time Data with Event Hubs and Stream Analytics 1 2 3 4 1 1 2 2 3 4 5 6 1 1 2 3 3 4 Student s Computer operationalsql-suffixNYCDataSets Azure SQL Database 2 synapsesql-suffixSynapseDW Azure Synapse Analytics 5 4 RDP Connection or Azure Bastion
  • 8. © Microsoft Corporation The modern data world out there
  • 9. © Microsoft Corporation I tried to understand it, but… Deep Learning ETL vs ELT PaaS vs IaaS Data Visualisation Data Quality Master Data Big Data Databricks SMP vs MPP Data Catalog Spark Storm Data Mart No-SQL Cloud vs On-prem Velocity, Variety and Volume Semantic Layer Predictive Prescriptive IoT Streaming AI
  • 10. © Microsoft Corporation The modern data estate Data warehouses Data lakes Operational databases Data warehouses Data lakes Operational databases Hybrid Security and performance Flexibility of choice Reason over any data, anywhere Social LOB Graph IoT Image CRM
  • 11. © Microsoft Corporation AI built-in | Most secure | Lowest TCO Data warehouses Data lakes Operational databases Data warehouses Data lakes Operational databases Industry leader 4 years in a row #1 TPC-H performance T-SQL query over any data 70% faster 2x the global reach 99.9% SLA Easiest lift and shift with no code changes The Microsoft offering SQL Server Hybrid Azure Data Services Security and performance Flexibility of choice Reason over any data, anywhere Social LOB Graph IoT Image CRM
  • 12. © Microsoft Corporation Valuable collection of architecture principles to help you with your technology choices Azure Data Architecture Guide https://aka.ms/adag
  • 13. © Microsoft Corporation Collection of reference architectures for most common challenges Azure Architecture Solutions https://azure.microsoft.com/en-us/solutions/architecture/
  • 14. © Microsoft Corporation Big Data and advanced analytics Modern Data Platform Solution Scenarios SQL Modern data warehousing “We want to integrate all our data— including Big Data—with our data warehouse” Advanced analytics “We’re trying to predict when our customers churn” Real-time analytics “We’re trying to get insights from our devices in real-time”
  • 15. © Microsoft Corporation Modern Data Platform Concepts Part I
  • 16. © Microsoft Corporation IaaS vs PaaS vs SaaS Infrastructure as a Service (IaaS) Storage Servers Networking O/S Middleware Virtualization Data Applications Runtime Managed by Microsoft You scale, make resilient & manage Platform as a Service (PaaS) Scale, Resilience and management by Microsoft You manage Storage Servers Networking O/S Middleware Virtualization Applications Runtime Data On Premises Physical / Virtual You scale, make resilient and manage Storage Servers Networking O/S Middleware Virtualization Data Applications Runtime Software as a Service (SaaS) Storage Servers Networking O/S Middleware Virtualization Applications Runtime Data Scale, Resilience and management by Microsoft Azure Virtual Machines Azure Cloud Services
  • 17. © Microsoft Corporation A data warehouse is a large collection of business data used to help an organization make decisions. Data in the Data Warehouse has been identified as valuable to specifically defined business cases and is stored in a structured way readily available for reporting and data analysis. It is not an Operational Database Different workload types: transactional (DB) versus analytics (DW) It is not a Data Lake These are different concepts, they can co-exist and they compliment each other It is not a Data Mart A data mart is a subject-oriented database populated from a subset of the Data Warehouse What is a Data Warehouse?
  • 19. © Microsoft Corporation The modern data warehouse extends the scope of the data warehouse to serve Big Data that’s prepared with techniques beyond relational ETL Modern data warehousing SQL Modern data warehousing “We want to integrate all our data— including Big Data—with our data warehouse” Advanced analytics “We’re trying to predict when our customers churn” Real-time analytics “We’re trying to get insights from our devices in real-time”
  • 20. © Microsoft Corporation Modern data warehousing pattern Advanced Analytics Social LOB Graph IoT Image CRM INGEST STORE PREP MODEL & SERVE (& store) Data orchestration and monitoring Big data store Transform & Clean Data warehouse AI BI + Reporting
  • 21. © Microsoft Corporation SQL Server and Azure SQL Database
  • 22. Data platform continuum Shared lower cost Dedicated higher cost Higher administration Lower administration Physical SQL Server Physical Machine (raw iron) IaaS SQL Server in Azure VM Virtualized Machines Virtual SQL Server Private Cloud Virtualized Machine + Appliance PaaS & SaaS Azure SQL Database Virtualized Database
  • 23. SQL Server 2019 big data, analytics, and AI Managed SQL Server, Spark, and data lake Store high volume data in a data lake and access it easily using either SQL or Spark Management services, admin portal, and integrated security make it all easy to manage SQL Server Data virtualization Combine data from many sources without moving or replicating it Scale out compute and caching to boost performance T-SQL Analytics Apps Open database connectivity NoSQL Relational databases HDFS Complete AI platform Easily feed integrated data from many sources to your model training Ingest and prep data and then train, store, and operationalize your models all in one system SQL Server External Tables Compute pools and data pools Spark Scalable, shared storage (HDFS) External data sources Admin portal and management services Integrated AD-based security SQL Server ML Services Spark & Spark ML HDFS REST API containers for models
  • 24. Azure SQL Database deployment option Azure SQL Database Database-scoped deployment option with predictable workload performance Shared resource model optimized for greater efficiency of multi-tenant applications Best for apps that require resource guarantee at database level Best for SaaS apps with multiple databases that can share resources at database level, achieving better cost efficiency Best for modernization at scale with low friction and effort Elastic Pool Single Managed Instance Instance-scoped deployment option with high compatibility with SQL Server and full PaaS benefits Service Tiers
  • 26. Azure Data Factory • Hybrid data integration service for enabling code-free ETL • Industr y leading data ingesti on • Visual • No Code • Hybrid • Pay only for what you use • Managed SSIS Productive & trusted hybrid data integration service that simplifies ETL with any data, from any source, at scale.
  • 28. Azure Integration Runtime Command and Control L E G E N D Data Orchestration @ Scale Trigger Pipeline Activity Activity Activity Activity Activity Self-hosted Integration Runtime Linked Service
  • 29. © Microsoft Corporation No-code data transformation and preparation @ scale Azure Data Factory Data Flows Mapping Dataflow Code free data transformation @scale Wrangling Dataflow Code free data preparation @scale
  • 30. © Microsoft Corporation Azure Synapse Analytics
  • 31. Integrated data platform for BI, AI and continuous intelligence Platform Azure Data Lake Storage Common Data Model Enterprise Security Optimized for Analytics METASTORE SECURITY MANAGEMENT MONITORING DATA INTEGRATION Analytics Runtimes PROVISIONED ON-DEMAND Form Factors SQL Languages Python .NET Java Scala R Experience Azure Synapse Analytics Studio Artificial Intelligence / Machine Learning / Internet of Things Intelligent Apps / Business Intelligence Azure Synapse Analytics
  • 32. © Microsoft Corporation Azure Synapse Analytics MPP Architecture Compute Remote storage Cache TempDB NVMe SSD Cores Memory Data Log Cache TempDB NVMe SSD Cores Memory Cache TempDB NVMe SSD Cores Memory Snapshot backups
  • 33. © Microsoft Corporation Table Distributions CREATE TABLE [dbo].[FactInternetSales] ( [ProductKey] int NOT NULL, [OrderDateKey] int NOT NULL, [CustomerKey] int NOT NULL, [PromotionKey] int NOT NULL, [SalesOrderNumber] nvarchar(20) NOT NULL, [OrderQuantity] smallint NOT NULL, [UnitPrice] money NOT NULL, [SalesAmount] money NOT NULL ) WITH ( CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = HASH([ProductKey]) | ROUND ROBIN | REPLICATED ); Round-robin distributed Distributes table rows evenly across all distributions at random. Hash distributed Distributes table rows across the Compute nodes by using a deterministic hash function to assign each row to one distribution. Replicated Full copy of table accessible on each Compute node.
  • 34. © Microsoft Corporation Data ingestion using external data sources Polybase -- Create Azure DataLake Gen2 Storage reference CREATE EXTERNAL DATA SOURCE AzureStorage with ( TYPE = HADOOP, LOCATION='abfss://<container>@<storageaccnt>.blob.core.windows.net', CREDENTIAL = AzureStorageCredential –- not required if using managed identity ); -- Type of format in Hadoop (CSV, RCFILE , ORC, PARQUET). CREATE EXTERNAL FILE FORMAT TextFileFormat WITH ( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR ='|', USE_TYPE_DEFAULT = TRUE) ) -- LOCATION: path to file or directory that contains data CREATE EXTERNAL TABLE [dbo].[CarSensor_Data] ( [SensorKey] int NOT NULL, [Speed] float NOT NULL, [YearMeasured] int NOT NULL ) WITH (LOCATION='/Demo/’, DATA_SOURCE = AzureStorage, FILE_FORMAT = TextFileFormat ); Overview Polybase supports querying files stored in a Hadoop File System (HDFS), Azure Blob storage, or Azure Data Lake Store. To query files, users create three objects: External data source, external file format, external table.
  • 35. © Microsoft Corporation Lab 1: Load data into Azure Synapse Analytics using Azure Data Factory Pipelines
  • 36. © Microsoft Corporation Load data into Azure Synapse Analytics using Azure Data Factory Pipelines Lab 1 Load and Ingest Store Process Serve Data Factory Relational Databases (strongly-typed, structured) Azure Synapse Analytics Power BI Premium Azure Data Lake Gen2 Analytics Enterprise-grade semantic model Business User Fast load data with Polybase/ ParquetDirect Scheduled / event- triggered data ingestion
  • 37. © Microsoft Corporation Lab Architecture Lab 1 Azure Data Platform Resource Group ADPDesktop Virtual Machine ADPVirtualNetwork Virtual Network MDWResources Storage Account SynapseDataLakesuffix Azure Data Lake Storage Gen2 PowerBI Power BI Desktop/ Workspace SynapseDataFactory-suffix Azure Data Factory 1 2 3 4 Student s Computer operationalsql-suffixNYCDataSets Azure SQL Database synapsesql-suffixSynapseDW Azure Synapse Analytics RDP Connection or Azure Bastion
  • 38. © Microsoft Corporation Modern Data Platform Concepts Part II
  • 39. The Modern Data Problem • How to derive value from data: • What happened historically? • What is happening now? • What is going to happen? Each dimension of data is constantly expanding
  • 40. © Microsoft Corporation It is a central storage repository that holds data coming from many sources in a raw, granular format. It can store structured, semi-structured, or unstructured data, which means data ingested quickly and can be kept in a more flexible format for future use cases. What is a Data Lake? Characteristics • Schema-on-read (ELT) • Collection of data, not a platform • Perfect place for evolving data Benefits • Quickly ingest high volumes of diverse data structures • Enable advanced analytics and data exploration • Scalability and storage cost reduction Best Practices • Data Governance needed to avoid Data Swamp • Security considerations • Design your Data Lake • Metadata management
  • 41. © Microsoft Corporation Answer: both. Data Warehouse or Data Lake? Data Warehouse Data Lake Requirements Relational requirements Diverse data, scalability, low cost Data Value Data of recognised high value Candidate data of potential value Data Processing Mostly refined calculated data Mostly detailed source data Business Entities Known entities, tracked over time Raw material for discovering entities and facts Data Standards Data conforms to enterprise standards Fidelity to original format and condition Data Integration Data integration upfront Data prep on demand Transformation Data transformed, in principle Data repurposed later, as needs arise Schema Definition Schema-on-write Schema-on-read Metadata Management Metadata improvement Metadata developed on read
  • 42. © Microsoft Corporation Azure Data Lake Storage Gen2
  • 43. © Microsoft Corporation High performance HDFS Endpoint to Azure Blob Storage Azure Data Lake Storage Gen2 Hierarchical file system Security Performance Scale and cost effectiveness Common Blob storage foundation Common SDK, Tools, Control Plane AAD integration, RBAC, Storage account security HA/DR support through ZRS and RA- GRS Blob API ADLS Gen2 API Object Tiering and Lifecycle Policy Management Unstructured Object Data Server Backups, Archive Storage, Semi- structured Data File Data Hadoop Filesystem, File and Folder Hierarchy, Granular ACLs
  • 44. © Microsoft Corporation Lab 2: Transform Big Data using Azure Data Factory Mapping Data Flows
  • 45. © Microsoft Corporation Transform Big Data using Azure Data Factory Mapping Data Flows Lab 2 Semi-Structured V=Volume csv, logs, json, xml (loosely-typed) Load and Ingest Store Process Serve Data Factory Relational Databases (strongly-typed, structured) Azure Synapse Analytics Power BI Premium Azure Data Lake Gen2 Analytics Enterprise-grade semantic model Business User Fast load data with Polybase/ ParquetDirect Scheduled / event- triggered data ingestion
  • 46. © Microsoft Corporation Lab Architecture Lab 2 Azure Data Platform Resource Group ADPDesktop Virtual Machine ADPVirtualNetwork Virtual Network MDWResources Storage Account SynapseDataLakesuffix Azure Data Lake Storage Gen2 SynapseDataFactory-suffix Azure Data Factory PowerBI Power BI Desktop/ Workspace SynapseDataFactory-suffix Azure Data Factory 1 2 3 4 1 2 3 4 Student s Computer operationalsql-suffixNYCDataSets Azure SQL Database synapsesql-suffixSynapseDW Azure Synapse Analytics RDP Connection or Azure Bastion
  • 48. © Microsoft Corporation Advanced analytics goes beyond the traditional business intelligence (BI) and uses mathematical, probabilistic, and statistical modeling techniques to enable predictive processing and automated decision making. Advanced analytics SQL Modern data warehousing “We want to integrate all our data— including Big Data—with our data warehouse” Advanced analytics “We’re trying to predict when our customers churn” Real-time analytics “We’re trying to get insights from our devices in real-time”
  • 49. © Microsoft Corporation Modern Data Platform Concepts Part III
  • 50. © Microsoft Corporation Open Source Apache Projects for Big Data Compute Hadoop and Spark in Azure Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises
  • 52. © Microsoft Corporation A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure Azure Databricks Best of Databricks Best of Microsoft Designed in collaboration with the founders of Apache Spark One-click set up; streamlined workflows Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. Native integration with Azure services (Power BI, SQL DW, Cosmos DB, ADLS, Azure Storage, Azure Data Factory, Azure AD, Event Hub, IoT Hub, HDInsight Kafka, SQL DB) Enterprise-grade Azure security (Active Directory integration, compliance, enterprise -grade SLAs)
  • 53. © Microsoft Corporation Azure Databricks Optimized Databricks Runtime Engine DATABRICKS I/O SERVERLESS Collaborative Workspace Cloud storage Data warehouses Hadoop storage IoT / streaming data Rest APIs Machine learning models BI tools Data exports Data warehouses Azure Databricks Enhance Productivity Deploy Production Jobs & Workflows APACHE SPARK MULTI-STAGE PIPELINES DATA ENGINEER JOB SCHEDULER NOTIFICATION & LOGS DATA SCIENTIST BUSINESS ANALYST Build on secure & trusted cloud Scale without limits
  • 54. © Microsoft Corporation Notebooks are a popular way to develop, and run, Spark Applications Azure Databricks Notebooks Notebooks are not only for authoring Spark applications but can be run/executed directly on clusters • Shift+Enter • • Fine grained permissions support so they can be securely shared with colleagues for collaboration Notebooks are well-suited for prototyping, rapid development, exploration, discovery and iterative development With Azure Databricks notebooks you have a default language but you can mix multiple languages in the same notebook: %python Allows you to execute python code in a notebook (even if that notebook is not python) %sql Allows you to execute sql code in a notebook (even if that notebook is not sql). %r Allows you to execute r code in a notebook (even if that notebook is not r). %scala Allows you to execute scala code in a notebook (even if that notebook is not scala). %sh Allows you to execute shell code in your notebook. %fs Allows you to use Databricks Utilities - dbutils filesystem commands. %md To include rendered markdown
  • 55. © Microsoft Corporation Lab 3: Explore Big Data with Azure Databricks
  • 56. © Microsoft Corporation Explore Big Data with Azure Databricks Lab 3 Semi-Structured V=Volume csv, logs, json, xml (loosely-typed) Load and Ingest Store Process Serve Data Factory Relational Databases (strongly-typed, structured) Azure Synapse Analytics Power BI Premium Databricks Azure Data Lake Gen2 Analytics Enterprise-grade semantic model Integrate big data scenarios with traditional data warehouse Business User Fast load data with Polybase/ ParquetDirect Scheduled / event- triggered data ingestion
  • 57. © Microsoft Corporation Lab Architecture Lab 3 Azure Data Platform Resource Group ADPDesktop Virtual Machine ADPVirtualNetwork Virtual Network MDWResources Storage Account SynapseDataLakesuffix Azure Data Lake Storage Gen2 SynapseDataFactory-suffix Azure Data Factory ADPDatabricks Azure Databricks ADPComputerVision Computer Vision API PowerBI Power BI Desktop/ Workspace SynapseDataFactory-suffix Azure Data Factory 1 2 3 4 1 2 1 3 4 Student s Computer operationalsql-suffixNYCDataSets Azure SQL Database 2 synapsesql-suffixSynapseDW Azure Synapse Analytics RDP Connection or Azure Bastion
  • 58. © Microsoft Corporation Modern Data Platform Concepts Part IV
  • 59. © Microsoft Corporation “The ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.” – Encyclopedia Britannica Artificial Intelligence Machine Learning Supervised Learning Regression Classification Unsupervised Learning Cluster Analysis Application Examples Weather Forecast Fraud Detection Customer Churn Insurance Premium
  • 60. © Microsoft Corporation Term coined in 2009 for a developer meetup – ”Not Only SQL” -> “NoSQL”. Databases that allow you to store and retrieve data in various structures, formats, and models other than tabular relational model. What’s No-SQL? There’s a time and a place for everything Sometimes a relational store is the right choice Sometimes a NoSQL store is the right choice Sometimes you need more than one store for an app -> polyglot persistence Data Structures Key-Value Databases Cosmos DB, Redis Cache, Azure Table Column Family Stores Cosmos DB, Cassandra, HBase Graph Databases Cosmos DB, Neo4j, Gremlin Document Databases Cosmos DB, MongoDB
  • 62. Azure Cognitive Services Azure Bot Service Azure Search Azure Databricks Azure Machine Learning Azure AI Infrastructure Knowledge mining AI apps and agents Machine learning Azure AI Productive Built for enterprises Trusted Solution Areas
  • 63. Language Speech Search Cognitive Services Lab Knowledge Vision Custom Vision Speech API Bing Spell Check Speaker Recognition Bing Entity Search Video Indexer Emotion Translator Text Text to speech QnA Maker Bing Search Bing Speech Custom Speech Custom Decision Translator Speech Computer Vision Content Moderator Bing Autosuggest Text Analytics Face Language Understanding (LUIS) Cognitive Services APIs Bing Statistics add-in Bing Visual Search Bing Custom Search Unified Speech
  • 64. Cognitive Services capabilities Infuse your apps, websites, and bots with human-like intelligence
  • 66. MongoDB Table API Turnkey global distribution Elastic scale out of storage & throughput Guaranteed low latency at the 99th percentile Comprehensive SLAs Five well-defined consistency models Azure Cosmos DB Document Column-family Key-value Graph Core (SQL) API
  • 68. © Microsoft Corporation Lab 4: Add AI to your Big Data Pipeline with Cognitive Services
  • 69. © Microsoft Corporation Add AI to your Big Data Pipeline with Cognitive Services Lab 4 Semi-Structured V=Volume csv, logs, json, xml (loosely-typed) Non-structured V=Variety images, video, audio, free text (no structure) Load and Ingest Store Process Serve Data Factory Relational Databases (strongly-typed, structured) Azure Synapse Analytics Power BI Premium Databricks Azure Data Lake Gen2 CosmosDB Application Analytics Enterprise-grade semantic model Integrate big data scenarios with traditional data warehouse Cognitive Services Azure ML Business User Fast load data with Polybase/ ParquetDirect Build and Score ML models Scheduled / event- triggered data ingestion
  • 70. © Microsoft Corporation Lab Architecture Lab 4 Azure Data Platform Resource Group ADPDesktop Virtual Machine ADPVirtualNetwork Virtual Network MDWResources Storage Account SynapseDataLakesuffix Azure Data Lake Storage Gen2 SynapseDataFactory-suffix Azure Data Factory ADPDatabricks Azure Databricks ADPComputerVision Computer Vision API ADPCosmosDB-suffix Azure CosmosDB PowerBI Power BI Desktop/ Workspace SynapseDataFactory-suffix Azure Data Factory 1 2 3 4 1 1 2 2 3 4 5 6 1 3 4 Student s Computer operationalsql-suffixNYCDataSets Azure SQL Database 2 synapsesql-suffixSynapseDW Azure Synapse Analytics RDP Connection or Azure Bastion
  • 72. © Microsoft Corporation Deals with streams of data that are captured in real-time and processed with minimal latency to generate real-time (or near- real-time) reports or automated responses. Real-time analytics SQL Modern data warehousing “We want to integrate all our data—including Big Data—with our data warehouse” Advanced analytics “We’re trying to predict when our customers churn” Real-time analytics “We’re trying to get insights from our devices in real-time”
  • 73. © Microsoft Corporation Modern Data Platform Concepts Part V
  • 74. © Microsoft Corporation Streaming Use Cases CONSUMER ENGAGEMENT Real-time Pricing Optimization • Demand-Elasticity • Personal Pricing Schemes • Promotion events • Multi-channel engagement Risk and Fraud, Threat Detection RISK AND REVENUE MANAGEMENT • Real-time anomaly detection • Card Monitoring and Fraud Detection • Risk Aggregation • Preventive Maintenance • Smart Grids and Microgrids • Asset performance as a Service • UAV image analysis Industrial IoT GRID OPS, ASSET OPTIMIZATION • Real-time firewall, network, and auth log correlation • Anomaly detection • Security context, enrichment • Security Orchestration Security Intelligence ACTIONABLE THREAT INTELLIGENCE IoT DEVICE ANALYTICS SENSOR DATA • Aggregation of streaming events • Predictive Maintenance • Anomaly Detection • Right product, promotion, at right time • Real time Ad bidding platform • Personalized Ad Targeting Next Best and Personalized Offers RECOMMENDATION ENGINE CONSUMER ENGAGEMENT ANALYSIS Sentiment Analysis • Demand-Elasticity • Social Network Analysis • Promotion events • Multi-channel Attribution
  • 75. © Microsoft Corporation Unlocking Real-time Insights Time to Insight is Critical Insights are Perishable Ask Questions to Data in Motion
  • 76. © Microsoft Corporation Scenario Types Actions by Human Actors Machine to Machine Interactions Enriched Data Movement Automation Dashboarding
  • 77. © Microsoft Corporation Designed to handle Big Data use cases by taking advantage of both batch and stream-processing methods Lambda (λ) Architecture 1. All data entering the system is dispatched to both the batch layer and the speed layer for processing. 2. The batch layer has two functions: I. manage the master dataset (an immutable, append-only set of raw data) II. pre-compute the batch views. 3. The serving layer indexes the batch views so that they can be queried in low-latency, ad-hoc way. 4. The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only. 5. Any incoming query can be answered by merging results from batch views and real-time
  • 79. © Microsoft Corporation Big data streaming platform and event ingestion service capable of receiving and processing millions of events per second. Event Hubs
  • 80. © Microsoft Corporation Batch on stream Event Hubs Capture Policy based push to your own storage Uses Avro format Raises Event Grid events – connect to Functions, ACI, or whatever you like Does not impact throughput Offloads batch processing from your real-time stream
  • 82. © Microsoft Corporation Event-processing engine that allows you to examine high volumes of data streaming from devices Stream Analytics
  • 83. © Microsoft Corporation Stream Analytics Job Users construct and deploy jobs to Azure Stream Analytics Job definition includes inputs, a query, and output
  • 84. © Microsoft Corporation Windowing Concepts • Operations on the data contained in temporal windows is a common pattern • Four types of Temporal Windows: • Sliding • Tumbling • Hopping • Session • Output at the end of each window • Windows are fixed length • Used in a GROUP BY clause
  • 85. © Microsoft Corporation Windowing Functions Sliding Windows and Tumbling Windows Sliding Windows Tumbling Windows
  • 86. © Microsoft Corporation Windowing Functions Hopping Windows and Session Windows Hopping Windows Session Windows
  • 87. © Microsoft Corporation Lab 5: Ingest and Analyse real-time data with Event Hubs and Stream Analytics
  • 88. © Microsoft Corporation Ingest and Analyse real-time data with Event Hubs and Stream Analytics Lab 5 Semi-Structured V=Volume csv, logs, json, xml (loosely-typed) Non-structured V=Variety images, video, audio, free text (no structure) Stream V=Velocity IoT devices, sensors, gadgets (loosely-typed) Load and Ingest Store Process Serve Data Factory Event Hubs Stream Analytics Relational Databases (strongly-typed, structured) Azure Synapse Analytics Power BI Premium Databricks Azure Data Lake Gen2 CosmosDB Power BI Premium Application Analytics Analytics λ Lambda Architecture Hot Path Real Time Analytics Cold Path History and Trend Analysis Stream Datasets and Real-time Dashboards Enterprise-grade semantic model Integrate big data scenarios with traditional data warehouse Cognitive Services Azure ML Business User Fast load data with Polybase/ ParquetDirect Build and Score ML models Scheduled / event- triggered data ingestion
  • 89. © Microsoft Corporation Lab Architecture Lab 5 Azure Data Platform Resource Group ADPDesktop Virtual Machine ADPVirtualNetwork Virtual Network MDWResources Storage Account ADPLogicApp Logic App SynapseDataLakesuffix Azure Data Lake Storage Gen2 ADPEventHubs-suffix Event Hubs SynapseStreamAnalytics-suffix Event Hubs SynapseDataFactory-suffix Azure Data Factory ADPDatabricks Azure Databricks ADPComputerVision Computer Vision API ADPCosmosDB-suffix Azure CosmosDB PowerBI Power BI Desktop/ Workspace SynapseDataFactory-suffix Azure Data Factory 1 2 3 4 1 1 2 2 3 4 5 6 1 1 2 3 3 4 Student s Computer operationalsql-suffixNYCDataSets Azure SQL Database 2 synapsesql-suffixSynapseDW Azure Synapse Analytics 5 4 RDP Connection or Azure Bastion
  • 91. © Copyright Microsoft Corporation. All rights reserved.