8. Scenario 1
[Architecture diagram. Components: business apps, custom apps, Data Migration Service, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, analytical dashboards.]
9. SQL Data Warehouse: an illustration
[Diagram: clients (Excel, Power BI, Tableau, ...) issue a Transact-SQL query that SQL Data Warehouse runs across multiple compute nodes. Through PolyBase, the query reaches both relational data and non-relational (binary) data in Azure Blob storage, Azure Data Lake Store, and similar stores.]
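The diagram shows one Transact-SQL query fanned out across several compute nodes. As a rough mental model only, the Python sketch below hash-distributes rows into 60 distributions (the number SQL Data Warehouse uses for a distributed table), computes a partial `SUM ... GROUP BY` inside each distribution, and merges the partials, as a control node would. The sample table, its column names, and the use of `crc32` as the hash are illustrative assumptions; this is not how the engine is implemented.

```python
import zlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 60  # SQL Data Warehouse splits each distributed table into 60 distributions

def distribution_of(key: str) -> int:
    """Pick a distribution for a distribution-column value (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS

def distribute(rows):
    """Hash-distribute hypothetical (region, amount) rows on the region column."""
    buckets = defaultdict(list)
    for region, amount in rows:
        buckets[distribution_of(region)].append((region, amount))
    return buckets

def partial_sum(bucket):
    """Per-distribution work: SUM(amount) GROUP BY region over local rows only."""
    sums = defaultdict(float)
    for region, amount in bucket:
        sums[region] += amount
    return sums

def distributed_sum(rows):
    """Compute partial aggregates per distribution, then merge them into the final result."""
    result = defaultdict(float)
    for bucket in distribute(rows).values():
        for region, subtotal in partial_sum(bucket).items():
            result[region] += subtotal
    return dict(result)

sales = [("west", 10.0), ("east", 5.0), ("west", 2.5), ("north", 7.0), ("east", 1.0)]
print(distributed_sum(sales))
```

Because rows are distributed on the grouping column, each region lands in exactly one distribution; the merge step still handles the general case where a group spans distributions.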
10. Scenario 2
[Architecture diagram. Components: Azure CLI, Azure Data Factory, Data Migration Service, Azure SQL Data Warehouse, Azure Analysis Services, analytical dashboards.]
11. New Pipeline Model
• Rich pipeline orchestration
  – Triggers: on-demand, schedule, event
• Data movement as a service
  – Cloud, hybrid
  – 30 connectors provided
• SSIS package execution
  – In a managed cloud environment
  – Use familiar tools: SSMS & SSDT
• Author & monitor
  – Programmability (Python, .NET, PowerShell, etc.)
  – Visual tools (coming soon)
[Diagram elements: Stored Procedures, Hadoop on Azure, Data Lake Analytics, Custom Code, Machine Learning, trusted data, BI & analytics.]
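A schedule trigger fires a pipeline on a recurrence. The sketch below is a hypothetical helper, not part of any Azure SDK: it computes the next few fire times from a start time, a frequency, and an interval, which is the basic idea behind a schedule trigger. Time zones, end times, monthly recurrences, and advanced schedules (specific hours or weekdays) are deliberately left out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical frequency names mapped to fixed step sizes (months/years omitted for simplicity)
FREQUENCIES = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}

def next_runs(start: datetime, frequency: str, interval: int, after: datetime, count: int):
    """Return the next `count` fire times strictly after `after` for start + k * interval * frequency."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    step = FREQUENCIES[frequency] * interval
    if after < start:
        k = 0  # the recurrence has not started yet, so the first fire time is `start`
    else:
        # Whole steps already elapsed, plus one so the result is strictly after `after`
        k = (after - start) // step + 1
    return [start + (k + i) * step for i in range(count)]

start = datetime(2018, 1, 1, 0, 0, tzinfo=timezone.utc)
now = datetime(2018, 1, 1, 7, 30, tzinfo=timezone.utc)
print(next_runs(start, "hour", 3, now, 2))  # fire times at 09:00 and 12:00 UTC
```

An on-demand trigger would simply bypass this calculation, and an event trigger would fire in response to something like a file arriving rather than the clock.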
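12. Supported data stores
Category | Data store | Supported as source | Supported as sink
Azure | Azure Data Lake Store | ✓ | ✓
Azure | Azure Blob storage | ✓ | ✓
Azure | Azure SQL Database | ✓ | ✓
Azure | Azure SQL Data Warehouse | ✓ | ✓
Azure | Azure Table storage | ✓ | ✓
Azure | Azure DocumentDB | ✓ | ✓
Databases | SQL Server* | ✓ | ✓
Databases | Oracle* | ✓ | ✓
Databases | MySQL* | ✓ |
Databases | DB2* | ✓ |
Databases | Teradata* | ✓ |
Databases | PostgreSQL* | ✓ |
Databases | Sybase* | ✓ |
Databases | Cassandra* | ✓ |
Databases | MongoDB* | ✓ |
Databases | Amazon Redshift | ✓ |
File | File System* | ✓ | ✓
File | HDFS* | ✓ |
File | Amazon S3 | ✓ |
Others | Salesforce | ✓ |
Others | Generic ODBC* | ✓ |
Others | Generic OData | ✓ |
Others | Web Table (table from HTML) | ✓ |
Others | GE Historian* | ✓ |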
13.
14. Scenario 3
[Architecture diagram. Components: Azure CLI, Azure Data Factory, Data Migration Service, ExpressRoute, Azure SQL Data Warehouse, Azure Analysis Services, analytical dashboards.]
17. A hyperscale repository for big data analytics workloads
• No limits to scale
• Store any data in its native format
• Hadoop Distributed File System (HDFS) for the cloud
• Performance optimized for analytics workloads
• Enterprise-grade access control, encryption at rest
18. [Diagram: Azure Data Lake Store exposes a WebHDFS-compatible REST API. Clients shown: Azure HDInsight (MapReduce, HBase transactions, Hive queries, Spark queries), any HDFS application, and the Hadoop WebHDFS client.]
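Because Data Lake Store speaks a WebHDFS-compatible REST API, a client addresses a file or folder with a URL of the form `https://<account>.azuredatalakestore.net/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such URLs with the standard library and sends nothing; the account name and paths are hypothetical, and a real request would also need an Azure Active Directory bearer token in the `Authorization` header.

```python
from urllib.parse import quote, urlencode

def webhdfs_url(account: str, path: str, op: str, **params) -> str:
    """Build a WebHDFS-style URL for a (hypothetical) Data Lake Store account."""
    if not path.startswith("/"):
        path = "/" + path
    query = urlencode({"op": op, **params})  # e.g. op=OPEN&offset=0&length=1024
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1{quote(path)}?{query}"

print(webhdfs_url("contosolake", "/raw/logs", "LISTSTATUS"))
# → https://contosolake.azuredatalakestore.net/webhdfs/v1/raw/logs?op=LISTSTATUS
print(webhdfs_url("contosolake", "/raw/logs/app 1.log", "OPEN", offset=0, length=1024))
# → https://contosolake.azuredatalakestore.net/webhdfs/v1/raw/logs/app%201.log?op=OPEN&offset=0&length=1024
```

The same URL shape is what lets existing WebHDFS clients talk to the store without Azure-specific code.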
21. ADLS vs. Azure Blob storage
Capability | ADLS | Azure Blob
Purpose | Optimized for analytics: analysis using batch, interactive, streaming, ML | General-purpose storage scenarios: app backend, backup data, media storage for streaming, log files, IoT telemetry, big data analytics
Geographic availability | East US 2, Central US, North Europe | All data centers
HDFS | Yes (WebHDFS) | No
Scale | No limit on bandwidth or storage size | Limits: 5 PB storage (announced), 50 GBps bandwidth
Authentication & authorization | Azure Active Directory; POSIX ACLs on files and folders | Access keys & SAS tokens
Structure | Accounts / folders / files (hierarchical folders) | Accounts / containers / blobs (flat namespace)
Encryption | Yes | Yes
Geo-replication | No | Yes (LRS, GRS, RA-GRS)
Cost [1 PB] | $40K (Coming soon) | Hot $20K, Cool $16K
24. Azure Batch
Enable applications and algorithms to run easily and efficiently in parallel at scale
Rendering
Media transcoding & pre-/post-processing
Test execution
Monte Carlo simulations
Genomics
Deep Learning
OCR
Data ingestion, processing, ETL
R at scale
Compiled MATLAB
Engineering simulations
Image analysis & processing
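Most of the workloads above are embarrassingly parallel: one job splits into many independent tasks whose results are combined at the end. The sketch below illustrates that job/task shape locally with a Monte Carlo estimate of π, using `concurrent.futures` threads in place of Batch compute nodes. It does not use the Azure Batch SDK, and the task count and sample sizes are arbitrary assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def monte_carlo_task(seed: int, samples: int) -> int:
    """One independent task: count random points inside the unit quarter circle."""
    rng = random.Random(seed)  # a per-task seed keeps each task reproducible
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def run_job(num_tasks: int = 8, samples_per_task: int = 50_000) -> float:
    """Fan out independent tasks, then gather their results into one estimate of pi."""
    # Threads stand in for Batch nodes; CPython's GIL means no real CPU speedup here,
    # but the split / submit / gather structure is the same.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(monte_carlo_task, range(num_tasks), [samples_per_task] * num_tasks)
        total_hits = sum(results)
    return 4.0 * total_hits / (num_tasks * samples_per_task)

print(run_job())
```

On Batch, each call to `monte_carlo_task` would become a task in a job, scheduled onto a pool of VMs, with the final reduction done by the client or a follow-up task.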
26. Azure Batch Rendering GA
[Diagram: clients (Autodesk Maya plug-in, Batch Labs cross-platform client, Azure CLI / PowerShell APIs) upload assets, submit a job, and monitor it; jobs are queued and run on Windows and Linux VMs with pay-per-minute licensing, and outputs are returned.]
27. Scenario 7
[Architecture diagram. Components: Data Factory, Azure Data Lake Store, Azure Data Lake Analytics, Azure SQL Data Warehouse, analytical dashboards.]
28. Data Lake Analytics Workloads
With BATCH workloads, Data Lake Analytics is ideal for:
• Transforming and preparing data for use in other systems
• Analytics on VERY LARGE amounts of data
• Massively parallel programs written in .NET, Python, and R, scaled out with U-SQL
• Performing cognition at scale on large collections
29. Data Lake Analytics: an illustration
[Diagram: a U-SQL query is scaled out across many compute nodes, reading unstructured data from Data Lake Store.]
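U-SQL applies a schema when unstructured data is read (its `EXTRACT` step), and Data Lake Analytics runs the work as many parallel units, each over part of the input, before combining results. The Python sketch below imitates that flow for tab-separated log lines: each hypothetical input split is parsed and aggregated independently, then the partial counts are merged. The file contents, field layout, and function names are assumptions, and this is an analogy, not U-SQL itself.

```python
from collections import Counter
from typing import Iterable, NamedTuple

class LogRow(NamedTuple):
    """Schema applied at read time, in the spirit of a U-SQL EXTRACT clause."""
    user: str
    url: str
    status: int

def extract(lines: Iterable[str]) -> Iterable[LogRow]:
    """Parse raw tab-separated lines into typed rows, skipping malformed ones."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        yield LogRow(parts[0], parts[1], int(parts[2]))

def vertex(split: list[str]) -> Counter:
    """Work for one parallel unit: count requests per status code within its split."""
    return Counter(row.status for row in extract(split))

def run(splits: list[list[str]]) -> Counter:
    """Combine the per-split partial counts into the final result."""
    total = Counter()
    for partial in map(vertex, splits):
        total += partial
    return total

splits = [
    ["alice\t/home\t200", "bob\t/cart\t500"],
    ["carol\t/home\t200", "not a valid row", "dave\t/home\t404"],
]
print(run(splits))
```

Each `vertex` call touches only its own split, so the splits could be processed on separate compute nodes; only the small partial counts need to be brought together.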
30. U-SQL: easily query data in multiple Azure data stores without moving it to a single store
[Diagram: Azure Data Lake Analytics runs U-SQL queries against Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB, Azure SQL Data Warehouse, and Azure Data Lake Storage.]
31. Embedded Artificial Intelligence
Host Deep Neural Networks (DNNs)
6 Built-in Cognitive Functions
– Face API
– Image Tagging
– Emotion analysis
– OCR
– Text Key Phrase Extraction
– Text Sentiment Analysis
34. Scenario 8
[Architecture diagram. Components: Data Factory, Azure Data Lake Store, Azure Data Lake Analytics (cleansing, analysis), Azure SQL Data Warehouse, analytical dashboards.]
35. Scenario 9
[Architecture diagram. Cross-cutting: Azure Data Factory (orchestration), Azure Key Vault (key management), Azure ExpressRoute (private connections), Operations Management Suite (monitoring). Components: Data Factory, Azure Stream Analytics, Azure Time Series Insights, Azure Data Lake Store, Azure Data Lake Analytics, Azure SQL Data Warehouse, web & mobile apps, analytical dashboards.]
36. Store and manage terabytes of time-series data
Explore and visualize billions of events simultaneously
Conduct root-cause analysis and compare multiple sites and assets
37. Illustrating an application: Stream Analytics
[Diagram: a standing query (SELECT …) runs continuously over an event stream, computing results per time window.]
The query is written in the Stream Analytics Query Language, a subset of T-SQL.
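A standing query keeps running over the stream and emits one result per time window. The sketch below imitates a tumbling-window count per device, roughly what a Stream Analytics query with `GROUP BY TumblingWindow(second, 10)` computes, over a small in-memory list of timestamped events. The event fields, window size, and function name are assumptions; windows here are treated as [start, end), and the service's own boundary convention and its handling of late or out-of-order events are not modeled.

```python
from collections import defaultdict
from typing import Iterable, Tuple

def tumbling_counts(events: Iterable[Tuple[float, str]], window_seconds: int):
    """Count events per device in fixed, non-overlapping windows.

    Each event is (timestamp_in_seconds, device_id). Results are keyed by the
    window's end time, the way Stream Analytics timestamps a windowed result.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for ts, device in events:
        window_end = (int(ts // window_seconds) + 1) * window_seconds
        counts[window_end][device] += 1
    # Emit windows in time order, as a standing query would while time advances
    return [(end, dict(per_device)) for end, per_device in sorted(counts.items())]

events = [(1.0, "a"), (4.5, "b"), (9.9, "a"), (10.5, "a"), (17.2, "b"), (25.0, "a")]
for window_end, result in tumbling_counts(events, 10):
    print(window_end, result)
# 10 {'a': 2, 'b': 1}
# 20 {'a': 1, 'b': 1}
# 30 {'a': 1}
```

A real standing query emits each window as soon as time moves past its end, instead of collecting the whole stream first as this batch sketch does.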
38. Scenario 10
[Architecture diagram. Cross-cutting: Azure Data Factory (orchestration), Azure Key Vault (key management), Azure ExpressRoute (private connections), Operations Management Suite (monitoring). Components: Data Factory, Azure Stream Analytics, Azure Data Lake Store, Azure Data Lake Analytics, Cosmos DB, Azure SQL Data Warehouse, web & mobile apps, analytical dashboards.]
39.
40. Scenario 11
[Architecture diagram. Cross-cutting: Azure Data Factory (orchestration), Azure Key Vault (key management), Azure ExpressRoute (private connections), Operations Management Suite (monitoring). Components: Data Factory, Azure Stream Analytics, Azure Data Lake Store, Azure Data Lake Analytics, Cosmos DB, Azure Machine Learning & Machine Learning Server, Azure SQL Data Warehouse, web & mobile apps, analytical dashboards.]
44. [Diagram spanning Microsoft Azure and on-premises. Components: call center staff, call center application, blobs (detailed call data), on-premises CRM data, aggregated call data, data for ML, ADLA, Azure ML, Azure Data Factory, model.]
Need a real-time prediction of each caller's propensity to churn.
The model is rebuilt and redeployed regularly.
45. Scenario 12
[Architecture diagram. Cross-cutting: Azure Data Factory (orchestration), Azure Key Vault (key management), Azure ExpressRoute (private connections), Operations Management Suite (monitoring). Components: Data Factory, Azure Stream Analytics, Azure Data Lake Store, Azure Data Lake Analytics, Cosmos DB, Azure Machine Learning & Machine Learning Server, Azure SQL Data Warehouse, web & mobile apps, Power BI, Power BI Embedded.]
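48. Provide seamless authentication experiences
[Diagram: the application keeps its own users, permissions, and auth providers, plus the Power BI API keys. When a user requests to view Report 1, the application issues a token (claim: can view Report 1; expiration: 5 minutes). Power BI validates the token with the API keys and serves Report 1 from the workspace, which also contains Report 2.]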
49. Provide seamless authentication experiences
[Diagram: the application (users, permissions, auth providers) and the Power BI workspace (Report 1, Report 2), each holding the API keys.]
51. Provide seamless authentication experiences
[Diagram: copy the API keys to your application, then sign a token with them. Token claims: can view Report 1; expiration: 5 minutes; username: "user1"; roles: "sales". Power BI (workspace with Report 1 and Report 2) validates the token with the API keys.]
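The flow on these slides (copy the API keys to your application, sign a token carrying claims, and let Power BI validate it) can be sketched as an HMAC-SHA256-signed, JWT-style token built with the Python standard library. The claim names (`report`, `username`, `roles`, `exp`), the key value, and the helper names are hypothetical; Power BI Embedded defines its own claim set and SDK, which should be used in practice.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign_token(api_key: bytes, claims: dict) -> str:
    """Application side: sign the claims with the shared API key (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8")) for part in (header, claims)
    )
    signature = hmac.new(api_key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"

def validate_token(api_key: bytes, token: str, report_id: str, now: Optional[float] = None) -> dict:
    """Service side: check the signature, the expiry, and that the token grants this report."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(api_key, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if (now if now is not None else time.time()) >= claims["exp"]:
        raise ValueError("token expired")
    if claims["report"] != report_id:
        raise ValueError("token does not grant this report")
    return claims

api_key = b"hypothetical-api-key"
token = sign_token(api_key, {
    "report": "report-1",
    "username": "user1",
    "roles": "sales",
    "exp": int(time.time()) + 5 * 60,  # expires in 5 minutes
})
print(validate_token(api_key, token, "report-1")["username"])  # user1
```

The application never sends the API key itself; it only sends the signed token, and anyone holding the key can verify that the claims were not altered.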
52. Power BI REST API
Authentication flow: Web application
53.
54. FAQ
• What is a report session and how is it billed?
• A session is a set of interactions between an end user and a Power BI Embedded report.
Each time a Power BI Embedded report is displayed to a user, a session is initiated and the
subscription holder will be charged for a session. Sessions are billed at a flat rate,
independent of the number of visual elements in a report or how frequently the report
content is refreshed. A session ends when either the user closes the report, or the session
times out after one hour.
• Do you offer any tools or guidance to help me estimate how many renders/session I
should expect? How will I know how many renders have been completed?
• The Azure Portal will provide billing details on how many renders / report sessions have
been performed against your subscription.
• Do I need a Power BI subscription in order to develop applications with Power BI
Embedded? How do I get started?
• As the application developer, you do not need to have a Power BI subscription in order to
create the reports and visualizations you wish to use in your application. You will need a
Microsoft Azure subscription and the free Power BI Desktop application.
55. Scenario 13
[Architecture diagram. Cross-cutting: Azure Data Factory (orchestration), Azure Key Vault (key management), Azure ExpressRoute (private connections), Operations Management Suite (monitoring). Components: Data Factory, Azure Stream Analytics, Azure Data Lake Store, Azure Data Lake Analytics, Cosmos DB, Azure Machine Learning & Machine Learning Server, Azure SQL Data Warehouse, web & mobile apps, Power BI, Cognitive Services, Bot Service, Logic App.]
56. Scenario 14
[Architecture diagram. Cross-cutting: Azure Data Factory (orchestration), Azure Key Vault (key management), Azure ExpressRoute (private connections), Operations Management Suite (monitoring). Components: Data Factory, Azure HDInsight (Hadoop/Hive), Azure HDInsight (Hadoop/Storm), Azure HDInsight (Hadoop/Kafka) with Kafka, Azure HDInsight (Hadoop/HBase), Azure Data Lake Store, Azure Machine Learning & Machine Learning Server, Azure SQL Data Warehouse, web & mobile apps, Cognitive Services, Bot Service, Logic App, analytical dashboards.]
61. Multi-region availability
Available in more than 25 regions worldwide
Launched most recently in the US West 2 and UK regions
Available in the China, Europe, and US Government clouds
62. [Comparison chart: IaaS clusters vs. managed clusters vs. big data as-a-service, compared on best for…, workloads, administrative, developer, control & configuration, service level agreement, and TCO. The chart spans a spectrum from CONTROL to EASE OF USE AND ADOPTION.]
63. Scenario 14
[Architecture diagram. Cross-cutting: Azure Data Factory (orchestration), Azure Key Vault (key management), Azure ExpressRoute (private connections), Operations Management Suite (monitoring). Components: Data Factory, Azure HDInsight (Hadoop/Hive), Azure HDInsight (Hadoop/Storm), Azure HDInsight (Hadoop/Kafka) with Kafka, Azure HDInsight (Hadoop/R), Azure HDInsight (Hadoop/Spark), Jupyter data science notebooks, Azure Data Lake Store, Azure SQL Data Warehouse, analytical dashboards.]
64. Community algorithms (supported by community)
Spark ML (PySpark, SparkR)
Caffe on Spark
BigDL on HDInsight
SparklyR
XGBoost
ISV applications (supported by ISV)
H2O
Dataiku
65.
66. Scenario 15
[Architecture diagram. Cross-cutting: Azure Data Factory (orchestration), Azure Key Vault (key management), Azure ExpressRoute (private connections), Operations Management Suite (monitoring). Components: Data Factory, Azure HDInsight (Hadoop/Hive), Azure HDInsight (Hadoop/Storm), Azure HDInsight (Hadoop/Kafka) with Kafka, Azure HDInsight (Hadoop/R), Azure HDInsight (Hadoop/Spark), Jupyter data science notebooks, Azure Data Lake Store, Azure SQL Data Warehouse, Data Catalog, analytical dashboards.]
67. Enabling the Entire Enterprise Data Ecosystem
• Discover: search, browse, filter
• Understand: metadata, experts, context
• Consume: your data, your tools, your way
• Contribute: tag, document, publish
• Analyze
Notes:
WebJobs can be used for stream processing when set to continuous; Functions can only be triggered or scheduled, so they are not suitable.
In some cases, Logic Apps might fit for orchestrating specific tasks.
Azure Data Factory and Oozie are the main orchestrators offered in Azure.
Apache Oozie is a Java web application that coordinates workflows of Hadoop jobs. In Oozie, a workflow is defined as a directed acyclic graph (DAG) of actions. It supports different types of Hadoop jobs, such as MapReduce, Streaming, Pig, Hive, and Sqoop, as well as system-specific jobs such as shell scripts and Java programs.
Apache Sqoop is a tool for transferring bulk data between Hadoop and relational databases as efficiently as possible. A typical use is to import data from a relational database management system (RDBMS), such as Oracle, MySQL, SQL Server, or any other structured relational database, into HDFS; process or transform it with Hive or MapReduce; and then export the results back to the RDBMS.
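An Oozie workflow is a DAG of actions, and an action runs only after every action it depends on has finished. The Python sketch below shows that ordering rule with `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical Sqoop import, Hive, Sqoop export, shell-notification workflow. The action names are invented, and the sketch does not use Oozie's XML workflow format or its API.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on
workflow = {
    "sqoop_import_orders": set(),
    "sqoop_import_customers": set(),
    "hive_join_and_aggregate": {"sqoop_import_orders", "sqoop_import_customers"},
    "sqoop_export_results": {"hive_join_and_aggregate"},
    "shell_notify": {"sqoop_export_results"},
}

def run_workflow(dag: dict) -> list:
    """Run actions in dependency order; actions in the same ready batch could run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    executed = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every dependency of these actions is done
        for action in ready:
            executed.append(action)  # a real engine would launch the job here
            sorter.done(action)
    return executed

print(run_workflow(workflow))
# ['sqoop_import_customers', 'sqoop_import_orders', 'hive_join_and_aggregate', 'sqoop_export_results', 'shell_notify']
```

The two imports have no dependencies on each other, so they form one ready batch; the acyclicity check is what makes "directed acyclic graph" more than a description.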
Add a key for the colours