Designing Big Data Analytics
Solutions on Azure
Mohamed Tawfik
Cloud Solutions Architect
Azure CoE - EMEA
The 4 Industrial Revolutions (by Christoph Roser at AllAboutLean.com)
Azure Data Landscape
Source: Mastering Azure Analytics, 1st Edition - Zoiner Tejada, O'Reilly Media, Inc., April 2017
Architecting Big Data Solutions on Azure:
Custom Scenarios & Patterns
AZURE SQL DATA WAREHOUSE
AZURE SQL DATABASE
DATA MIGRATION SERVICE
AZURE ANALYSIS SERVICES
BUSINESS APPS
CUSTOM APPS
ANALYTICAL DASHBOARDS
Scenario 1
SQL Data Warehouse
An illustration
Diagram labels: relational data; binary data in Blobs and Azure Data Lake Store; PolyBase; clients (Excel, Power BI, Tableau, ...); Transact-SQL query; compute nodes
AZURE CLI, AZURE DATA FACTORY
DATA MIGRATION SERVICE
AZURE SQL DATA WAREHOUSE
ANALYTICAL DASHBOARDS
AZURE ANALYSIS SERVICES
Scenario 2
New Pipeline Model
Rich pipeline orchestration
Triggers – on demand, schedule, event
Data Movement as a Service
Cloud, Hybrid
30 connectors provided
SSIS Package Execution
In a managed cloud environment
Use familiar tools, SSMS & SSDT
Author & Monitor
Programmability (Python, .NET, PowerShell, etc.)
Visual Tools (coming soon)
Stored Procedures
Hadoop on Azure
Trusted data
BI & analytics
Data Lake Analytics
Custom Code
Machine Learning
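Category | Data store | Supported as source | Supported as sink
Azure | Azure Data Lake Store | ✓ | ✓
Azure | Azure Blob storage | ✓ | ✓
Azure | Azure SQL Database | ✓ | ✓
Azure | Azure SQL Data Warehouse | ✓ | ✓
Azure | Azure Table storage | ✓ | ✓
Azure | Azure DocumentDB | ✓ | ✓
Databases | SQL Server* | ✓ | ✓
Databases | Oracle* | ✓ | ✓
Databases | MySQL* | ✓ |
Databases | DB2* | ✓ |
Databases | Teradata* | ✓ |
Databases | PostgreSQL* | ✓ |
Databases | Sybase* | ✓ |
Databases | Cassandra* | ✓ |
Databases | MongoDB* | ✓ |
Databases | Amazon Redshift | ✓ |
File | File System* | ✓ | ✓
File | HDFS* | ✓ |
File | Amazon S3 | ✓ |
Others | Salesforce | ✓ |
Others | Generic ODBC* | ✓ |
Others | Generic OData | ✓ |
Others | Web Table (table from HTML) | ✓ |
Others | GE Historian* | ✓ |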
AZURE CLI, AZURE DATA FACTORY
DATA MIGRATION SERVICE
AZURE SQL DATA WAREHOUSE
ANALYTICAL DASHBOARDS
AZURE ANALYSIS SERVICES
ExpressRoute
Scenario 3
AZURE STORAGE
Polybase
ANALYTICAL DASHBOARDS
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE FUNCTIONS
Scenario 4
ANALYTICAL DASHBOARDS
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE DATA LAKE STORE
Scenario 5
AZURE FUNCTIONS
No limits to SCALE
Store ANY DATA in its native format
HADOOP FILE SYSTEM (HDFS) for the cloud
Optimized for analytics workload PERFORMANCE
ENTERPRISE GRADE access control, encryption at rest
A hyper scale repository for big data analytics workloads
MapReduce
HBase transactions
Any HDFS application
Hive query
Azure HDInsight
Hadoop WebHDFS client
Azure Data Lake Store
WebHDFS-compatible REST API
Spark queries
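As a rough illustration of what a WebHDFS-compatible REST API looks like from a client's point of view, the sketch below builds the kind of URL a WebHDFS client would request. The `/webhdfs/v1/<path>?op=<OPERATION>` layout follows the Apache Hadoop WebHDFS convention. The account name `contosoadls`, the folder paths, and the host pattern `<account>.azuredatalakestore.net` are assumptions made for illustration, and no request is actually sent.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style URL for a Data Lake Store account.

    The host name pattern is an assumption; only the URL is built,
    nothing is sent over the network.
    """
    host = f"{account}.azuredatalakestore.net"
    clean_path = quote(path.strip("/"))
    query = urlencode({"op": op, **params})
    return f"https://{host}/webhdfs/v1/{clean_path}?{query}"


# Hypothetical account and folders, used only for illustration.
list_url = webhdfs_url("contosoadls", "/raw/clickstream", "LISTSTATUS")
read_url = webhdfs_url("contosoadls", "/raw/clickstream/day1.csv", "OPEN", offset="0")
print(list_url)
print(read_url)
```

A real call would also need an Azure Active Directory bearer token in the request headers, which is left out here.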
Enterprise grade security
ADL .NET SDK
ADL PowerShell
ADL XPlat CLI
ADL Node.js SDK
ADL Java SDK
ADL Python*
Your application
Azure and ADL Store REST APIs
Capability | ADLS | Azure Blob
Purpose | Optimized for analytics: analysis using Batch, Interactive, Streaming, ML | General purpose storage scenarios: app backend, backup data, media storage for streaming, log files, IoT telemetry, Big Data analytics
Geographic Availability | East US 2, Central US, North Europe | All Data Centers
HDFS | Yes (WebHDFS) | No
Scale | No limit on bandwidth or storage size | Limits: 5PB storage (announced), 50GBps bandwidth
Authentication & Authorization | Azure Active Directory; POSIX ACLs on files and folders | Access keys & SAS tokens
Structure | Accounts / Folders / Files (with hierarchical folders) | Accounts / Containers / Blobs (flat namespace)
Encryption | Yes | Yes
Geo-Replication | No | Yes [LRS, GRS, RA-GRS]
Cost [1PB] | $40K | HOT $20K, COOL $16K
Coming soon
Data sources: LOB Applications, Social, Devices, Clickstream, Sensors, Video, Web, Relational
Azure Data Lake Store
A highly scalable, distributed, parallel file system in the cloud specifically designed to work with a variety of big data analytics workloads
HDInsight: Batch (MapReduce), Script (Pig), SQL (Hive), NoSQL (HBase), In-Memory (Spark), Predictive (R Server)
ADL Analytics: Batch (U-SQL)
ANALYTICAL DASHBOARDS
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE DATA LAKE STORE
Scenario 6
Azure Batch
Enable applications and algorithms to easily and efficiently run in parallel at scale
Rendering
Media transcoding & pre-/post-processing
Test execution
Monte Carlo simulations
Genomics
Deep Learning
OCR
Data ingestion, processing, ETL
R at scale
Compiled MATLAB
Engineering simulations
Image analysis & processing
INPUT OUTPUT
Azure Batch Concepts
Applications / Algorithms
Queue
Pool of VMs
Jobs & Tasks
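The job, task and pool concepts can be mimicked locally to see how the pieces relate. The sketch below is not the Azure Batch SDK: it uses Python's standard `concurrent.futures` thread pool as a stand-in for a pool of VMs, and a Monte Carlo estimate of pi as the workload. The function names and the task sizes are made up for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_task(seed: int, samples: int) -> int:
    """One task: count random points that fall inside the unit quarter circle."""
    rng = random.Random(seed)  # per-task seed keeps the run reproducible
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def run_job(task_count: int, samples_per_task: int, pool_size: int) -> float:
    """One job: fan tasks out to a pool of workers and combine their outputs."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        hits = pool.map(run_task, range(task_count), [samples_per_task] * task_count)
        total_hits = sum(hits)
    return 4.0 * total_hits / (task_count * samples_per_task)


pi_estimate = run_job(task_count=8, samples_per_task=20_000, pool_size=4)
print(f"pi is roughly {pi_estimate:.3f}")
```

Each task is independent of the others, which is what makes this kind of workload easy to spread across many machines.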
Azure Batch Rendering GA
Queue
Upload assets
Submit job
Return outputs
Pay-per-minute licensing
Windows and Linux VMs
Autodesk Maya Plug-in
Batch Labs x-plat client
Azure CLI / PowerShell APIs
Monitor job
ANALYTICAL DASHBOARDS
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE DATA LAKE STORE AZURE DATA LAKE ANALYTICS
Scenario 7
Data Lake Analytics Workloads
For BATCH workloads, Data Lake Analytics is ideal for:
• The transformation and preparation of data for use in other systems
• Analytics on VERY LARGE amounts of data
• Massively parallel programs written in .NET, Python and R, scaled out with U-SQL
• Performing cognition at scale on large collections
Data Lake Analytics
Data Lake Store
An illustration
Diagram labels: U-SQL query; compute nodes; unstructured data
U-SQL Query
Azure Storage Blobs
Azure SQL in VMs
Azure SQL DB
Azure Data Lake Analytics
Azure SQL Data Warehouse
Azure Data Lake Storage
Easily query data in multiple Azure data stores without moving it to a single store
Embedded Artificial Intelligence
Host Deep Neural Networks (DNNs)
6 Built-in Cognitive Functions
– Face API
– Image Tagging
– Emotion analysis
– OCR
– Text Key Phrase Extraction
– Text Sentiment Analysis
U-SQL Example
Extract
Process
Output
User Code
Declarative Framework
User Extensions
U-SQL: a language that unifies
Programming models: Declarative + Imperative
Data: Structured + Semi-structured + Unstructured
Workloads: Batch + Interactive + Streaming + Machine Learning
ANALYTICAL DASHBOARDS
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE DATA LAKE STORE AZURE DATA LAKE ANALYTICS
Scenario 8
AZURE DATA LAKE ANALYTICS
Cleansing Analysis
Orchestration Key Management Private Connections Monitoring
AZURE EXPRESSROUTE AZURE DATA FACTORY AZURE KEY VAULT OPERATIONS MANAGEMENT SUITE
ANALYTICAL DASHBOARDS
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE DATA LAKE STORE AZURE DATA LAKE ANALYTICS WEB & MOBILE APPS
AZURE STREAM ANALYTICS
Scenario 9
Azure Time Series Insights
Store and manage terabytes of time-series data
Explore and visualize billions of events simultaneously
Conduct root-cause analysis and compare multiple sites and assets
Illustrating an application
Stream Analytics
Time Window
SELECT …
Written in Stream Analytics Query Language, a subset of T-SQL
Stream
A standing query
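To make the idea of a time window over a stream concrete, here is a small sketch of one common kind of window: fixed, non-overlapping (tumbling) windows, with a count per device in each. It is plain Python rather than Stream Analytics Query Language, the event shape and device names are invented, and it processes a finite list, whereas a standing query keeps running as new events arrive.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    timestamp: int  # seconds since the start of the stream
    device: str


def tumbling_window_counts(events: Iterable[Event], window_seconds: int) -> dict:
    """Count events per (window start, device) over fixed, non-overlapping windows."""
    counts = Counter()
    for event in events:
        window_start = (event.timestamp // window_seconds) * window_seconds
        counts[(window_start, event.device)] += 1
    return dict(counts)


# Hypothetical events, used only for illustration.
stream = [
    Event(1, "a"), Event(4, "a"), Event(9, "b"),
    Event(12, "a"), Event(19, "b"), Event(21, "b"),
]
result = tumbling_window_counts(stream, window_seconds=10)
for (window_start, device), count in sorted(result.items()):
    print(f"window [{window_start}, {window_start + 10}) device={device} count={count}")
```

Every event belongs to exactly one window, so the counts across windows always add up to the number of events seen.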
Orchestration Key Management Private Connections Monitoring
AZURE EXPRESSROUTE AZURE DATA FACTORY AZURE KEY VAULT OPERATIONS MANAGEMENT SUITE
ANALYTICAL DASHBOARDS
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE DATA LAKE STORE AZURE DATA LAKE ANALYTICS COSMOS DB WEB & MOBILE APPS
AZURE STREAM ANALYTICS
Scenario 10
Orchestration Key Management Private Connections Monitoring
AZURE EXPRESSROUTE AZURE DATA FACTORY AZURE KEY VAULT OPERATIONS MANAGEMENT SUITE
ANALYTICAL DASHBOARDS
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE MACHINE LEARNING & MACHINE LEARNING SERVER
AZURE DATA LAKE STORE AZURE DATA LAKE ANALYTICS COSMOS DB WEB & MOBILE APPS
AZURE STREAM ANALYTICS
Scenario 11
Name | Amount | Fraudulent
Smith | $2,600.45 | No
Janet | $2,294.58 | Yes
John | $1,003.30 | Yes
Adams | $8,488.32 | No
What’s the pattern for fraudulent transactions?
Name | Amount | Where Issued | Where Used | Age of Cardholder | Fraudulent
Smith | $2,600.45 | USA | USA | 22 | No
Janet | $2,294.58 | USA | RUS | 29 | Yes
John | $1,003.30 | USA | RUS | 25 | Yes
Adams | $8,488.32 | FRA | USA | 64 | No
Pali | $200.12 | AUS | JAP | 58 | No
Jones | $3,250.11 | USA | RUS | 43 | No
Hanford | $8,156.20 | USA | RUS | 27 | Yes
Marx | $7,475.11 | UK | GER | 32 | No
Norse | $540.00 | USA | RUS | 27 | No
Edson | $7,475.11 | USA | RUS | 20 | Yes
What’s the pattern for fraudulent transactions?
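One way to explore the question is to write down a candidate pattern and check it against every row. The sketch below does that in Python. The rows are transcribed from the slide on the assumption that the extracted columns line up row by row in the order listed, and the rule itself (US-issued card, used in Russia, amount over $1,000, cardholder under 30) is a hand-written guess, not something the slide states. A machine learning algorithm would search for such a rule automatically, and ten rows are far too few to trust any pattern.

```python
from typing import NamedTuple


class Transaction(NamedTuple):
    name: str
    amount: float
    issued: str
    used: str
    age: int
    fraudulent: bool


# Rows transcribed from the slide, assuming the columns line up in listed order.
TRANSACTIONS = [
    Transaction("Smith", 2600.45, "USA", "USA", 22, False),
    Transaction("Janet", 2294.58, "USA", "RUS", 29, True),
    Transaction("John", 1003.30, "USA", "RUS", 25, True),
    Transaction("Adams", 8488.32, "FRA", "USA", 64, False),
    Transaction("Pali", 200.12, "AUS", "JAP", 58, False),
    Transaction("Jones", 3250.11, "USA", "RUS", 43, False),
    Transaction("Hanford", 8156.20, "USA", "RUS", 27, True),
    Transaction("Marx", 7475.11, "UK", "GER", 32, False),
    Transaction("Norse", 540.00, "USA", "RUS", 27, False),
    Transaction("Edson", 7475.11, "USA", "RUS", 20, True),
]


def candidate_rule(t: Transaction) -> bool:
    """Hypothetical pattern: US card, used in Russia, over $1,000, holder under 30."""
    return t.issued == "USA" and t.used == "RUS" and t.amount > 1000 and t.age < 30


mismatches = [t.name for t in TRANSACTIONS if candidate_rule(t) != t.fraudulent]
print("rows the rule gets wrong:", mismatches)
```

With only name and amount available, as in the first version of the table, no rule of this kind separates the fraudulent rows; the extra columns are what make a pattern visible.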
Illustrating the process
MICROSOFT AZURE
Model
Call Center Staff
Call Center Application
Blobs
Detailed Call Data
ON-PREMISES
CRM Data
Data for ML
Aggregated Call Data
ADLA
Azure ML
Azure Data Factory
Need a real-time prediction of each caller’s propensity to churn
Model is rebuilt and redeployed regularly
Orchestration Key Management Private Connections Monitoring
AZURE EXPRESSROUTE AZURE DATA FACTORY AZURE KEY VAULT OPERATIONS MANAGEMENT SUITE
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE MACHINE LEARNING & MACHINE LEARNING SERVER
AZURE DATA LAKE STORE AZURE DATA LAKE ANALYTICS COSMOS DB WEB & MOBILE APPS
AZURE STREAM ANALYTICS
Scenario 12
Power BI
Power BI Embedded
http://bit.ly/pbie
Microsoft Azure
subscription
Embed
End users
Workspace
Workspace collection
1,N
Developer
Name
Admin Users
Endpoints
Keys
Gateways
Credentials
Geo Location
Tags
Name
Reports
Datasets
Tags
Your app
Azure SQL
Data Warehouse
Azure SQL Database
1,N
1,N
Power BI
Users
Permissions
Auth. providers
API keys
Token
+ Claim: Can view Report 1
+ Expiration: 5 minutes
User requests to view
Report 1
Validate token
API keys
Report 2
Workspace
Report 1
Application
Provide seamless authentication experiences
Provide seamless authentication experiences
Power BI
Users
Permissions
Auth. providers
API keys
Report 2
Workspace
Report 1
Application
Row Level Security
Users
Application
Permissions
Auth. providers
Power BI
API keys
Report 2
Workspace
Report 1
Token
+ Claim: Can view Report 1
+ Expiration: 5 minutes
+ username: “user1”
+ roles: “sales”
API keys
Copy API keys to your application
Sign token
Provide seamless authentication experiences
Power BI REST API
Authentication flow: Web application
FAQ
• What is a report session and how is it billed?
  A session is a set of interactions between an end user and a Power BI Embedded report. Each time a Power BI Embedded report is displayed to a user, a session is initiated and the subscription holder will be charged for a session. Sessions are billed at a flat rate, independent of the number of visual elements in a report or how frequently the report content is refreshed. A session ends when either the user closes the report, or the session times out after one hour.
• Do you offer any tools or guidance to help me estimate how many renders/sessions I should expect? How will I know how many renders have been completed?
  The Azure Portal will provide billing details on how many renders / report sessions have been performed against your subscription.
• Do I need a Power BI subscription in order to develop applications with Power BI Embedded? How do I get started?
  As the application developer, you do not need to have a Power BI subscription in order to create the reports and visualizations you wish to use in your application. You will need a Microsoft Azure subscription and the free Power BI Desktop application.
Orchestration Key Management Private Connections Monitoring
AZURE EXPRESSROUTE AZURE DATA FACTORY AZURE KEY VAULT OPERATIONS MANAGEMENT SUITE
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE MACHINE LEARNING & MACHINE LEARNING SERVER
AZURE DATA LAKE STORE AZURE DATA LAKE ANALYTICS COSMOS DB WEB & MOBILE APPS
AZURE STREAM ANALYTICS
Scenario 13
Power BI
COGNITIVE SERVICES
BOT SERVICE
Logic App
Orchestration Key Management Private Connections Monitoring
AZURE EXPRESSROUTE AZURE DATA FACTORY AZURE KEY VAULT OPERATIONS MANAGEMENT SUITE
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE MACHINE LEARNING & MACHINE LEARNING SERVER
AZURE DATA LAKE STORE WEB & MOBILE APPS
Scenario 14
ANALYTICAL DASHBOARDS
AZURE HDINSIGHT
(Hadoop/Hive)
AZURE HDINSIGHT
(Hadoop/Storm)
AZURE HDINSIGHT
(Hadoop/Kafka)
Kafka
AZURE HDINSIGHT
(Hadoop/HBase)
COGNITIVE SERVICES
BOT SERVICE
Logic App
Clusters
Microsoft Azure Datacenter
HDInsight Cluster (group of VMs)
Created through the Azure portal
Microsoft Hadoop Stack
Azure HDInsight
Machine Learning
Local (HDFS) or Cloud (Azure Blob/Azure Data Lake Store)
Open source analytics service for the Enterprise
Multi Region Availability
Available in >25 regions world-wide
Launched most recently in US West 2 and UK regions
Available in China, Europe and US Government clouds
IaaS Clusters | Managed Clusters | Big Data as-a-service
Best for…
Workloads
Administrative
Developer
Control & configuration
Service Level Agreement
TCO
CONTROL | EASE OF USE AND ADOPTION
Orchestration Key Management Private Connections Monitoring
AZURE EXPRESSROUTE AZURE DATA FACTORY AZURE KEY VAULT OPERATIONS MANAGEMENT SUITE
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE DATA LAKE STORE
Scenario 14
ANALYTICAL DASHBOARDS
AZURE HDINSIGHT
(Hadoop/Hive)
AZURE HDINSIGHT
(Hadoop/Storm)
AZURE HDINSIGHT
(Hadoop/Kafka)
Kafka
AZURE HDINSIGHT
(Hadoop/R)
Jupyter
Data Science
Notebooks
AZURE HDINSIGHT
(Hadoop/Spark)
Community Algorithms
Spark ML (PySpark, SparkR)
Caffe on Spark
BigDL on HDInsight
SparklyR
XGBoost
Supported by community
ISV Applications
H2O
Dataiku
Supported by ISV
Orchestration Key Management Private Connections Monitoring
AZURE EXPRESSROUTE AZURE DATA FACTORY AZURE KEY VAULT OPERATIONS MANAGEMENT SUITE
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE DATA LAKE STORE
Scenario 15
ANALYTICAL DASHBOARDS
AZURE HDINSIGHT
(Hadoop/Hive)
AZURE HDINSIGHT
(Hadoop/Storm)
AZURE HDINSIGHT
(Hadoop/Kafka)
Kafka
AZURE HDINSIGHT
(Hadoop/R)
Jupyter
Data Science
Notebooks
AZURE HDINSIGHT
(Hadoop/Spark)
DATA CATALOG
Analyze
Enabling the Entire Enterprise Data Ecosystem
Discover
• Search
• Browse
• Filter
Understand
• Metadata
• Experts
• Context
Consume
• Your data
• Your tools
• Your way
Contribute
• Tag
• Document
• Publish
Source: Mastering Azure Analytics, 1st Edition - Zoiner Tejada, O'Reilly Media, Inc., April 2017
Thank You
Mohamed Tawfik
Cloud Solutions Architect
Azure CoE - EMEA


Editor's Notes

  • #3 Add key for the colours
  • #7 Notes: Web jobs can be used for stream processing when set to continuous; functions can only be triggered or scheduled, so they are not suitable. In some cases, logic apps might fit for orchestrating specific tasks. Azure Data Factory and Oozie are the main orchestrators offered in Azure. Apache Oozie is a Java web application that does workflow coordination for Hadoop jobs. In Oozie, a workflow is defined as a directed acyclic graph (DAG) of actions. It supports different types of Hadoop jobs, such as MapReduce, Streaming, Pig, Hive, Sqoop, and more, as well as system-specific jobs, such as shell scripts and Java programs. Apache Sqoop is a tool to transfer bulk data between Hadoop and relational databases as efficiently as possible. It is used to import data from relational database management systems (RDBMS), such as Oracle, MySQL, SQL Server, or any other structured relational database, into HDFS. The data is then processed and/or transformed using Hive or MapReduce and exported back to the RDBMS.
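
The notes describe an Oozie workflow as a directed acyclic graph of actions. As a minimal sketch of what that implies for execution order, the Python below models a workflow as a dependency graph and derives a valid run order with the standard library's `graphlib` (Python 3.9 or later). It is not Oozie's own XML workflow format, and the action names are hypothetical, chosen to mirror the Sqoop import, Hive or MapReduce processing, and Sqoop export flow mentioned in the notes.

```python
from graphlib import TopologicalSorter

# Each action maps to the set of actions that must finish before it starts.
# Action names are hypothetical; they mirror the import, transform, export flow.
workflow = {
    "sqoop_import": set(),
    "hive_transform": {"sqoop_import"},
    "mapreduce_aggregate": {"sqoop_import"},
    "sqoop_export": {"hive_transform", "mapreduce_aggregate"},
}

# static_order() yields every action after all of its prerequisites.
run_order = list(TopologicalSorter(workflow).static_order())
print(run_order)
```

The two middle actions have no dependency on each other, so an orchestrator is free to run them in parallel; a cycle in the graph would make `static_order()` raise `graphlib.CycleError`, which is why the graph has to be acyclic.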