The document discusses building an end-to-end analytic solution in the cloud using Microsoft Azure tools: ingesting data from various sources with Azure Data Factory, storing it in Azure Data Lake, transforming it with U-SQL scripts in Azure Data Lake Analytics, developing predictive models with Azure Machine Learning Studio, and visualizing insights with Power BI. It gives examples of how each tool in the analytic lifecycle fits into an overall cloud-based analytics solution that handles large volumes of data.
1. Analytics in the Cloud
analytic-lifecycle with tools based entirely in the cloud
Tailwindtech.com
952-544-2100
2. Analytics in the Cloud
Join us as we walk you through the analytic lifecycle with tools based entirely in the cloud. By
leveraging Microsoft’s suite of managed services in Azure, we will develop an end-to-end
analytic solution. We will begin by discussing and defining the business issue we would like to
solve. Next, we will apply well-defined data engineering practices to ingest structured and
unstructured data through Azure Data Factory. Data transformations for cleansing and
integration will be done through U-SQL scripts in Azure Data Lake Analytics. Once we have
gathered and explored the data, we will dive into Azure Machine Learning (ML). We will
develop Azure ML models in Azure Machine Learning Studio, another Microsoft managed
service, to create, test, and deploy our predictive models. We will then consume the
predictive model in Power BI. We finish with a complete analytic lifecycle, presenting the
final visualization for insight and action.
3. Analytics in the Cloud
-The Data Flow-
U-SQL, R, Python, .NET
INGEST STORE PREPARE ANALYZE INSIGHT
While you can encapsulate an entire analytic lifecycle inside a single tool like Power BI, I wanted
to show an example that leverages multiple tools and scales into the hundreds of
terabytes, and eventually into petabytes and beyond.
4. Azure Data Factory (INGEST)
“…[A] data movement service in the cloud, to ingest data from multiple on-premises and cloud sources.”1
Supported Sources
Supported Sinks
5. Azure Data Factory (Ingest)
[Diagram labels: XML, LAKE, DB, DB]
Example of how you can create activities within pipelines, and stack multiple
pipelines together.
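The idea of activities inside pipelines, with pipelines stacked one after another, can be sketched in plain Python. This is only an illustration of the structure, not the Azure Data Factory SDK or its JSON format: `Activity`, `Pipeline`, `run_stacked`, and the sample activities are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins for illustration; these are NOT Azure Data Factory types.
Context = Dict[str, object]


@dataclass
class Activity:
    """One unit of work inside a pipeline, e.g. a copy step."""
    name: str
    run: Callable[[Context], None]


@dataclass
class Pipeline:
    """An ordered group of activities."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def execute(self, context: Context) -> List[str]:
        """Run the activities in order and return a log of what ran."""
        log = []
        for activity in self.activities:
            activity.run(context)
            log.append(f"{self.name}.{activity.name}")
        return log


def run_stacked(pipelines: List[Pipeline], context: Context) -> List[str]:
    """Run pipelines back to back, sharing one context between them."""
    log = []
    for pipeline in pipelines:
        log.extend(pipeline.execute(context))
    return log


# Made-up activities: two copies into a "lake", then a count over it.
def copy_xml(ctx: Context) -> None:
    ctx["lake"] = ["<order id='1'/>"]


def copy_db(ctx: Context) -> None:
    ctx["lake"] = ctx.get("lake", []) + ["db_row_1"]


def count_files(ctx: Context) -> None:
    ctx["file_count"] = len(ctx["lake"])


ingest = Pipeline("ingest", [Activity("copy_xml", copy_xml),
                             Activity("copy_db", copy_db)])
audit = Pipeline("audit", [Activity("count_files", count_files)])

context: Context = {}
log = run_stacked([ingest, audit], context)
print(log)
print(context["file_count"])
```

The second pipeline only makes sense after the first has populated the lake, which is the kind of dependency that stacking pipelines expresses.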
6. Azure Data Lake (STORE)
Purpose
• Optimized storage for big data analytic workloads
Use Cases
• Batch
• Interactive
• Streaming Analytics
• Machine Learning
Scenarios
• Logs
• IoT
• Click Streams
• Large Datasets
• Unstructured data
7. Data Lake Store (Storage)
[Diagram labels: Database (Table 1, Table 2), XML, {1}, {2}, {3}]
• Redshift
• DB2
• MySQL
• Oracle
• PostgreSQL
• SAP BW
• SAP HANA
• SQL Server
• Sybase
• Teradata
• NoSQL
• Cassandra
• MongoDB
• File
• S3
• File System
• FTP
• HDFS
• SFTP
• Generic HTTP
• OData
• ODBC
• Salesforce
8. Azure Data Lake Analytics (PREPARE)
U-SQL, R, Python, .NET
“… [A]n on-demand analytics job service to simplify big data analytics.”2
Managing Data Lake Analytics
• Azure Portal
• Azure CLI
• PowerShell
• .NET SDK
• Python SDK
• Java SDK
• Node.js
• Visual Studio
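To make the PREPARE stage a little more concrete, here is a minimal sketch of cleansing followed by integration. It uses plain Python with the standard library, not U-SQL and not a Data Lake Analytics job; the sample data, column names, and function names are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical sample inputs; column names are assumptions for illustration.
CUSTOMERS_CSV = """customer_id,name
1,  Alice
2,Bob
,Missing Id
"""

ORDERS_CSV = """order_id,customer_id,amount
10,1,25.50
11,2,not_a_number
12,1,4.50
"""


def clean_customers(text):
    """Cleansing: trim whitespace and drop rows without a customer_id."""
    rows = csv.DictReader(io.StringIO(text))
    return {r["customer_id"].strip(): r["name"].strip()
            for r in rows if r["customer_id"].strip()}


def clean_orders(text):
    """Cleansing: keep only orders whose amount parses as a number."""
    cleaned = []
    for r in csv.DictReader(io.StringIO(text)):
        try:
            cleaned.append((r["customer_id"].strip(), float(r["amount"])))
        except ValueError:
            continue  # skip rows with an unparseable amount
    return cleaned


def total_by_customer(customers, orders):
    """Integration: join orders to customers and sum amounts per name."""
    totals = {}
    for customer_id, amount in orders:
        if customer_id in customers:
            name = customers[customer_id]
            totals[name] = totals.get(name, 0.0) + amount
    return totals


totals = total_by_customer(clean_customers(CUSTOMERS_CSV),
                           clean_orders(ORDERS_CSV))
print(totals)
```

In a real solution the same two steps (filter out bad rows, then join the sources) would run as a job against files in the lake rather than against in-memory strings.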
12. Power BI (INSIGHT)
Stacked Bar Chart
Waterfall Chart
Scatter Chart
Pie Chart
Treemap
Map
R Script Visual
ArcGIS Map
Custom Visuals
13. Power BI - Build
[Diagram labels: SQL Data Warehouse; R script; Gateway; Service; (2) Call Web Service; (3) Write Predictions; (6) Refresh]
An example of using Azure Machine Learning with Power BI for visualization.
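The "Call Web Service" step in this flow can be sketched as building an HTTP scoring request. The block below is a hedged illustration only: the URL, API key, and column names are placeholders, and the payload layout (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) is modeled on the request-response format used by Azure Machine Learning Studio web services, so treat it as an assumption and check it against your own service's sample request. The sketch builds the request but does not send it.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, columns, rows):
    """Build (but do not send) a POST request for a scoring web service.

    The payload shape is an assumption; the input name "input1" is set
    per experiment and may differ for a given service.
    """
    payload = {
        "Inputs": {"input1": {"ColumnNames": columns, "Values": rows}},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


# Placeholder endpoint, key, and feature columns: all hypothetical.
request = build_scoring_request(
    "https://example.invalid/score",
    "PLACEHOLDER_KEY",
    ["age", "income"],
    [["42", "55000"]],
)
sent = json.loads(request.data.decode("utf-8"))
print(request.get_method(), sent["Inputs"]["input1"]["ColumnNames"])

# Sending it would look like this (left commented out so the sketch
# runs without network access):
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())
```

The returned predictions would then be written back to the warehouse so that Power BI picks them up on refresh, as the numbered steps in the diagram suggest.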
15. Analytic Life-cycle
• Identify Business Problem
• Define the Data
• Explore & Transform the Data
• Apply Predictive Algorithms
• Provide Insight & Action
16. Analytic Life-cycle Components
INGEST: Data Factory
STORE: Data Lake
PREPARE: Data Lake Analytics (U-SQL, R, Python, .NET)
ANALYZE: Machine Learning
INSIGHT: Power BI
Life-cycle steps: Define the Data; Explore & Transform the Data; Apply Predictive Algorithms; Provide Insight & Action