-
1.
Power BI for Big Data and
the new look of Big Data
solutions
James Serra
Big Data Evangelist
Microsoft
JamesSerra3@gmail.com
-
2.
About Me
Microsoft, Big Data Evangelist
In IT for 30 years, worked on many BI and DW projects
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM
architect, PDW/APS developer
Been perm employee, contractor, consultant, business owner
Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure
Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data
Platform Solutions
Blog at JamesSerra.com
Former SQL Server MVP
Author of book “Reporting with Microsoft SQL Server 2012”
-
3.
Agenda
Azure Data Lake Storage Gen2
Big data solution use cases
Power BI
Composite data models
Aggregation tables
Dataflows
XMLA Endpoints
RDL support
Application Lifecycle Management (ALM)
Incremental Refresh
Demo
Common architecture patterns
-
4.
Blob Storage Data Lake Store
Azure Data Lake Storage Gen2
Large partner ecosystem
Global scale – All 50 regions
Durability options
Tiered - Hot/Cool/Archive
Cost Efficient
Built for Hadoop
Hierarchical namespace
ACLs, AAD and RBAC
Performance tuned for big data
Very high scale capacity and throughput
-
5.
Hadoop on a cluster of Azure virtual machines (IaaS)
Azure HDInsight (PaaS)
Azure Data Lake Analytics (SaaS)
Azure Databricks (PaaS)
IaaS end of the spectrum: higher level of complexity, control, & customization; greater integration with Apache projects; greater administrative effort
Managed-service end of the spectrum: greater ease of use; less integration with Apache projects; less administrative effort
-
6.
Needs data governance so your data lake does not turn
into a data swamp!
-
7.
Objectives
Plan the structure based on optimal data retrieval
Avoid a chaotic, unorganized data swamp
Common ways to organize the data:
Data Retention Policy: Temporary data, Permanent data, Applicable period (ex: project lifetime), etc.
Business Impact / Criticality: High (HBI), Medium (MBI), Low (LBI), etc.
Confidential Classification: Public information, Internal use only, Supplier/partner confidential, Personally identifiable information (PII), Sensitive – financial, Sensitive – intellectual property, etc.
Probability of Data Access: Recent/current data, Historical data, etc.
Owner / Steward / SME
Subject Area
Security Boundaries: Department, Business unit, etc.
Time Partitioning: Year/Month/Day/Hour/Minute
Downstream App/Purpose
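To make the subject-area and time-partitioning ideas concrete, here is a minimal sketch of a folder-path builder. The function name `lake_path`, the `subject_area/source/year/month/...` layout, and the sample values are assumptions chosen for illustration, not a layout prescribed by the slides.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Time-partition levels, coarsest first (Year/Month/Day/Hour/Minute).
LEVELS = ["year", "month", "day", "hour", "minute"]


def lake_path(subject_area: str, source: str, event_time: datetime,
              granularity: str = "day") -> PurePosixPath:
    """Build a folder path partitioned by subject area, source, and time.

    The layout is a hypothetical convention used only for this example.
    """
    if granularity not in LEVELS:
        raise ValueError(f"granularity must be one of {LEVELS}")
    parts = {
        "year": f"{event_time.year:04d}",
        "month": f"{event_time.month:02d}",
        "day": f"{event_time.day:02d}",
        "hour": f"{event_time.hour:02d}",
        "minute": f"{event_time.minute:02d}",
    }
    # Keep every level down to the requested granularity.
    depth = LEVELS.index(granularity) + 1
    time_parts = [parts[level] for level in LEVELS[:depth]]
    return PurePosixPath(subject_area, source, *time_parts)


if __name__ == "__main__":
    print(lake_path("sales", "pos", datetime(2019, 3, 7, 14, 5), "hour"))
```

Putting the time partition last keeps all data for one subject area under a single folder, which is one way to let security boundaries follow the folder tree.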
-
8.
Microsoft Confidential
Import vs. DirectQuery
DirectQuery
Import
-
10.
[Diagram: data model with tables Sales, Date, Customer, Product, Employee, Geography, and Reseller]
-
11.
[Diagram: data model with tables Sales, Product, Customer, Geography, Date, Employee, and Reseller]
-
12.
[Diagram: data model with tables Sales Agg, Sales, Product, Customer, Geography, Date, Employee, and Reseller]
-
13.
Azure Analysis Services
Power BI
Power BI Premium
Corporate BI
Self-service BI users
All BI users
-
14.
[Diagram: data model with tables Sales, Sales Agg, Product, Customer, Geography, Date, Employee, and Reseller]
-
15.
[Diagram: data model with tables Sales, Sales Agg, Product, Customer, Geography, Date, Employee, and Reseller]
SUMMARIZECOLUMNS(
    'Date'[Year],
    Geography[City],
    "Sales", SUM(Sales[Amount])
)
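The idea behind an aggregation table can be sketched in a few lines: pre-compute totals at a coarse grain, answer queries from that small table when its grain covers the requested columns, and fall back to the detail table otherwise. This is only a simplified illustration; it is not how the Power BI engine matches queries to aggregations, and the sample rows, the `AGG_GRAIN` choice, and the function names are all assumptions.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, city, customer, amount).
SALES = [
    (2018, "Seattle", "Ann", 100.0),
    (2018, "Seattle", "Bob", 50.0),
    (2019, "Seattle", "Ann", 70.0),
    (2019, "Portland", "Cy", 30.0),
]
COLUMNS = ("year", "city", "customer")
AGG_GRAIN = ("year", "city")  # assumed grain of the aggregation table


def summarize(rows, columns, group_by):
    """Sum the last field of each row, grouped by the requested columns."""
    idx = [columns.index(c) for c in group_by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


# Pre-compute the (much smaller) aggregation table once.
SALES_AGG = [key + (amount,)
             for key, amount in summarize(SALES, COLUMNS, AGG_GRAIN).items()]


def query(group_by):
    """Use the aggregation table when its grain covers the query columns."""
    if set(group_by) <= set(AGG_GRAIN):
        return "Sales Agg", summarize(SALES_AGG, AGG_GRAIN, group_by)
    return "Sales", summarize(SALES, COLUMNS, group_by)


if __name__ == "__main__":
    print(query(("year", "city")))      # answered from the aggregation
    print(query(("year", "customer")))  # falls back to the detail rows
```

In this toy version a query grouped by a column outside the aggregation grain (here, customer) has to scan the detail rows.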
-
16.
[Diagram: data model with tables Sales, Sales Agg, Product, Customer, Geography, Date, Employee, and Reseller]
SUMMARIZECOLUMNS(
    'Date'[Year],
    Customer[Name],
    "Sales", SUM(Sales[Amount])
)
-
17.
[Diagram: data model with tables Sales, Sales Agg, Product, Customer, Geography, Date, Employee, and Reseller]
Storage modes on each side of a relationship:
“Many side”    “One side”
Dual           Dual
Import         Import or Dual
DQ             DQ or Dual
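The storage-mode combinations in the table above can be written down as a small lookup. This is a sketch of the table only; the names `ALLOWED_ONE_SIDE` and `is_listed_pair` are made up for the example, and "DQ" is spelled out as DirectQuery.

```python
# Storage modes listed for the "one side", keyed by the "many side" mode.
ALLOWED_ONE_SIDE = {
    "Dual": {"Dual"},
    "Import": {"Import", "Dual"},
    "DirectQuery": {"DirectQuery", "Dual"},
}


def is_listed_pair(many_side: str, one_side: str) -> bool:
    """Return True if the (many side, one side) pair appears in the table."""
    if many_side not in ALLOWED_ONE_SIDE:
        raise ValueError(f"unknown storage mode: {many_side}")
    return one_side in ALLOWED_ONE_SIDE[many_side]


if __name__ == "__main__":
    print(is_listed_pair("Import", "Dual"))         # listed in the table
    print(is_listed_pair("Import", "DirectQuery"))  # not listed
```

Dual is the only mode listed for the one side in every row.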
-
18.
Power BI introduces self-service data-prep capabilities
Dataflows
Self-service low code/no code
Integral part of Power BI stack
Cloud and on-premises connectors
Standard schema (Common Data Model)
Data reuse
In-lake transformations
-
19.
Power BI introduces dataflows
BI models
Visualizations
Data prep
Data (Azure Data Lake)
-
20.
Data + AI professionals can use the full power of the Azure Data Platform
Azure Databricks
Azure ML
Azure SQL DW
Azure Data Factory
Business analysts: Low/no code
Data scientists, Data engineers: Low to high code
CDM folders
-
21.
Dataflow editor
Create a new dataflow using Power BI dataflow editor
-
23.
Ingest data
Ingest data using on-prem and cloud connectors
-
24.
Connect to Dynamics via Common Data Service for Apps connector
Select Dynamics Common Data Model and custom entities from the CDS for Apps data source to ingest into Power BI
-
25.
Power Query Online
Use Power Query Online to perform transformations and data cleansing
Map entities from any data source (e.g. Azure SQL Database) to the Common Data Model as part of Power Query transformations
-
26.
Perform mapping to CDM
Choose a standard entity that exists in CDM to map your data
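Conceptually, mapping to a standard entity is a rename of source columns to the entity's attribute names, with anything unmapped left over for review. The sketch below shows that idea in plain Python; the source column names, the target attribute names, and the helper `map_to_entity` are all hypothetical and are not taken from the Common Data Model definitions or from Power Query.

```python
# Hypothetical mapping: source column name -> standard entity attribute name.
ACCOUNT_MAPPING = {
    "cust_name": "name",
    "cust_phone": "phone",
    "cust_city": "city",
}


def map_to_entity(row: dict, mapping: dict):
    """Rename mapped columns; report columns that have no mapping."""
    mapped = {mapping[col]: value for col, value in row.items() if col in mapping}
    unmapped = sorted(col for col in row if col not in mapping)
    return mapped, unmapped


if __name__ == "__main__":
    source_row = {"cust_name": "Contoso", "cust_city": "Seattle", "legacy_flag": "Y"}
    print(map_to_entity(source_row, ACCOUNT_MAPPING))
```

Reporting the unmapped columns separately makes it easy to decide whether they belong in a custom entity.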
-
28.
Incremental refresh
Define incremental refresh based on time columns
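A refresh policy based on a time column boils down to two cut-off dates: rows older than the retention window are dropped, rows inside the refresh window are reloaded, and everything in between is left alone. The following is a minimal sketch of that logic under assumed parameters (`keep_days`, `refresh_days`); it does not reproduce how Power BI partitions or refreshes data internally.

```python
from datetime import date, timedelta


def refresh_plan(today: date, keep_days: int, refresh_days: int):
    """Return (drop_before, refresh_from) cut-off dates for a time column."""
    if refresh_days > keep_days:
        raise ValueError("refresh window cannot exceed the retention window")
    drop_before = today - timedelta(days=keep_days)
    refresh_from = today - timedelta(days=refresh_days)
    return drop_before, refresh_from


def classify(row_date: date, today: date, keep_days: int, refresh_days: int) -> str:
    """Decide what happens to a row with the given time-column value."""
    drop_before, refresh_from = refresh_plan(today, keep_days, refresh_days)
    if row_date < drop_before:
        return "drop"
    if row_date >= refresh_from:
        return "refresh"
    return "keep"


if __name__ == "__main__":
    today = date(2019, 6, 30)
    for d in (date(2018, 1, 1), date(2019, 1, 1), date(2019, 6, 25)):
        print(d, classify(d, today, keep_days=365, refresh_days=7))
```

Only the "refresh" rows need to be read from the source on each run, which is what makes the approach cheaper than a full reload.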
-
29.
Connect from Power BI Desktop
Connect to Power BI dataflows to generate models and reports using dataflow data
-
30.
Business logic & metrics
Data modeling
Security
Azure Analysis Services
Server
Lifecycle management
In-memory cache
-
31.
Business logic & metrics
Data modeling
Security
Lifecycle management
In-memory cache
-
32.
Column(s)
Measure(s)
Table(s)
Model
Database
using Microsoft.AnalysisServices.Tabular;

public void RefreshTable(...)
{
    // Connect to the server
    var server = new Server();
    server.Connect(cnnString);
    // Get the database
    Database db = server.Databases[dbName];
    // Get the model
    Model model = db.Model;
    // Reprocess the table
    model.Tables[tableName].RequestRefresh(RefreshType.Full);
    model.SaveChanges(); // Commit the changes
}
-
33.
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "Sales Analysis",
        "table": "Reseller Sales"
      }
    ]
  }
}
{
  "createOrReplace": {
    "object": {
      "database": "AdventureWorks"
    },
    "database": {
      "name": "AdventureWorks",
      ...
    }
  }
}
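Because a refresh command like the one above is plain JSON, it is easy to generate from a script. The sketch below builds the same structure as the slide's refresh command; the helper name `refresh_command` is an assumption, and sending the command to a server (for example over the XMLA endpoint) is deliberately left out.

```python
import json


def refresh_command(database: str, table: str, refresh_type: str = "full") -> dict:
    """Build a refresh command shaped like the one shown on the slide."""
    return {
        "refresh": {
            "type": refresh_type,
            "objects": [
                {"database": database, "table": table},
            ],
        }
    }


if __name__ == "__main__":
    command = refresh_command("Sales Analysis", "Reseller Sales")
    print(json.dumps(command, indent=2))
```

Generating the command from parameters avoids hand-editing JSON when the same refresh has to run against several tables.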
-
34.
IMPLEMENTING COMMON CUSTOMER PATTERNS
-
35.
Advanced Analytics
Data sources: Social, LOB, Graph, IoT, Image, CRM
Stages: INGEST, STORE, PREP, MODEL & SERVE
Stage roles: Data orchestration and monitoring; Big data store; Transform & Clean; Data warehouse; AI; BI + Reporting
Services: Azure Data Factory, SSIS, Azure Data Lake Storage Gen2, Azure Databricks, Azure Data Lake Analytics, Azure HDInsight, Azure SQL Data Warehouse, Azure Analysis Services
-
36.
CLOUD DATA WAREHOUSE
Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
Data sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
Components: Azure Data Factory, Azure Data Lake Storage Gen2, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
-
37.
MODERN DATA WAREHOUSE
Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
Data sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
Components: Azure Data Factory, Azure Data Lake Storage Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
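One way to reason about these patterns is to write the stages down as data. The sketch below models the modern data warehouse pattern as an ordered list of stages; which service sits in which stage is an assumption based on the usual reading of this kind of diagram (the flattened slide text does not state it), and the `Stage` class and `describe` helper are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of the pipeline and the services assumed to run in it."""
    name: str
    services: tuple


# Assumed placement of services into stages; not confirmed by the slide text.
MODERN_DW = (
    Stage("INGEST", ("Azure Data Factory",)),
    Stage("STORE", ("Azure Data Lake Storage Gen2",)),
    Stage("PREP & TRAIN", ("Azure Databricks",)),
    Stage("MODEL & SERVE",
          ("Azure SQL Data Warehouse", "Azure Analysis Services", "Power BI")),
)


def describe(pipeline) -> str:
    """Render the pipeline as a left-to-right flow."""
    return " -> ".join(
        f"{stage.name} [{', '.join(stage.services)}]" for stage in pipeline
    )


if __name__ == "__main__":
    print(describe(MODERN_DW))
```

Under the same assumption, the other patterns in this section differ mainly in which services appear in each stage, so they could be expressed as additional tuples of `Stage` objects.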
-
38.
ADVANCED ANALYTICS ON BIG DATA
Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
Data sources: Business/custom apps (structured), Files (unstructured), Media (unstructured), Logs (unstructured)
Components: Azure Data Factory, Azure Data Lake Storage Gen2, Azure Databricks, SparkR, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI, Cosmos DB, Real-time apps
Microsoft Azure also supports other Big Data services like Azure HDInsight and Azure Machine Learning to allow customers to tailor the above architecture to meet their unique needs.
-
39.
REAL TIME ANALYTICS
Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
Data sources: Sensors and IoT (unstructured), Files (unstructured), Media (unstructured), Logs (unstructured), Business/custom apps (structured)
Components: Apache Kafka for HDInsight, Azure Data Factory, Azure Data Lake Storage Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI, Cosmos DB, Real-time apps
Microsoft Azure also supports other Big Data services like Azure IoT Hub, Azure Event Hubs, and Azure Machine Learning to allow customers to tailor the above architecture to meet their unique needs.
-
40.
DATA MART CONSOLIDATION
Stages: INGEST, STORE, MODEL & SERVE
Data sources: RDBMS data marts, Hadoop
Components: Azure Data Factory, Azure Data Lake Storage Gen2, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
-
41.
HUB & SPOKE ARCHITECTURE FOR BI
Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
Data sources: Business/custom apps (structured), Logs (unstructured), Media (unstructured), Files (unstructured)
Components: Azure Data Factory, Azure Data Lake Storage Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Multiple Azure SQL Database instances (Data Marts), Multiple Azure Analysis Services instances (Data Cubes), Power BI
Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
-
42.
AUTOSCALING DATA WAREHOUSE
Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
Data sources: Business/custom apps (structured), Logs (unstructured), Media (unstructured), Files (unstructured)
Components: Azure Data Factory, Azure Data Lake Storage Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Azure Functions (Auto-scaling), Azure Analysis Services, Power BI
Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
-
43.
DATA WAREHOUSE MIGRATION
Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
Data sources: Business/custom apps (structured), Logs (unstructured), Media (unstructured), Files (unstructured)
Components: Azure Data Factory, Azure Data Lake Storage Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI, Business/custom apps
Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
-
44.
Resources
Why use a data lake? http://bit.ly/1WDy848
Big Data Architectures http://bit.ly/1RBbAbS
The Modern Data Warehouse: http://bit.ly/1xuX4Py
Hadoop and Data Warehouses: http://bit.ly/1xuXfu9
-
45.
Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted under the “Presentations” tab)
Power BI for Big Data and the new look of Big Data solutions
New features in Power BI give it enterprise tools, but that does not mean it automatically creates an enterprise solution. In this talk we will cover these new features (composite models, aggregation tables, dataflows) as well as Azure Data Lake Storage Gen2, and describe the use cases and products of an individual, departmental, and enterprise big data solution. We will also talk about why a data warehouse and cubes should still be part of an enterprise solution, and how a data lake should be organized.
Fluff, but the point is that I bring real work experience to the session
You can use enterprise tools, but that does not mean you are building an enterprise solution
Talking point: IT/PowerUser uses ADF/U-SQL. User could also bypass ADLS and go right to source if no cleaning needed
It takes an ELT approach rather than ETL: data is loaded into Azure Data Lake Store as-is and then transformed there using Azure Data Lake Analytics, instead of being transformed during the move from the source system to the data lake, as you usually do when using SSIS
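The difference between the two approaches is mostly about where the transform step runs and whether the raw data survives. Here is a minimal in-memory sketch, with dictionaries standing in for the warehouse and the lake; the sample rows, folder names, and function names are assumptions for illustration only.

```python
def extract():
    """Hypothetical source rows, as they arrive from the source system."""
    return [{"id": "1", "amount": " 10.50 "}, {"id": "2", "amount": "n/a"}]


def transform(rows):
    """Convert types and skip rows whose values cannot be parsed."""
    clean = []
    for row in rows:
        try:
            clean.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except ValueError:
            continue
    return clean


def etl(warehouse: dict) -> None:
    """ETL: transform in flight; only the cleaned rows are stored."""
    warehouse["sales"] = transform(extract())


def elt(lake: dict) -> None:
    """ELT: load the raw rows first, then transform inside the lake."""
    lake["raw/sales"] = extract()
    lake["cleansed/sales"] = transform(lake["raw/sales"])


if __name__ == "__main__":
    warehouse, lake = {}, {}
    etl(warehouse)
    elt(lake)
    print(len(warehouse["sales"]), len(lake["raw/sales"]), len(lake["cleansed/sales"]))
```

With ELT the unparseable row is still available in the raw copy, so a later fix to the transform can be re-run without going back to the source system.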
Sometimes has data marts (hub-and-spoke)
Crowd-sourced career service, smartphone app emits driver's location
https://www.sqlchick.com/entries/2017/12/30/zones-in-a-data-lake
https://www.sqlchick.com/entries/2016/7/31/data-lake-use-cases-and-planning
Question: Do you see many companies building data lakes?
Raw: Raw events are stored for historical reference. Also called staging layer or landing area
Cleansed: Raw events are transformed (cleaned and mastered) into directly consumable data sets. The aim is to standardize how files are stored in terms of encoding, format, data types, and content (e.g. strings). Also called conformed layer
Application: Business logic is applied to the cleansed data to produce data ready to be consumed by applications (e.g. DW application, advanced analysis process, etc). This layer goes by many other names: workspace, trusted, gold, secure, production ready, governed, presentation
Sandbox: Optional layer to be used to “play” in. Also called exploration layer or data science workspace
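The four layers above can be sketched as top-level folders with a fixed promotion order (raw to cleansed to application), with the sandbox kept outside that flow. The folder names, the `Zone` enum, and the `promote` helper are assumptions made for this example.

```python
from enum import Enum


class Zone(Enum):
    RAW = "raw"                  # also called staging layer or landing area
    CLEANSED = "cleansed"        # also called conformed layer
    APPLICATION = "application"  # also called trusted, gold, presentation, ...
    SANDBOX = "sandbox"          # optional exploration layer


# Data moves forward one zone at a time; the sandbox is not part of the flow.
PROMOTION = {Zone.RAW: Zone.CLEANSED, Zone.CLEANSED: Zone.APPLICATION}


def zone_path(zone: Zone, dataset: str) -> str:
    """Return the folder for a dataset inside a zone."""
    return f"{zone.value}/{dataset}"


def promote(path: str) -> str:
    """Return the path the dataset would have in the next zone."""
    zone_name, _, dataset = path.partition("/")
    zone = Zone(zone_name)
    if zone not in PROMOTION:
        raise ValueError(f"data in the {zone.value} zone is not promoted further")
    return zone_path(PROMOTION[zone], dataset)


if __name__ == "__main__":
    print(promote("raw/sales/2019"))
    print(promote("cleansed/sales/2019"))
```

Keeping the zone as the first path segment makes it simple to apply different access rules per zone.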
Drill to individual driver via Drillthrough
How to get answers to business questions about your data?
Question: Should SQL Database be considered in the Model & Serve blade, using it as a data mart?
Microsoft Azure supports other services like Azure HDInsight, Azure Data Lake, Azure IoT Hub, and Azure Event Hubs in various layers of the architecture above to allow customers a truly customized solution.