Date: 13th November 2018
Location: Self-Service Analytics Theatre
Time: 14:30 - 15:00
Speaker: Zaf Khan
Organisation: Arcadia Data
About: The use of data lakes continues to grow, and a recent survey by Eckerson Group shows that organizations are getting real value from their deployments. However, there’s still a lot of room for improvement when it comes to giving business users access to the wealth of potential insights in the data lake.
While the data management aspect has been fairly well understood over the years, the success of business intelligence (BI) and analytics on data lakes lags behind. In fact, organizations often struggle with data lakes because they are only accessible by highly-skilled data scientists and not by business users. But BI tools have been able to access data warehouses for years, so what gives?
In this talk, we’ll discuss:
• Why traditional BI tools are architected well for data warehouses, but not data lakes.
• Why every organization should have two BI standards: one for data warehouses and one for data lakes.
• Innovative capabilities provided by BI for data lakes.
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
1. Arcadia Data. Proprietary and Confidential
A Tale of Two BI Standards:
One for Data Warehouses and One for Data Lakes
Zaf Khan
November 2018
2. Arcadia Data. Proprietary and Confidential
20+ years in Enterprise Integration & Analytics
§ 10+ years Support, Consulting, Training
§ 10+ years PreSales, Account Manager
§ Previous projects included
§ Tableau, Spotfire, Cognos, Business Objects, Platfora, Pentaho
Quick Background
3. Arcadia Data. Proprietary and Confidential
1. Minimize Data Movement
2. Minimize Copies of Data
3. Minimize the Number of Places to Secure Data
4. Leverage the Power of Parallel Processing
5. Visualize Structured and Unstructured Data
6. Visualize Data in Motion
7. Visualize Data from Multiple Data Sources
8. Provide a Self-Service Discovery Environment
9. Model Data Based on Usage
10. Productionize on the Same Platform as Your Discovery Environment
10 Big Data Considerations for Visual Analytics/BI Tool Selection
4. Arcadia Data. Proprietary and Confidential
Anyone Remember the 3 V’s?
Volume
Variety
Velocity
Why have Many Big Data/Data Lake Initiatives Failed?
5. Arcadia Data. Proprietary and Confidential
Companies Focused on the Data Deluge of the 3/8 V’s
Answer – Build a Data Lake!
6. Arcadia Data. Proprietary and Confidential
What Problem are Companies Faced With Today?
Uncovering Business Value from Their Data Lakes
7. Arcadia Data. Proprietary and Confidential
“Data” and “Platforms” Have Changed – Why Haven’t BI Tools?
From To
Data
Platforms
BI Tools
rows and columns and multi-structured
batch and interactive and real-time
small and large volumes
many sources
internal and external
tables and docs, search indexes, events
schema on write and schema on read
commodity hardware
ETL and ELT and ELDT
data warehouses and data lakes
rows and columns
batch
smaller data volumes
limited # sources
mainly internal
tables
schema on write
proprietary hardware
ETL
data warehouses
SQL queries
extracts
cubes
BI servers
small/med scale
Why haven’t BI
tools evolved?
8. Arcadia Data. Proprietary and Confidential
Would you use water skis to
ski down a mountain?
Why Not Use Any BI Tool? Architecture Built for a Purpose
Then why would you use a
data warehouse BI tool
on a data lake?
10. Arcadia Data. Proprietary and Confidential
Companies Are Now Choosing Two BI Standards for Their Enterprise
Data Warehouse Data Lake
BI Standard for
Data Warehouse
(RDBMS)
BI Standard for
Data Lake
(HDFS, Cloud Object Store)
11. Arcadia Data. Proprietary and Confidential
Data Warehouse BI Architecture
BI Server Analytic Process
Optimize Physical
Semantic Layer
Secure Data
Load Data
Big Data Requirements
Native Connection
Semi-Structured
Parallel
Real-time
Data Warehouse
(RDBMS)
12. Arcadia Data. Proprietary and Confidential
Data Lake BI Architecture
BI Server
Data Warehouse
(RDBMS)
Data Lake
(HDFS, Cloud Object Storage)
Arcadia Data was built
from inception to
run natively within data lakes
Analytic Process
Optimize Physical
Semantic Layer
Secure Data
Load Data
Big Data Requirements
Native Connection
Semi-Structured
Parallel
Real-time
13. Arcadia Data. Proprietary and Confidential
The Result: Faster BI Analytics and Higher User Concurrency
[Chart: Query 1 Performance Testing - Heavy Query. X-axis: # of Concurrent Jobs (1, 2, 5, 10, 15, 30). Y-axis: Completion Time (seconds), 0 to 1400. Series: Arcadia, Hive, Impala, Spark (Arcadia Data vs. Other SQL Engines).]
Customer Benchmark of a Legacy BI Tool Accelerated by Arcadia Data On a Data Lake
14. Arcadia Data. Proprietary and Confidential
Data Lake BI Architecture – More than Just Historical Analysis
Arc Viz
Streams/Topics
Real-Time Data
Data Warehouse
(RDBMS)
Data Lake
(HDFS, Cloud Object Storage)
Arcadia Data was built
from inception to
run natively within data lakes
15. Arcadia Data. Proprietary and Confidential
Data Drives Market Disruption
Arcadia Data Streaming Visualizations
Data Sources
Historical Visuals
Native Access for Streaming Analytics – Real-Time + Historical
Real-Time Visuals
Advanced Visualizations
and Semantic Layer
Data Node
KSQL Cluster
Streaming Data
Kafka Cluster
Source Topics
Data Node Data Node
Data Node Data Node
… …
………
……
IoT Dashboard
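The "Real-Time + Historical" idea on this slide can be illustrated with a minimal sketch. It is not Arcadia Data's implementation and uses no Kafka or KSQL APIs; the sensor names, counts, and the helper functions `window_counts` and `combined_view` are hypothetical. It only shows how a live tumbling-window count might be merged with a historical aggregate so that one dashboard shows both.

```python
from collections import Counter

# Hypothetical historical aggregate already stored in the data lake.
historical_counts = Counter({"sensor-a": 120, "sensor-b": 80})

# Hypothetical events arriving on a stream topic as (timestamp_seconds, sensor_id).
stream_events = [(1, "sensor-a"), (2, "sensor-c"), (61, "sensor-a"), (62, "sensor-b")]

def window_counts(events, window_start, window_seconds=60):
    """Count events per sensor in the tumbling window [start, start + size)."""
    window_end = window_start + window_seconds
    return Counter(s for ts, s in events if window_start <= ts < window_end)

def combined_view(historical, live):
    """Merge historical and live counts into the figures one dashboard would show."""
    return dict(historical + live)

live = window_counts(stream_events, window_start=60)
print(combined_view(historical_counts, live))
```

In a real deployment the live counts would come from a stream processor rather than an in-memory list; the sketch only captures the merge step.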
16. Arcadia Data. Proprietary and Confidential
Data Lake BI Architecture – More than Just Historical Analysis
Arc Viz
Data Warehouse
(RDBMS)
Data Lake
(HDFS, Cloud Object Storage)
Arcadia Data was built
from inception to
run natively within data lakes
Streams/Topics
Real-Time Data
17. Arcadia Data. Proprietary and Confidential
BI for Data Lakes Must be Architected for Scale and Performance
Edge Node JDBC
BI Server
Data Warehouse BI Architecture
• BI Server can’t scale out
• Significant data movement, modeling, security management
Data Lake Cluster
“Big Data” BI Architecture
• Edge node BI server only scales via long planning
• Performance optimizations require heavy IT intervention
• Only passing SQL with no semantic information (e.g., filters)
Native BI within Data Lake Architecture
• Scales linearly with DataNodes while retaining agility
• Semantic model is “pushed down” and distributed
• Highly optimized “based on usage” physical model
• No data movement; single security model
DataNodes
Browser
DataNodes + Arcadia
Data Lake Cluster
Browser
Edge Node BI Server DataNodes
Data Lake Cluster
Browser
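The contrast between an edge-node BI server and a native, pushed-down model can be sketched in a few lines. This is an illustration under stated assumptions, not the product's actual execution engine: the `partitions` data, the `region` filter, and both function names are hypothetical. Each inner list stands for the rows on one DataNode, and the second value returned counts how many items cross the network.

```python
# Hypothetical data: each inner list stands for the rows held on one DataNode.
partitions = [
    [{"region": "EU", "sales": 100}, {"region": "US", "sales": 40}],
    [{"region": "EU", "sales": 60}, {"region": "APAC", "sales": 25}],
    [{"region": "US", "sales": 10}, {"region": "EU", "sales": 15}],
]

def central_server_total(parts, region):
    """Edge-node style: ship every row to one server, then filter and sum there."""
    shipped = [row for part in parts for row in part]
    total = sum(r["sales"] for r in shipped if r["region"] == region)
    return total, len(shipped)

def pushed_down_total(parts, region):
    """Native style: each node filters and pre-aggregates; only partial sums move."""
    partials = [sum(r["sales"] for r in part if r["region"] == region) for part in parts]
    return sum(partials), len(partials)

central_total, rows_moved = central_server_total(partitions, "EU")
native_total, partials_moved = pushed_down_total(partitions, "EU")
print(central_total, rows_moved)       # same answer, every row moved
print(native_total, partials_moved)    # same answer, one partial sum per node
```

Both approaches return the same total; the difference is that the pushed-down version moves one number per node instead of every row, which is the point behind "No data movement" above.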
18. Arcadia Data. Proprietary and Confidential
Data Lake BI Architecture – Load, Secure, and Process Data in One Place!
Data
Warehouse
Data
Lake
Arcadia Data was built
from inception to
run natively within data lakes
19. Arcadia Data. Proprietary and Confidential
Arcadia Data: Foundational Building Blocks
Arc Engine
Powerful processing engine that runs
on the Hadoop data nodes that
provides the scalability, concurrency
and native security of Hadoop.
Arc Viz
Scalable browser based front
end for the reporting,
dashboards and visuals that runs
on the Hadoop data or edge
nodes.
20. Arcadia Data. Proprietary and Confidential
Delivering Enterprise Flexibility and Performance
Accelerate Data Lake
for Existing User Solutions
ARCENG
Data
Warehouse
Data Lake
JDBC /
ODBC
JDBC / ODBC
ARCENG
Deliver Complete
Scalable BI Solution
Data
Warehouse
ARCVIZ
Data Lake
JDBC / ODBC Native
ARCENG ARCENG
Unified BI Solution for Existing and
Modern Data Platforms
Data
Warehouse
ARCVIZ
Data Platforms
JDBC /
ODBC
Native
21. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Model Data
Land and
Secure Data
Semantic and
Visual/Analytic
Discovery
Production
RDBMS
DATA
WAREHOUSE
PLATFORM
Data Warehouse Load, Model and Go “Build it and they will Come”
It is also about the
Analytic Process Improvement
It is not Just about System Architecture
22. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Model Data
Land and
Secure Data
Semantic &
Visual/
Analytic
Discovery
Production
Extract and Load
- ETL servers
- ELT In-database
Transform
- Put into Tables
- Star Schema or
denormalized
3NF
Discovery and
Reports
- Build Semantic
Layer
- Design Report
Layout
Productionize
- Optimize
Physical
Schema
Weeks and Months in Most Companies Weeks
Often Discovery
Only Run Once
Optimize in
Database or
BI Tool or
Both?
Data Warehouse Load, Model and Go “Build it and they will Come”
23. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Model Data
Land and
Secure Data
Semantic &
Visual/
Analytic
Discovery
Production
RDBMS
DATA
WAREHOUSE
PLATFORM
Extract and
Secure
Load and
Secure
Transform
Cubes or Aggregates
Transform
Star Schema or 3NF
Build Semantic Layer
Productionize
Optimize Physical
Productionize
Optimize Physical
Build Semantic Layer
Discovery and Reports
Data Warehouse (RDBMS)
Data Warehouse BI Server
Data Warehouse Load, Model and Go “Build it and they will Come”
24. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Model Data
Land and
Secure Data
Semantic &
Visual /
Analytic
Discovery
Production
RDBMS
DATA
WAREHOUSE
PLATFORM
Extract and
Secure
Load and
Secure
Transform
Cubes or Aggregates
Transform
Star Schema or 3NF
Build Semantic Layer
Productionize
Optimize Physical
Productionize
Optimize Physical
Build Semantic Layer
Discovery and Reports
Data Warehouse (RDBMS)
Data Warehouse BI Server
Data Warehouse Load, Model and Go “Build it and they will Come”
Time to Value Delayed
Weeks and Months
25. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Model Data
Land and
Secure Data
Semantic &
Visual /
Analytic
Discovery
Production
RDBMS
DATA
WAREHOUSE
PLATFORM
Extract and
Secure
Transform
Cubes or Aggregates
Productionize
Optimize Physical
Build Semantic Layer
Discovery and Reports
Data Warehouse BI Server
Data Lake Load, Model and Go “Build it and they will Come”
26. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Model Data
Land and
Secure Data
Semantic &
Analytic/
Visual
Discovery
Production
RDBMS
DATA
WAREHOUSE
PLATFORM
Extract and
Secure
Load and
Secure
Transform
Cubes or Aggregates
Transform
Star Schema or 3NF
Build Semantic Layer
Productionize
Optimize Physical
Productionize
Optimize Physical
Build Semantic Layer
Discovery and Reports
Data Lake (Hadoop)
Data Warehouse BI Server
Data Lake Load, Model and Go “Build it and they will Come”
27. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Model Data
Land and
Secure Data
Semantic &
Visual/
Analytic
Discovery
Production
RDBMS
DATA
WAREHOUSE
PLATFORM
Extract and
Secure
Load and
Secure
Transform
Cubes or Aggregates
Transform
Star Schema or 3NF
Build Semantic Layer
Productionize
Optimize Physical
Productionize
Optimize Physical
Build Semantic Layer
Discovery and Reports
Data Lake (Hadoop)
Data Warehouse BI Server
Data Lake Load, Model and Go “Build it and they will Come”
Data Warehouse BI Tools Treat
Hadoop/Cloud Just Like any
Other Database
Time to Value Delayed
Weeks and Months
28. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Model Data
Land and
Secure Data
Semantic &
Visual/
Analytic
Discovery
Production
RDBMS
DATA
WAREHOUSE
PLATFORM
Load and
Secure
Transform
Star Schema or 3NF
Build Semantic Layer Productionize
Optimize Physical
Data Lake (Hadoop)
Data Lake Load and Go “Discover to Production”
BI Native for Data Lakes
Data Lake Native BI
Data and Processing In One Place
29. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Model Data
Land and
Secure Data
Semantic &
Visual/
Analytic
Discovery
Production
RDBMS
DATA
WAREHOUSE
PLATFORM
Load and
Secure
Transform
Star Schema or 3NF
Build Semantic Layer Productionize
Optimize Physical
Data Lake (Hadoop)
Data Lake Load and Go “Discover to Production”
BI Native for Data Lakes
ELDT
30. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Model Data
Land and
Secure Data
Semantic &
Visual/
Analytic
Discovery
Production
RDBMS
DATA
WAREHOUSE
PLATFORM
Load and
Secure
Transform
Star Schema or 3NF
Build Semantic Layer Productionize
Optimize Physical
Data Lake (Hadoop)
Data Lake Load and Go “Discover to Production”
Extract Load “Discover” Transform
Model Based on Usage
BI Native for Data Lakes
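"Model Based on Usage" can be read as: observe what users actually query during discovery, then build physical structures only for the patterns that recur. The sketch below is one hedged interpretation of that idea, not the vendor's algorithm. The `query_log` contents, the `min_hits` threshold, and the function `recommend_aggregates` are all hypothetical.

```python
from collections import Counter

# Hypothetical query log: the set of dimensions each dashboard query grouped by.
query_log = [
    ("region", "month"),
    ("region", "month"),
    ("product",),
    ("month", "region"),
    ("customer", "product"),
    ("product",),
    ("region", "month"),
]

def recommend_aggregates(log, min_hits=2):
    """Return dimension sets queried at least `min_hits` times, most frequent first."""
    # Sort each tuple so ("region", "month") and ("month", "region") count as one pattern.
    usage = Counter(tuple(sorted(dims)) for dims in log)
    return [(dims, hits) for dims, hits in usage.most_common() if hits >= min_hits]

recommendations = recommend_aggregates(query_log)
print(recommendations)
```

Here the month-by-region pattern would be the first candidate for a pre-built aggregate, while the one-off customer-by-product query is left to run against the raw data.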
31. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Land and
Secure Data
RDBMS
DATA
WAREHOUSE
PLATFORM
Load and
Secure
Semantic &
Visual/
Analytic
Discovery
Build Semantic Layer
Model Data
Transform
Star Schema or 3NF
Production
Productionize
Optimize Physical
Data Lake (Hadoop)
Data Lake Load and Go “Discover to Production”
From Discovery to Production
Based on Usage
Time to Value
In Days
BI Native for Data Lakes
32. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Land and
Secure Data
RDBMS
DATA
WAREHOUSE
PLATFORM
Load and
Secure
Semantic &
Visual/
Analytic
Discovery
Build Semantic Layer
Model Data
Transform
Star Schema or 3NF
Production
Productionize
Optimize Physical
Data Lake (Hadoop)
Data Lake Load and Go “Discover to Production”
BI Native for Data Lakes
£100,000 in Business Value in 30 Days
or We Pick Up and Go Home
Time to Value
In Days
33. Arcadia Data. Proprietary and Confidential
§ Intuitive and Visual UI that Anyone
Can Use
§ Accessed via web-browser
§ Easy to compose visuals, dashboards and
apps via drag and drop
§ Get recommendations via machine-assisted
insights
§ Benefits
§ Unlocks big data analytics for business users
and analysts
§ Promotes agility and reduces time to insight
§ Enables business self-sufficiency and relieves
burden on IT
Self-Service Front End – No Coding Needed!
35. Arcadia Data. Proprietary and Confidential
Business Analysts Can Enrich Data with Their Own Table Joins
36. Arcadia Data. Proprietary and Confidential
Instant Visuals – AI-Based Visualization Recommendations
Pick the Visual of your Choice, or …
Visualization Builder Recommended Visualizations
shows which visuals best represent
your data.
37. Arcadia Data. Proprietary and Confidential
Arcadia Enterprise Handles the Complexity for You
No ETL Needed to Flatten Data
Supports Modern ARRAY, STRUCT, MAP
Complex Types and Nested Schemas
SELECT c.name, sum(i.amount)
FROM customers c, c.orders.items i
GROUP BY 1
Simple Drag and Drop Experience
Translates Complex Structure into Intuitive
Field Browser
No Flattening at Query Time
Generates Native SQL for Complex Types
Understands Complex Structures Easy Self-Service UI Powerful Native SQL
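To make "No Flattening at Query Time" concrete, here is a minimal Python sketch of the aggregation the SQL above expresses, computed by walking nested records in place instead of first exploding them into a flat table. The sample `customers` data and the function `total_amount_by_customer` are hypothetical, and the sketch assumes `orders` and `items` are arrays; it mirrors the slide's query in spirit only.

```python
from collections import defaultdict

# Hypothetical nested records, shaped like the customers -> orders -> items
# structure the query on this slide navigates.
customers = [
    {"name": "Alice", "orders": [
        {"items": [{"amount": 10.0}, {"amount": 5.0}]},
        {"items": [{"amount": 2.5}]},
    ]},
    {"name": "Bob", "orders": [
        {"items": [{"amount": 7.0}]},
    ]},
    {"name": "Carol", "orders": []},
]

def total_amount_by_customer(rows):
    """Sum item amounts per customer name by walking the nested arrays in place."""
    totals = defaultdict(float)
    for customer in rows:
        for order in customer["orders"]:
            for item in order["items"]:
                totals[customer["name"]] += item["amount"]
    return dict(totals)

totals = total_amount_by_customer(customers)
print(totals)
```

A customer with no items contributes no row, which matches the inner-join behaviour of listing the nested collection in the `FROM` clause.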
39. Arcadia Data. Proprietary and Confidential
Cloudera Spot Cybersecurity
Net flow data over time
Machine
learning
output
Network graph analysis
40. Arcadia Data. Proprietary and Confidential
BI for Data Lakes Must be Architected for Scale and Performance
Edge Node JDBC
BI Server
Data Warehouse BI Architecture
• BI Server can’t scale out
• Significant data movement, modeling, security management
Data Lake Cluster
“Big Data” BI Architecture
• Edge node BI server only scales via long planning
• Performance optimizations require heavy IT intervention
• Only passing SQL with no semantic information (e.g., filters)
Native BI within Data Lake Architecture
• Scales linearly with DataNodes while retaining agility
• Semantic model is “pushed down” and distributed
• Highly optimized “based on usage” physical model
• No data movement; single security model
DataNodes
Browser
DataNodes + Arcadia
Data Lake Cluster
Browser
Edge Node BI Server DataNodes
Data Lake Cluster
Browser
41. Arcadia Data. Proprietary and Confidential
1. Minimize Data Movement
2. Minimize Copies of Data
3. Minimize the Number of Places to Secure Data
4. Leverage the Power of Parallel Processing
5. Visualize Structured and Unstructured Data
6. Visualize Data in Motion
7. Visualize Data from Multiple Data Sources
8. Provide a Self-Service Discovery Environment
9. Model Data Based on Usage
10. Productionize on the Same Platform as Your Discovery Environment
10 Big Data Considerations for Visual Analytics/BI Tool Selection
42. Arcadia Data. Proprietary and Confidential
Arcadia Data
The Only Visual Analytics and BI
Tool Built from Inception
to Run Natively on Hadoop