1. TABLEAU AND HADOOP
Tableau’s Place in a Big Data Architecture
DAMA, Tableau User Group Meeting
November 13, 2014
2. TABLEAU AND HADOOP
Agenda
BI/DW Workload Categories & Tableau
Three Integration Models
Capability Models
Architecture Patterns
Summary
Q & A
3. TABLEAU AND HADOOP
Workload Categories
Operational BI
• Operational processes
• Reports and dashboards
• Transactional sys integration
• Automatic distribution
• 100s – 1,000s of consumers
• Front-line staff
• Data analysts
• Business leaders
• Executives
• Production data prep
• High availability
• Report archiving
• Op sys response time & SLA
• Enterprise governance
• Enterprise security
• Self-service
• Report access & interactivity
Data Exploration
• Decision support processes
• Less strict definition
• Ad hoc reports and dashboards
• Perf mgmt analysis by
• 100s of users
• Data analysts
• Business leaders
• Production & manual data prep
• Enterprise or div governance
• Corporate security
• Self-service
• Query
• Report/analysis authoring
• Data design
• Metadata definition
Data Science
• Complex data exploration
• Descriptive analytics
• Predictive statistical models
• Machine learning algorithms
• Large data volumes
• Wide data variety
• 10s of users
• Data scientists
• Technologists
• Departmental governance
• Raw data (Bus & IT)
• Derivative data (Bus & IT)
• Self-service: Full
Tableau
4. TABLEAU AND HADOOP
Three Integration Models
Isolated Exploration Environment (aka Sandbox)
• Snapshot of data cached on desktop or server
• Frequency of data change is analyst dependent
• Integrations occur through analyst, not enterprise, work
Live Interactive Query (aka BI/DW)
• Constantly changing data stored in an enterprise data platform
• Frequency of data change is independent of analyst
• Integrations occur primarily through enterprise work
Integrated Advanced Analytic Platform
• Access to [custom] advanced analytic algorithms through Tableau
• Application of analytic algorithms to new datasets
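The difference between the first two models can be sketched in a few lines. This is a minimal illustration, not Tableau's mechanism: an in-memory SQLite database stands in for the enterprise data platform, and the table name `claims` and its columns are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the enterprise data platform.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE claims (id INTEGER, amount REAL)")
source.executemany("INSERT INTO claims VALUES (?, ?)", [(1, 100.0), (2, 250.0)])
source.commit()


def take_extract(conn):
    """Sandbox model: copy the rows once; the copy changes only when the analyst refreshes it."""
    return conn.execute("SELECT id, amount FROM claims").fetchall()


def live_total(conn):
    """Live model: every call queries the platform and sees its current state."""
    return conn.execute("SELECT SUM(amount) FROM claims").fetchone()[0]


extract = take_extract(source)

# An enterprise process loads a new row, independent of any analyst.
source.execute("INSERT INTO claims VALUES (3, 75.0)")
source.commit()

extract_total = sum(amount for _, amount in extract)
print(extract_total)       # total from the snapshot taken before the load
print(live_total(source))  # total that includes the newly loaded row
```

The snapshot keeps answering from the data as it was when the analyst pulled it, while the live query reflects the load that happened afterwards.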
5. TABLEAU AND HADOOP
Isolated Exploration Environment
[Diagram: two capabilities, Visual Exploration and Prototype Analytical Applications. Labels, in slide order: Metadata Tool (?); Analyst; Tableau; SAS; visual navigation, measures, hierarchies; statistical profile; technical & business metadata; ?; Tableau; integrations, data design, visual organization, granularity.]
6. TABLEAU AND HADOOP
Live Interactive Query
[Diagram: two capabilities, Dashboarding and Performance Management Analysis.
Dashboarding: Analyst (Define), Developer (Build), Business Leaders & Staff (Use); Tableau: visually engaging, KPIs, defined analysis paths.
Performance Management Analysis: Analyst (Iterates); Tableau: KPIs, ad hoc analysis paths, detail records; Generate: analysis, recommendation.]
7. TABLEAU AND HADOOP
Integrated Advanced Analytic Platform
Enabling a “Clinical Trials” Model for Data Science
Phase I: Model Discovery
Data Science Team (Centralized)
• Appropriate modeling technique
• Rapid iterations
• Tool & algorithm variety
Phase II: Confirmation
Data Analysts (Decentralized)
• Confirm value
• Wider application
• Tool & data conformity
Phase III: Pilot
Select Business Leaders, Staff or Customers
• Demo business value
• Demo feasibility
Phase IV: Rollout
All Business Leaders, Staff or Customers
• Realized value
• Refine through application
Tableau
8. TABLEAU AND HADOOP
Analytic Capabilities & Hadoop
Columns: Architecture Pattern | Capability | Suitable for Hadoop | Considerations
Isolated Exploration Environment | Visual Exploration | Possibly
• Dataset has limited joins
• Dataset is large enough to warrant Hadoop as the “cache”
Isolated Exploration Environment | Prototype Analytical Apps | No
• Too many joins typically required for a prototype
• Prototypes can be confirmed on data subsets
Live Interactive Query | Dashboards | No
• Too many concurrent users
• Response time requirements are too stringent
Live Interactive Query | Performance Mgmt Analysis | Possibly
• Dataset has limited joins
• Dataset is large enough to warrant Hadoop as the repository
Integrated Advanced Analytic Platform | “Clinical Trial” approach | Yes
• Tableau’s R integration
• Hadoop’s UDF, UDAF features
9. TABLEAU AND HADOOP
Architecture Pattern
Isolated Exploration Environment
[Diagram: Isolated Exploration Environment. Labels: Data analyst; Business Leader; Tableau Desktop with cache; Private Data; Enterprise Data Asset; on-demand extract; interactive query.]
10. TABLEAU AND HADOOP
Architecture Pattern
Live Interactive Query
[Diagram: Live Interactive Query. Labels: Data analyst; Developer; Tableau Desktop; Tableau Browser & Mobile; Tableau Server; cache (two); Enterprise Data Asset; Cached Live Query; Live Query.]
11. TABLEAU AND HADOOP
Architecture Pattern
Integrated Advanced Analytic Platform
[Diagram: Interactive Advanced Analytic Platform. Labels: Data analyst; Data scientist; Tableau Server; cache; Enterprise Data Asset; Analytic Workbench (python, R, SAS, …); four analytic models (M = Analytic Model); Live Query; Live Query via SQL extensions & R integration.]
SQL Extension Examples
MarkLogic SPARQL
    SELECT name, affiliation
    FROM emails
    WHERE subject MATCH 'answer'
HiveQL
    SELECT my_function(…),
           sum(freq)
    FROM myDataTable;
References:
http://www.tableausoftware.com/about/blog/tableau-and-marklogic
http://developer.marklogic.com/blog/the-art-of-the-possible-marklogic-tableau-public
https://cwiki.apache.org/confluence/display/Hive/HivePlugins
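The idea behind both examples, registering custom functions so they can be called from plain SQL, can be tried locally. The sketch below swaps in Python's built-in SQLite driver for Hive or MarkLogic, so it shows the concept rather than either product's API. The names `my_function`, `freq`, and `myDataTable` come from the slide; the `word` column, the `my_sum` aggregate, and the sample rows are hypothetical.

```python
import sqlite3


def my_function(text):
    """Scalar extension (UDF analogue): one value in, one value out per row."""
    return text.strip().lower()


class SumFreq:
    """Aggregate extension (UDAF analogue): many rows in, one value out per group."""

    def __init__(self):
        self.total = 0

    def step(self, freq):
        # Called once per row in the group; NULLs are skipped.
        if freq is not None:
            self.total += freq

    def finalize(self):
        # Called once per group to produce the aggregate result.
        return self.total


conn = sqlite3.connect(":memory:")
conn.create_function("my_function", 1, my_function)
conn.create_aggregate("my_sum", 1, SumFreq)

conn.execute("CREATE TABLE myDataTable (word TEXT, freq INTEGER)")
conn.executemany(
    "INSERT INTO myDataTable VALUES (?, ?)",
    [(" Hadoop", 3), ("hadoop ", 2), ("Tableau", 4)],
)

rows = conn.execute(
    "SELECT my_function(word), my_sum(freq) "
    "FROM myDataTable GROUP BY my_function(word) ORDER BY 1"
).fetchall()
print(rows)
```

Once registered, the custom functions look like built-ins to any client that can send SQL, which is what lets a query tool reach analytic logic it does not implement itself.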
12. TABLEAU AND HADOOP
Architecture Pattern
Integrated Advanced Analytic Platform
[Diagram: Interactive Advanced Analytic Platform, the same diagram as on slide 11.]
References:
https://boraberan.wordpress.com/2013/12/24/sentiment-analysis-in-tableau-with-r/
http://cran.r-project.org/src/contrib/Archive/sentiment/
http://kb.tableausoftware.com/articles/knowledgebase/r-implementation-notes
http://www.tableausoftware.com/about/blog/2013/10/tableau-81-and-r-25327
R integration example
Install R package called sentiment
Call classify_polarity R function using SCRIPT_STR function
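The shape of that call, a column of text going out and one label per row coming back, can be mimicked without R. The sketch below is a toy stand-in written in Python: `toy_polarity`, `script_str`, and the word lists are all hypothetical, and the scoring rule is not the algorithm used by the sentiment package.

```python
# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def toy_polarity(texts):
    """Return one polarity label per input string, in the same order."""
    labels = []
    for text in texts:
        words = text.lower().split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        if score > 0:
            labels.append("positive")
        elif score < 0:
            labels.append("negative")
        else:
            labels.append("neutral")
    return labels


def script_str(func, *columns):
    """Mimic the call shape: column vectors in, a same-length vector of strings out."""
    result = func(*columns)
    if len(result) != len(columns[0]):
        raise ValueError("script must return one value per input row")
    return [str(value) for value in result]


comments = ["Great service, love it", "Terrible wait times", "Filed a claim today"]
labels = script_str(toy_polarity, comments)
print(labels)
```

The point of the pattern is that the visualization layer only needs to pass columns and receive labels; the model itself lives in the external analytic environment.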
13. TABLEAU AND HADOOP
Consolidated Architecture
[Diagram: the three architecture patterns combined on one slide, with an Enterprise Data Asset.
Isolated Exploration Environment: Data analyst; Business Leader; Tableau Desktop with cache; Private Data; on-demand extract; interactive query.
Live Interactive Query: Data analyst; Developer; Tableau Desktop; Tableau Browser & Mobile; Tableau Server with cache; Cached Live Query; Live Query.
Interactive Advanced Analytic Platform: Data analyst; Data scientist; Tableau Server with cache; Analytic Workbench (python, R, SAS, …); four analytic models (M); Live Query; Live Query via SQL extensions & R integration.
Other labels: W (two).]
14. TABLEAU AND HADOOP
Summary, Q&A
– Thank you –
Contact Information
Craig Jordan
LinkedIn: www.linkedin.com/in/crjordan/
Email: Craig.Jordan@amfam.com
Editor's Notes
Operational BI: business intelligence and analytics related to the completion of operational processes. This includes reports and dashboards that are integrated directly into a transactional system, as well as standard reports and dashboards that are automatically distributed to 100s – 1,000s of consumers and external regulators. Work in this category requires an operational SLA (24/7), report archiving, and response times similar to those of an operational application, even with large numbers of concurrent users.
Data Analysis, Exploration & Visualization: business intelligence and analytics related to the completion of decision support processes. These deliverables are less strictly defined than those that are operational and include ad hoc reports as well as dashboards that enable business leaders to drill into the details behind specific business performance measurements and trends. The audience for these deliverables includes 10s of data analysts and 100s of business leaders.
Advanced Analytics & Data Science: business intelligence and analytics related to complex data exploration and integration as well as descriptive and predictive statistical models and machine learning algorithms. The deliverables related to these tasks are less strictly defined than those of the two other categories although they may be related to further understanding specifically defined KPIs. In addition, work in this category generally involves a larger volume and greater variety of data. Those responsible for this category of work include a small number of data scientists and a handful of specialized resources (in the business and I/S) who support them.