Join Tableau and Cloudera to learn how to apply governance to the discovery layer in an enterprise data hub while still meeting the speed and agility requirements of the business user.
Siloed data is difficult to access and leaves data consumers with only a partial view of the problem at hand. When access to large volumes of disparate data is limited, analysts and business users alike cannot include important data in their reports and models, leading to suboptimal analytic outputs. And even when this data is available to countless users, traditional systems limit them to querying small volumes of data in order to return results in a timely manner.
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to... (Cloudera, Inc.)
What if…
…your data stores were limitless and accessible?
…data discovery was fast… really fast?
…connectivity was so seamless you could almost take it for granted?
And what if you could do all this with your preferred BI tool?
Learn how to integrate Cloudera Enterprise with SAP Lumira via embedded connectivity from Simba Technologies.
In this interactive webinar, experts from Cloudera, SAP, and Simba Technologies will introduce strategies for overcoming current data-discovery challenges, show you how to achieve powerful analytical insight, and demonstrate how to integrate Cloudera Enterprise with SAP Lumira.
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl... (Cloudera, Inc.)
Across all industries, organizations are embracing the promise of Apache Hadoop to store and analyze data of all types, at larger volumes than ever before possible. But to tap into the true value of this data, organizations need to manage this data and its subsequent metadata to understand its context, see how it’s changing, and take actions on it.
Cloudera Navigator is the only integrated data management and governance solution for Hadoop and is designed to do exactly this. With Cloudera 5.7, we have further expanded the capabilities in Cloudera Navigator to make it even easier to understand your data and maintain metadata consistency as it moves through Hadoop.
Building a Modern Analytic Database with Cloudera 5.8 (Cloudera, Inc.)
This document discusses building a modern analytic database with Cloudera. It outlines Marketing Associates' evaluation of solutions to address challenges around managing massive and diverse data volumes. They selected Cloudera Enterprise to enable self-service BI and real-time analytics at lower costs than traditional databases. The solution has provided scalability, cost savings of over 90%, and improved security and compliance. Future roadmaps for Cloudera's analytic database include faster SQL, improved multitenancy, and deeper BI tool integration.
The Future of Data Management: The Enterprise Data Hub (Cloudera, Inc.)
The document discusses the future of data management through the use of an enterprise data hub (EDH). It notes that an EDH provides a centralized platform for ingesting, storing, exploring, processing, analyzing and serving diverse data from across an organization on a large scale in a cost effective manner. This approach overcomes limitations of traditional data silos and enables new analytic capabilities.
Enterprise Data Hub: The Next Big Thing in Big Data (Cloudera, Inc.)
If you missed Strata + Hadoop World, you missed quite a bit. This year's event was packed with Big Data practitioners across industries who shared their experiences and how they are driving new innovations like never before. Just because you weren't there, doesn't mean you missed out.
In this session, we'll touch on a few of the key highlights from the show, including:
Key trends in Big Data adoption
The enterprise data hub
How the enterprise data hub is used in practice
Cloudera Federal Forum 2014: The Building Blocks of the Enterprise Data Hub (Cloudera, Inc.)
Eli Collins, Chief Technologist in the Office of the CTO at Cloudera, shares the story of the enterprise data hub and how it relates to the enterprise data warehouse.
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet... (ArabNet ME)
A new foundation for the Modern Information Architecture.
Speaker: Amr Awadallah, CTO & Cofounder, Cloudera
Our legacy information architecture cannot cope with the realities of today's business: it cannot scale to meet our SLAs due to the separation of storage and compute, economically store the volumes and types of data we currently confront, provide the agility necessary for innovation, or, most importantly, provide a full 360-degree view of our customers, products, and business. In this talk, Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at the core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business.
Data Discovery and BI - Is there Really a Difference? (Inside Analysis)
The Briefing Room with John O'Brien and Birst
Live Webcast Dec. 3, 2013
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=7869542&rKey=1f6574abc879ca42
While the disciplines of business intelligence and discovery certainly overlap, there are key distinctions between the two, both in terms of design point and user interface. It has traditionally been believed that different architectures are required to address these differing analytic needs, but is that really the case? Or is discovery simply another key capability within an overall BI platform?
Register for this episode of The Briefing Room to learn from veteran Analyst John O'Brien of Radiant Advisors as he outlines best practices for enabling high-quality business intelligence and discovery, and the architectural capabilities to enable both. He'll be briefed by Brad Peters of Birst who will tout his company's cloud BI platform. In particular, Peters will demonstrate how the Birst architecture was especially designed for enterprise-caliber BI and argue for a more inclusive future BI architecture.
Visit InsideAnalysis.com for more information
High-Performance Analytics in the Cloud with Apache Impala (Cloudera, Inc.)
With more and more data being generated and stored in the cloud, you need a modern data platform that can extend to any environment so you can derive value from all your data. Cloudera Enterprise is the leading enterprise Hadoop platform for cloud deployments. It’s the easiest way to manage and secure Hadoop data across any cloud environment and includes component-level support for cloud-native object stores. This makes the platform uniquely suited to handle transient jobs like ETL and BI analytics, as well as persistent workloads like stream processing and advanced analytics.
With the recent release of Cloudera 5.8, Apache Impala (incubating) has added support for Amazon S3, enabling business analysts to get instant insights from all data through high-performance exploratory analytics and BI.
3 Things to learn:
Join David Tishgart, Director of Product Marketing, and James Curtis, Senior Analyst Data Platforms & Analytics at 451 Research, as they discuss:
* Best practices for analytic workloads in the cloud
* A live demo and real-world use cases
* What’s next for Cloudera and the cloud
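Complementing the list above, here is a minimal sketch of what exploratory analytics against S3-resident data can look like from Python, using the open-source impyla client. The host name, bucket path, and table schema are illustrative assumptions, not details from the webinar.

```python
# Minimal sketch: exposing Parquet files in S3 to Impala, then querying them.
# Host, port, bucket, and schema are illustrative assumptions.
from impala.dbapi import connect

conn = connect(host="impalad.example.com", port=21050)  # hypothetical Impala daemon
cur = conn.cursor()

# Register an external table over data already sitting in S3.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        event_time TIMESTAMP,
        user_id    STRING,
        url        STRING
    )
    STORED AS PARQUET
    LOCATION 's3a://example-bucket/web_logs/'
""")

# BI-style exploratory aggregation runs directly against the S3-backed table.
cur.execute("""
    SELECT url, COUNT(*) AS hits
    FROM web_logs
    GROUP BY url
    ORDER BY hits DESC
    LIMIT 10
""")
for url, hits in cur.fetchall():
    print(url, hits)
```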
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives (Cloudera, Inc.)
This session will provide an executive overview of the Apache Hadoop ecosystem, its basic concepts, and its real-world applications. Attendees will learn how organizations worldwide are using the latest tools and strategies to harness their enterprise information to solve business problems and the types of data analysis commonly powered by Hadoop. Learn how various projects make up the Apache Hadoop ecosystem and the role each plays to improve data storage, management, interaction, and analysis. This is a valuable opportunity to gain insights into Hadoop functionality and how it can be applied to address compelling business challenges in your agency.
How to Build Continuous Ingestion for the Internet of Things (Cloudera, Inc.)
The Internet of Things is moving into the mainstream and this new world of data-driven products is transforming a vast number of industry sectors and technologies.
However, IoT creates a new challenge: how to build and operationalize continual data ingestion from such a wide and ever-changing array of endpoints so that the data arrives consumption-ready and can drive analysis and action within the business.
In this webinar, Sean Anderson from Cloudera and Kirit Busu, Director of Product Management at StreamSets, will discuss Hadoop's ecosystem and IoT capabilities and provide advice about common patterns and best practices. Using specific examples, they will demonstrate how to build and run end-to-end IoT data flows using StreamSets and Cloudera infrastructure.
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera (Cloudera, Inc.)
Transitioning to a big data architecture is a big step, and the complexity of moving existing analytical services onto modern platforms like Cloudera can seem overwhelming.
Rethink Analytics with an Enterprise Data Hub (Cloudera, Inc.)
Have you run into one or more of the following barriers or limitations with your existing data warehousing architecture:
> Increasingly high data storage and/or processing costs?
> Silos of data sources?
> Complexity of management and security?
> Lack of analytics agility?
Topics include: the transformative value of real-time data and analytics, and current barriers to adoption; the importance of an end-to-end solution for data-in-motion that spans ingestion, processing, and serving; and Apache Kudu's role in simplifying real-time architectures.
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and... (Cloudera, Inc.)
Are you struggling to validate the added costs of a Hadoop implementation? Are you struggling to manage your growing data?
Implementing Hadoop may pay for itself faster than you anticipate. Dell and Intel recently commissioned a study with Forrester Research to determine the Total Economic Impact of the Dell | Cloudera Apache Hadoop Solution, accelerated by Intel. The study determined customers can see a six-month payback when implementing the Dell | Cloudera solution.
Join Dell, Intel and Cloudera, three big data market leaders, to understand how to begin a simplified and cost-effective big data journey and to hear case studies that demonstrate how users have benefited from the Dell | Cloudera Apache Hadoop Solution.
Better Together: The New Data Management Orchestra (Cloudera, Inc.)
Ingesting, storing, processing, and leveraging big data for maximum business impact requires integrating systems, processing frameworks, and analytic deployment options. Learn how Cloudera's enterprise data hub framework, MongoDB, and the Teradata Data Warehouse working in concert can enable companies to explore data in new ways and solve problems that not long ago might have seemed impossible.
Gone are the days of NoSQL and SQL competing for center stage. Visionary companies are driving data subsystems to operate in harmony. So what’s changed?
In this webinar, you will hear from executives at Cloudera, Teradata and MongoDB about the following:
How to deploy the right mix of tools and technology to become a data-driven organization
Examples of three major data management systems working together
Real world examples of how business and IT are benefiting from the sum of the parts
Join industry leaders Charles Zedlewski, Chris Twogood and Kelly Stirman for this unique panel discussion, moderated by BI Research analyst, Colin White.
Moving Beyond Lambda Architectures with Apache Kudu (Cloudera, Inc.)
The document discusses the Lambda architecture, its advantages and disadvantages, and how Kudu can serve as an alternative. The Lambda architecture marries batch and real-time processing by using separate batch, speed, and serving layers. While it provides scalability, maintaining two code bases is complex. Kudu can fill the gap by enabling fast analytics on frequently updated data through its ability to support updates, scans, and lookups simultaneously. Examples of how Kudu has been used by Xiaomi to simplify their analytics pipeline and reduce latency are provided. The document cautions against premature optimization and advocates optimizing only as needed.
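As a concrete illustration of that single-pipeline alternative, the hedged sketch below (not the presentation's code) uses Impala SQL over a Kudu table, via the impyla client, to show row-level upserts and analytic scans served by one storage layer; connection details and schema are assumptions.

```python
# Sketch of the Kudu alternative to Lambda: one table accepts row-level
# upserts and serves analytic scans, so separate batch and speed layers
# are unnecessary. Host and schema are illustrative assumptions.
from impala.dbapi import connect

cur = connect(host="impalad.example.com", port=21050).cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS metrics (
        device_id BIGINT,
        ts        BIGINT,
        reading   DOUBLE,
        PRIMARY KEY (device_id, ts)
    )
    PARTITION BY HASH(device_id) PARTITIONS 4
    STORED AS KUDU
""")

# Late-arriving or corrected data is applied in place; no batch view rebuild.
cur.execute("UPSERT INTO metrics VALUES (42, 1700000000, 21.5)")

# The same table immediately serves analytic scans.
cur.execute("SELECT device_id, AVG(reading) FROM metrics GROUP BY device_id")
print(cur.fetchall())
```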
This document discusses best practices for using Hadoop as an enterprise data hub. It provides an overview of how big data is driving new analytical workloads and the need for deeper customer insights. It discusses challenges with analyzing new sources of structured, unstructured and multi-structured data. It introduces the concept of a Hadoop enterprise data hub and data refinery to simplify access to new insights from big data. Key components of the data hub include a data reservoir to capture raw data from various sources, a data refinery to cleanse and transform the data, and publishing high value insights to data warehouses and other systems.
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea... (DataWorks Summit)
The business and technology teams within a health insurer must align the company’s central data platform with its data strategy. That requires substantial organizational alignment. Hear the firsthand perspective from Health Care Service Corporation (HCSC), the largest customer-owned health insurance company in the United States. The speaker will cover how they integrated membership information, regulatory compliance, and the general ledger, to improve overall healthcare management. At HCSC, the strong alignment between executive leadership, business portfolio direction, architectural strategy, technology delivery, and program management have helped create leading-edge capabilities which help the company respond nimbly to a quickly evolving healthcare industry.
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret... (Cloudera, Inc.)
PRGX is the world's leading provider of accounts payable audit services and works with leading global retailers. As new forms of data started to flow into their organization, standard RDBMS systems would not allow them to scale. Now, by using Talend with Cloudera Enterprise, they are able to achieve a 9-10x performance benefit in processing data, reduce errors, and provide more innovative products and services to end customers.
Watch this webinar to learn how PRGX worked with Cloudera and Talend to create a high-performance computing platform for data analytics and discovery that rapidly allows them to process, model, and serve massive amounts of structured and unstructured data.
Hortonworks Hybrid Cloud - Putting you back in control of your data (Scott Clinton)
The document discusses Hortonworks' solutions for managing data across hybrid cloud environments. It proposes getting all data under management, combating growing cloud data silos, and consistently securing and governing data across locations. Hortonworks offers the Hortonworks Data Platform, Hortonworks Dataflow, and Hortonworks DataPlane to provide a modern hybrid data architecture with cloud-native capabilities, security and governance, and the ability to extend to edge locations. The document also highlights Hortonworks' professional services and open source community initiatives around hybrid cloud data.
Cloudera Tech Day Presentation by Eva Andreasson, Director Product Management, Cloudera.
Text-based search recently has become a critical part of the Hadoop stack, and has emerged as one of the highest-performing solutions for big data analytics. In this session, attendees will learn about the new analytics capabilities in Apache Solr that integrate full-text search, faceted search, statistics, and grouping to provide a powerful engine for enabling next-generation big data analytics applications.
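As a rough illustration of that pattern, the sketch below issues a standard Solr select request with faceting and statistics enabled, using Python's requests library; the host, collection, and field names are assumptions.

```python
# Sketch: combining full-text search, facet counts, and field statistics in
# one Solr query. Host, collection ("events"), and fields are assumptions.
import requests

resp = requests.get(
    "http://solr.example.com:8983/solr/events/select",
    params={
        "q": "*:*",                    # match all documents
        "rows": 0,                     # only aggregate results, no documents
        "facet": "true",
        "facet.field": "category",     # bucket counts per category value
        "stats": "true",
        "stats.field": "duration_ms",  # min/max/mean/stddev for the field
        "wt": "json",
    },
)
body = resp.json()
print(body["facet_counts"]["facet_fields"]["category"])
print(body["stats"]["stats_fields"]["duration_ms"])
```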
10 Amazing Things To Do With a Hadoop-Based Data Lake (VMware Tanzu)
Greg Chase, Director, Product Marketing, presents "Big Data: 10 Amazing Things to do With a Hadoop-based Data Lake" at the Strata Conference + Hadoop World 2014 in NYC.
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal... (Cloudera, Inc.)
Recording Link: http://bit.ly/LSImpala
Author: Greg Rahn, Cloudera Director of Product Management
In this session, we'll review the recent set of benchmark tests the Apache Impala (incubating) performance team completed that compare Apache Impala to a traditional analytic database (Greenplum), as well as to other SQL-on-Hadoop engines (Hive LLAP, Spark SQL, and Presto). We'll go over the methodology and results, and we'll also discuss some of the performance features and best practices that make this performance possible in Impala. Lastly, we'll look at some recent advancements in Impala over the past few releases.
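The session's exact tuning steps aren't reproduced here, but one widely documented Impala best practice behind results like these is keeping table statistics current so the cost-based planner can choose good join orders and aggregation strategies. A minimal sketch via impyla, assuming a hypothetical web_logs table:

```python
# Gather table- and column-level statistics used by Impala's cost-based
# planner, then inspect the plan. Host and table name are assumptions.
from impala.dbapi import connect

cur = connect(host="impalad.example.com", port=21050).cursor()

# COMPUTE STATS populates row counts and column NDVs for the planner.
cur.execute("COMPUTE STATS web_logs")

# EXPLAIN output notes whether stats were available when the plan was built.
cur.execute("EXPLAIN SELECT COUNT(*) FROM web_logs")
for (line,) in cur.fetchall():
    print(line)
```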
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat... (DataWorks Summit)
The Finance Data Lake's objective is to create a centralized enterprise data repository for all Finance and Supply Chain data, serving as the single source of truth. It enables a self-service discovery analytics platform for business users to answer ad hoc business questions and derive critical insights. The data lake is based on the open source Hadoop big data platform and is a very cost-effective solution for breaking up ERP data silos and simplifying the data architecture in the enterprise.
POCs were conducted on the in-house Hortonworks Hadoop data platform to validate cluster performance at production volumes. Based on business priorities, an initial roadmap was defined using three data sources: two SAP ERPs and PeopleSoft (OLTP systems). A development environment was established in the AWS Cloud for agile delivery. The near-real-time data ingestion architecture for the data lake was defined using replication tools and a custom Sqoop-based micro-batching framework, with data persisted in Apache Hive in ORC format. Data and user security were implemented using Apache Ranger, and sensitive data is stored at rest in encryption zones. Business data sets were developed as Hive scripts and scheduled using Oozie. Connectivity for multiple reporting tools, including SQL tools, Excel, and Tableau, was enabled for self-service analytics. Upon successful implementation of the initial phase, a full roadmap was established to extend the Finance data lake to over 25 data sources, scale up data ingestion, and enable OLAP tools on Hadoop.
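Verizon's custom framework isn't public, but the Sqoop-based micro-batching idea it describes can be sketched generically: each run imports only rows past a persisted high-watermark, then advances it. The Sqoop flags below are standard; the connection string, table, and paths are illustrative assumptions.

```python
# Generic high-watermark micro-batch sketch around Sqoop's incremental import.
# Not Verizon's framework; JDBC URL, table, and paths are assumptions.
import subprocess
from pathlib import Path

WATERMARK_FILE = Path("/var/lib/ingest/orders.watermark")

def read_watermark() -> str:
    return WATERMARK_FILE.read_text().strip() if WATERMARK_FILE.exists() else "0"

def run_micro_batch(batch_id: str, new_watermark: str) -> None:
    subprocess.run(
        [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@erp.example.com:1521/ORCL",
            "--table", "ORDERS",
            "--incremental", "lastmodified",
            "--check-column", "UPDATED_AT",
            "--last-value", read_watermark(),   # only rows changed since last run
            "--target-dir", f"/data/raw/orders/batch_{batch_id}",
        ],
        check=True,
    )
    WATERMARK_FILE.write_text(new_watermark)    # advance only after success
```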
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M... (MongoDB)
Mark Lewis, Senior Marketing Director EMEA, Cloudera.
Hadoop and the Future of Data Management. As Hadoop takes the data management market by storm, organisations are evolving the role it plays in the modern data centre. Explore how this disruptive technology is quickly transforming an industry and how you can leverage it today, in combination with MongoDB, to drive meaningful change in your business.
Hitachi Data Systems Hadoop Solution. Customers are seeing exponential growth of unstructured data, from their social media websites to operational sources. Their enterprise data warehouses are not designed to handle such high volumes and varieties of data. Hadoop, the latest software platform, scales to process massive volumes of unstructured and semi-structured data by distributing the workload across clusters of servers, giving customers a new option to tackle data growth and deploy big data analysis to better understand their business. Hitachi Data Systems is launching its latest Hadoop reference architecture, pre-tested with the Cloudera Hadoop distribution to provide faster time to market for customers deploying Hadoop applications. HDS, Cloudera, and Hitachi Consulting will present together and explain how to get there. Attend this WebTech and learn how to:
- Solve big-data problems with Hadoop.
- Deploy Hadoop in your data warehouse environment to better manage your unstructured and structured data.
- Implement Hadoop using the HDS Hadoop reference architecture.
For more information on the Hitachi Data Systems Hadoop Solution, please read our blog: http://blogs.hds.com/hdsblog/2012/07/a-series-on-hadoop-architecture.html
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data... (Cloudera, Inc.)
Cloudera Enterprise can be used as an adaptive, high-performance analytic database, complementing existing data warehouses by relieving the pressure of growing numbers of ETL jobs and BI analytics. But where do you get started when developing your offload strategy? How can you identify which workloads are the best fit for which system? And once you’re up and running, how can you constantly adapt to Hadoop’s changing data needs?
Cloudera Navigator Optimizer eases the path for moving the right workloads to Hadoop and then actively manages data, allowing you to take advantage of Hadoop's benefits. Now generally available with the recent release of Cloudera 5.8 and a unique part of Cloudera's analytic database solution, Navigator Optimizer gives you the workload visibility and assessments to build a predictable offload plan, adapt to evolving data and workload demands, and optimize query performance for Hadoop technologies.
3 Things to Learn:
Join Ewa Ding, Senior Product Manager at Cloudera, as she discusses:
-An overview of Cloudera Navigator Optimizer and its key features
-A live demo and key use cases of this web-based tool
-What’s next for active data optimization in Hadoop
Bridging the Big Data Gap in the Software-Driven World (CA Technologies)
Implementing and managing a Big Data environment effectively requires essential efficiencies such as automation, performance monitoring and flexible infrastructure management. Discover new innovations that enable you to manage entire Big Data environments with unparalleled ease of use and clear enterprise visibility across a variety of data repositories.
To learn more about Mainframe solutions from CA Technologies, visit: http://bit.ly/1wbiPkl
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera (MongoDB)
Bernard Doering, Senior Sales Director DACH, Cloudera.
Hadoop and the Future of Data Management. As Hadoop takes the data management market by storm, organisations are evolving the role it plays in the modern data centre. Explore how this disruptive technology is quickly transforming an industry and how you can leverage it today, in combination with MongoDB, to drive meaningful change in your business.
Comprehensive Security for the Enterprise IV: Visibility Through a Single End... (Cloudera, Inc.)
To provide visibility and transparency into your data and usage, Cloudera Enterprise has Navigator, the only native end-to-end governance solution for Apache Hadoop. In this webinar, we discuss why Navigator is a key part of comprehensive security and walk through its key features, including auditing, access control, data discovery and exploration, lineage, and lifecycle management. A live demo is also included.
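For a sense of how that governance metadata can be consumed programmatically, here is a hedged sketch against Navigator's REST API using Python's requests library; the port, API version, query fields, and credentials are assumptions to check against your deployment's documentation.

```python
# Hedged sketch: searching Navigator's metadata for a Hive table's entities.
# Endpoint version, fields, and credentials are assumptions, not verified
# API details; consult the Navigator docs for your release.
import requests

resp = requests.get(
    "http://navigator.example.com:7187/api/v9/entities/",
    params={"query": "sourceType:HIVE AND type:TABLE AND originalName:web_logs"},
    auth=("admin", "admin-password"),
)
resp.raise_for_status()
for entity in resp.json():
    # Entities carry identity plus governance metadata such as tags.
    print(entity.get("originalName"), entity.get("tags"))
```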
The document discusses Oracle Big Data Discovery, a product for exploring and analyzing big data stored in Hadoop. It allows users to find, explore, transform, discover and share insights from big data in a visual interface. Key features include an interactive data catalog, visualizing and exploring data attributes, powerful transformations and enrichments, composing data visualizations and projects, and collaboration tools. It aims to make data preparation only 20% of analytics projects so users can focus on analysis. The product runs natively on Hadoop clusters for scalability and integrates with the Hadoop ecosystem.
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy (Inside Analysis)
The Briefing Room with Neil Raden and Teradata
Live Webcast on August 19, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=1acd0b7ace309f765dc3196001d26a5e
Modern enterprises have been able to solve information management woes with the data warehouse, now a staple across the IT landscape that has evolved to a high level of sophistication and maturity with thousands of global implementations. Today’s modern enterprise has a similar challenge; big data and the fast evolution of the Hadoop ecosystem create plenty of new opportunities but also a significant number of operational pains as new solutions emerge.
Register for this episode of The Briefing Room to hear veteran Analyst Neil Raden as he explores the details and nature of Hadoop’s evolution. He’ll be briefed by Cesar Rojas of Teradata, who will share how Teradata solves some of the Hadoop operational challenges. He will also explain how the integration between Hadoop and the data warehouse can help organizations develop a more responsive and robust data management environment.
Visit InsideAnalysis.com for more information.
The Future of Data Management: The Enterprise Data Hub (Cloudera, Inc.)
The document discusses the enterprise data hub (EDH) as a new approach for data management. The EDH allows organizations to bring applications to data rather than copying data to applications. It provides a full-fidelity active compliance archive, accelerates time to insights through scale, unlocks agility and innovation, consolidates data silos for a 360-degree view, and enables converged analytics. The EDH is implemented using open source, scalable, and cost-effective tools from Cloudera including Hadoop, Impala, and Cloudera Manager.
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify (Hortonworks)
Join this webinar to explore Hadoop security challenges and trends, learn how to simplify the connection of your Hortonworks Data Platform to your existing Active Directory infrastructure, and hear real-world examples of organizations that are achieving the following benefits:
- Secured Hortonworks environments thanks to Active Directory infrastructure for identity and authentication.
- Increased productivity and security via single sign-on for IT admins and Hadoop users.
- Least privilege and session monitoring for privileged access to Hortonworks clusters.
Webinar URL: http://hortonworks.com/webinar/simplify-and-secure-your-hadoop-environment-with-hortonworks-and-centrify/
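To illustrate what that single sign-on looks like from the client side, the sketch below makes a SPNEGO-authenticated WebHDFS call that relies on an existing Kerberos ticket (obtained with kinit against Active Directory) rather than a stored password; the host, port, and path are assumptions, and the requests-kerberos package is required.

```python
# After `kinit user@EXAMPLE.COM`, this call authenticates via the Kerberos
# ticket cache; no password appears in the code. Host and path are assumptions.
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

resp = requests.get(
    "http://namenode.example.com:50070/webhdfs/v1/user/alice",
    params={"op": "LISTSTATUS"},
    auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
)
resp.raise_for_status()
print(resp.json()["FileStatuses"]["FileStatus"])
```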
Gab Genai Cloudera - Going Beyond Traditional Analytic (IntelAPAC)
This document discusses Intel and Cloudera's partnership in helping organizations leverage big data analytics. It provides an overview of Cloudera's history and capabilities in supporting enterprises with Hadoop-based solutions. It then contrasts traditional analytics approaches that brought data to compute with Cloudera's approach of bringing compute to data using their Enterprise Data Hub. Several case studies are presented of organizations achieving new insights and business value through Cloudera's platform. The document emphasizes that Cloudera offers an open, scalable and cost-effective platform for various analytics workloads and enables a thriving ecosystem of partners.
Hadoop-based data lakes have become increasingly popular within today's modern data architectures for their scalability, ability to handle data variety, and low cost. Many organizations start slow with their data lake initiatives, but as the lakes grow they struggle with data consistency, quality, and security, and lose confidence in their data lake initiatives.
This talk will discuss the need for good data governance mechanisms for Hadoop data lakes, their relationship to productivity, and how they help organizations meet regulatory and compliance requirements. The talk advocates a different mindset for designing and implementing flexible governance mechanisms on Hadoop data lakes.
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond (Cloudera, Inc.)
Federal organizations increasingly are focused on creating environments that enable more data-driven decisions. Yet ensuring that all data is considered and is current, complete, and accurate is a tall order for most. To make data analytics meaningful to support real-world transformation, agency staff need business tools that provide user-friendly dashboards, on-demand reporting, and methods to manage efficiently the rise of voluminous and varied data sets and types commonly associated with big data. In most cases, existing systems are insufficient to support these requirements. Enter the enterprise data hub (EDH), a software architecture specifically designed to be a unified platform that can economically store unlimited data and enable diverse access to it at scale. Plan to attend this discussion to understand the key considerations to making an EDH the architectural center of your agency’s modern data strategy.
Manufacturers have an abundance of data, whether from connected sensors, plant systems, manufacturing systems, claims systems, or external industry and government sources. They face growing challenges, from continually improving product quality and reducing warranty and recall costs to efficiently leveraging their supply chain. Giving the manufacturer a complete view of product and customer information, by integrating manufacturing and plant-floor data and as-built product configurations with sensor data from customer use, to efficiently analyze warranty claims, shorten detection-to-correction time, detect fraud, and even get ahead of issues, requires a capable enterprise data hub that integrates large volumes of both structured and unstructured information. Learn how an enterprise data hub built on Hadoop provides the tools to support analysis at every level of the manufacturing organization.
Intel and Cloudera: Accelerating Enterprise Big Data Success (Cloudera, Inc.)
The data center has gone through several inflection points in the past decades: adoption of Linux, migration from physical infrastructure to virtualization and Cloud, and now large-scale data analytics with Big Data and Hadoop.
Please join us to learn about how Cloudera and Intel are jointly innovating through open source software to enable Hadoop to run best on IA (Intel Architecture) and to foster the evolution of a vibrant Big Data ecosystem.
This document summarizes a presentation about using Hadoop as an analytic platform. It discusses how Actian has added seven key ingredients to Hadoop to unlock its full potential for analytics. These include high-speed data integration, a visual framework for data science and modeling, open-source analytic operators, high-performance data processing engines, vector-based SQL processing natively on HDFS, an extremely fast parallel analytics engine, and a next-generation big data analytics platform. The goal is to transform Hadoop from merely a data reservoir to a fully-featured analytics platform.
This document discusses the challenges of trust, visibility and governance in Apache Hadoop and how Cloudera Navigator addresses them. It describes how Navigator provides an integrated data management and governance platform for Hadoop by collecting and integrating technical metadata, business metadata, lineage, policies and audit logs. This platform enables self-service discovery and analytics for data scientists and BI users, usage-driven optimization for Hadoop administrators and compliance capabilities for security teams. The document provides examples of the types of metadata, lineage and audit logs collected in Hadoop and their limitations, and argues that Navigator is needed to make this information actionable through policies and a governance framework.
Similar to Govern This! Data Discovery and the application of data governance with new stack technologies
The document discusses using Cloudera DataFlow to address challenges with collecting, processing, and analyzing log data across many systems and devices. It provides an example use case of logging modernization to reduce costs and enable security solutions by filtering noise from logs. The presentation shows how DataFlow can extract relevant events from large volumes of raw log data and normalize the data to make security threats and anomalies easier to detect across many machines.
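The presentation's actual DataFlow pipelines aren't reproduced here, but the underlying filter-and-normalize idea can be sketched in a few lines of Python; the log format, regex, and output schema are assumptions for illustration.

```python
# Generic filter-and-normalize sketch: keep only security-relevant events and
# map them to one flat schema. Regex and field names are assumptions.
import re

AUTH_FAILURE = re.compile(
    r"(?P<ts>\S+ \S+) .*Failed password for (?P<user>\S+) from (?P<ip>\S+)"
)

def normalize(lines):
    """Yield auth-failure events as uniform dicts; drop everything else."""
    for line in lines:
        m = AUTH_FAILURE.search(line)
        if m:
            yield {"timestamp": m["ts"], "user": m["user"], "source_ip": m["ip"]}

raw = [
    "2023-01-05 10:01:02 sshd[311]: Failed password for root from 203.0.113.9",
    "2023-01-05 10:01:03 cron[99]: job started",  # noise, filtered out
]
print(list(normalize(raw)))
```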
Cloudera Data Impact Awards 2021 - Finalists (Cloudera, Inc.)
The document outlines the 2021 finalists for the annual Data Impact Awards program, which recognizes organizations using Cloudera's platform and the impactful applications they have developed. It provides details on the challenges, solutions, and outcomes for each finalist project in the categories of Data Lifecycle Connection, Cloud Innovation, Data for Enterprise AI, Security & Governance Leadership, Industry Transformation, People First, and Data for Good. There are multiple finalists highlighted in each category demonstrating innovative uses of data and analytics.
2020 Cloudera Data Impact Awards Finalists (Cloudera, Inc.)
Cloudera is proud to present the 2020 Data Impact Awards Finalists. This annual program recognizes organizations running the Cloudera platform for the applications they've built and the impact their data projects have on their organizations, their industries, and the world. Nominations were evaluated by a panel of independent thought-leaders and expert industry analysts, who then selected the finalists and winners. Winners exemplify the most-cutting edge data projects and represent innovation and leadership in their respective industries.
The document outlines the agenda for Cloudera's Enterprise Data Cloud event in Vienna. It includes welcome remarks, keynotes on Cloudera's vision and customer success stories. There will be presentations on the new Cloudera Data Platform and customer case studies, followed by closing remarks. The schedule includes sessions on Cloudera's approach to data warehousing, machine learning, streaming and multi-cloud capabilities.
Machine Learning with Limited Labeled Data 4/3/19 (Cloudera, Inc.)
Cloudera Fast Forward Labs’ latest research report and prototype explore learning with limited labeled data. This capability relaxes the stringent labeled data requirement in supervised machine learning and opens up new product possibilities. It is industry invariant, addresses the labeling pain point and enables applications to be built faster and more efficiently.
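The report's specific methods aren't reproduced here, but one standard technique in the limited-labels space is self-training, where a model bootstraps from a handful of labels by pseudo-labeling its most confident predictions on unlabeled points. A minimal sketch with scikit-learn, which marks unlabeled samples with -1:

```python
# Self-training on mostly unlabeled data (label -1 means "unlabeled").
# Illustrative of the limited-labels setting, not the report's exact method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)
y_partial = y.copy()
rng = np.random.default_rng(0)
y_partial[rng.random(len(y)) < 0.9] = -1   # hide 90% of the labels

# The wrapper iteratively pseudo-labels points the base model is sure about.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
model.fit(X, y_partial)
print("accuracy against the true labels:", model.score(X, y))
```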
Data Driven With the Cloudera Modern Data Warehouse 3.19.19 (Cloudera, Inc.)
In this session, we will cover how to move beyond structured, curated reports based on known questions on known data, to an ad-hoc exploration of all data to optimize business processes and into the unknown questions on unknown data, where machine learning and statistically motivated predictive analytics are shaping business strategy.
Introducing Cloudera DataFlow (CDF) 2.13.19 (Cloudera, Inc.)
Watch this webinar to understand how Hortonworks DataFlow (HDF) has evolved into the new Cloudera DataFlow (CDF). Learn about key capabilities that CDF delivers, such as:
-Powerful data ingestion powered by Apache NiFi
-Edge data collection by Apache MiNiFi
-IoT-scale streaming data processing with Apache Kafka
-Enterprise services to offer unified security and governance from edge-to-enterprise
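As a small illustration of the streaming layer named above, the sketch below produces and consumes JSON messages with the open-source kafka-python client; the broker address and topic are assumptions.

```python
# Minimal Kafka produce/consume round trip with kafka-python.
# Broker address and topic name are illustrative assumptions.
import json
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker.example.com:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("sensor-readings", {"device": "edge-42", "temp_c": 21.5})
producer.flush()

consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="broker.example.com:9092",
    auto_offset_reset="earliest",   # read from the beginning of the topic
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)            # downstream processing would go here
    break
```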
Introducing Cloudera Data Science Workbench for HDP 2.12.19 (Cloudera, Inc.)
Cloudera’s Data Science Workbench (CDSW) is available for Hortonworks Data Platform (HDP) clusters for secure, collaborative data science at scale. During this webinar, we provide an introductory tour of CDSW and a demonstration of a machine learning workflow using CDSW on HDP.
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19 (Cloudera, Inc.)
Join Cloudera as we outline how we use Cloudera technology to strengthen sales engagement, minimize marketing waste, and empower line of business leaders to drive successful outcomes.
Leveraging the cloud for analytics and machine learning 1.29.19 (Cloudera, Inc.)
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on Azure. In this webinar, you see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19 (Cloudera, Inc.)
Join us to learn about the challenges of legacy data warehousing, the goals of modern data warehousing, and the design patterns and frameworks that help to accelerate modernization efforts.
Leveraging the Cloud for Big Data Analytics 12.11.18 (Cloudera, Inc.)
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on AWS. In this webinar, you see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure, with benefits for both traditional analytics practitioners and data scientists and engineers.
The document discusses the benefits and trends of modernizing a data warehouse. It outlines how a modern data warehouse can provide deeper business insights at extreme speed and scale while controlling resources and costs. Examples are provided of companies that have improved fraud detection, customer retention, and machine performance by implementing a modern data warehouse that can handle large volumes and varieties of data from many sources.
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
Cloudera SDX is by no means restricted to the platform; it extends well beyond it. In this webinar, we show you how Bardess Group’s Zero2Hero solution leverages the shared data experience to coordinate Cloudera, Trifacta, and Qlik to deliver complete customer insight.
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
Join Cloudera Fast Forward Labs Research Engineer, Mike Lee Williams, to hear about their latest research report and prototype on Federated Learning. Learn more about what it is, when it’s applicable, how it works, and the current landscape of tools and libraries.
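For intuition about how federated learning works, the toy sketch below implements the core of federated averaging (FedAvg): each client computes an update on its own private data, and only model weights, never raw data, are sent to the server for averaging. It is a deliberately simplified illustration, not the Fast Forward Labs prototype.

# Toy federated averaging: clients fit local linear models; the server
# averages weights. Raw data never leaves a client.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def client_update(global_w, n=200, lr=0.1, steps=20):
    # Each client has its own private data (simulated here).
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    w = global_w.copy()
    for _ in range(steps):                 # local gradient descent on private data
        grad = 2 * X.T @ (X @ w - y) / n
        w -= lr * grad
    return w                               # only weights are shared

global_w = np.zeros(2)
for round_ in range(10):
    client_weights = [client_update(global_w) for _ in range(5)]
    global_w = np.mean(client_weights, axis=0)   # FedAvg aggregation step
print("learned weights:", global_w)        # approaches [2.0, -1.0]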
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
451 Research analyst Sheryl Kingstone and Cloudera’s Steve Totman recently discussed how a growing number of organizations are replacing legacy Customer 360 systems with Customer Insights Platforms.
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
In this webinar, you will learn how Cloudera and BAH riskCanvas can help you build a modern AML platform that reduces false positive rates, investigation costs, technology sprawl, and regulatory risk.
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
How can companies integrate data science into their businesses more effectively? Watch this recorded webinar and demonstration to hear more about operationalizing data science with Cloudera Data Science Workbench on Cazena’s fully-managed cloud platform.
Govern This! Data Discovery and the application of data governance with new stack technologies
1. GOVERN THIS!
Data Discovery & the Application of Data Governance
Cloudera and Tableau Software Online Webinar
May 1, 2014
Paul Lilford, Tableau Software
Marc Lobree, Tableau Software
Arlene Boyd, Cloudera
Mark Donsky, Cloudera
15. Problem Statement
1. Lots of data landing in the enterprise data hub
• Huge quantities with varying levels of sensitivity
• Many different sources – structured & unstructured
2. Many users working with the data in multiple ways
• Users: compliance officers, analysts, data scientists, LOB
• Tools: BI tools, ETL tools, Hue, and more
3. Need to effectively control & consume data
• Get visibility & control over the environment
• Discover, explore, and consume data
16. Data Management Challenges
Auditing and Access Management
• View, grant, and revoke permissions across the Hadoop stack
• Identify access to a data asset around the time of a security breach
• Generate an alert when a restricted data asset is accessed
Lineage
• Given a data set, trace back to the original source
• Understand the downstream impact of purging or modifying a data set
Metadata Tagging and Discovery
• Search through metadata to find data sets of interest
• Given a data set, view its schema, metadata, and policies
17. Cloudera Navigator
Data Management Suite for Hadoop and Cloudera’s EDH
• Audit & Access Management – ensuring appropriate permissions and auditing on data access
• Discovery & Exploration – finding out what data is available and what it looks like
• Lineage – tracing data back to its original source
• Enterprise Metadata Repository – business metadata, lineage metadata, and operational metadata
[Architecture diagram: Cloudera Navigator layers audit & access management, lineage, metadata, and discovery & exploration services on top of CDH (HDFS, HBase, Hive), and exposes them to external tools – ETL, DW, DBMS, DM, and self-service tooling – through REST and XMI interfaces.]
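Because the architecture exposes Navigator through a REST interface, metadata searches like those on the previous slide can be scripted. The sketch below is a hypothetical example using Python's requests library against Navigator's entities endpoint; the host, port, API version, and query syntax are assumptions to verify against your Navigator release, not guaranteed specifics.

# Hypothetical Navigator metadata search over its REST interface.
# Host, port, API version, and the Solr-style query are assumptions --
# check them against the Navigator release you are running.
import requests

NAV = "http://navigator.example.com:7187/api/v9"

resp = requests.get(
    f"{NAV}/entities",
    params={"query": "sourceType:HDFS AND originalName:customers*", "limit": 10},
    auth=("nav_admin", "secret"),   # placeholder credentials
)
resp.raise_for_status()
for entity in resp.json():
    print(entity.get("type"), entity.get("originalName"))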
20. Data Discovery the new way!
• Support the process of discovery and new insights through direct access to data by subject experts
• LOB subject experts, empowered for their subject area
• Active IT support and engagement
• Security is still fundamental, and data is still protected
• Flexibility in governance – this is discovery, not production
• Better-vetted requirements feed production and more highly governed data types
• Help organizations in the move to become data driven
21. But don’t take our word for it!
The new normal:
• Business driven
• Ease of use
• Self-reliance
• Visual
Today we're in the middle of a shift in how businesses use information. In the past, you'd define a set of business processes, build applications around each of them, and then go about gathering, conforming, and merging the necessary data sets to support those applications. From an infrastructure perspective, you'd be bringing the data over to the compute, often in relational databases. But you'd be leaving quite a lot on the table. The modern realities of business demand a new approach. Today companies need, more than ever, to become information-driven, but given the amount and diversity of information available, and the rate of change in business, it's simply unsustainable to keep moving around and transforming huge volumes of data.
The foundational platform that's addressing this wide range of problems today is Apache Hadoop, an open source platform for scalable, fault-tolerant data storage and processing that runs on a cluster of industry-standard servers. But Hadoop, in the beginning, wasn't capable of solving these problems. Originally, Hadoop was just a scalable distributed system for storing and processing large amounts of data. You could bring workloads to an effectively limitless amount and variety of data, provided the only kind of work you wanted to do was batch processing by writing Java code, and provided you liked hiring highly-skilled computer scientists to operate it.
Cloudera solved the latter problem with Cloudera Manager, the leading system management application for Apache Hadoop. Customers love Cloudera Manager because it makes the complex simple. Hadoop is more than a dozen services running across many machines, with limitless configuration permutations. With Cloudera Manager, customers can centrally manage and monitor their clusters from a single tool. It provides automated installation and configuration of your cluster. Cloudera Manager is really our many years of Hadoop experience realized in software, and it helps you get up and running quickly.
Our customers liked the scalability, flexibility, and economic properties of the platform but, for example, didn't like that they had to move data out to other MPP analytic databases just to run fast SQL queries, so we built Impala, the world's first open source MPP analytic SQL query engine expressly designed for Hadoop. With Impala, you now have a viable open source alternative to proprietary MPP analytic databases, one that also delivers the core scalability, flexibility, and economic benefits of Hadoop. Now, over the past year we've continued to add to the platform, with Search, and with Spark for interactive, iterative analytics and stream processing. You also get HBase, the online key-value store, to enable real-time applications on the platform. With this range of diverse ways to access your data in Hadoop, far beyond just Java and MapReduce, you can now bring your existing tools and skill sets to the platform. What's even more exciting is that we've recently made it possible for our partners and other 3rd parties to deploy, manage, and monitor their apps in the platform, again leveraging your existing investments while letting you access an even greater breadth and depth of data, all in one place.
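As a small illustration of what fast SQL on Hadoop looks like from a client's perspective, here is a sketch of querying Impala from Python with the impyla package; the host and the table names are placeholders.

# Querying Impala from Python via impyla (pip install impyla).
# Host and table names are placeholders.
from impala.dbapi import connect

conn = connect(host="impala-daemon.example.com", port=21050)
cur = conn.cursor()
cur.execute("""
    SELECT region, COUNT(*) AS orders
    FROM sales.orders
    GROUP BY region
    ORDER BY orders DESC
    LIMIT 10
""")
for region, orders in cur.fetchall():
    print(region, orders)
cur.close()
conn.close()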
Of course, none of this would matter if the platform weren't reliable, secure, and manageable. * Hadoop today is highly available, and Cloudera provides extensions for automated backup and disaster recovery. * Hadoop has had perimeter security for some time, but there was a significant gap in the area of fine-grained role-based access controls, the kind you'd expect from a DBMS. That's why, together with the community, we built and contributed the Apache Sentry project, which delivers this security for Hive and Impala today, and why we developed Cloudera Navigator to support metadata management, including things like rights auditing, data lineage, and data discovery native to Hadoop. * And all this comes in addition to the industry-leading system management and customer support you expect from Cloudera.
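To see what those fine-grained, role-based controls look like in practice, the sketch below issues Sentry-managed GRANT statements through the same impyla connection pattern as above; the role, group, and table names are illustrative.

# Sentry role-based access control, administered via SQL statements.
# Role, group, and table names are illustrative.
from impala.dbapi import connect

conn = connect(host="impala-daemon.example.com", port=21050)
cur = conn.cursor()
cur.execute("CREATE ROLE analyst_role")
cur.execute("GRANT SELECT ON TABLE sales.orders TO ROLE analyst_role")
cur.execute("GRANT ROLE analyst_role TO GROUP analysts")  # maps to an OS/LDAP group
cur.close()
conn.close()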
So you can see a lot has happened in just a few short years. Ultimately what you have here is an enterprise data hub, which has four necessary attributes: * It's Secure and Compliant. In addition to perimeter security and encryption, an EDH offers fine-grained (row- and column-level) role-based access controls over data, just like your data warehouse. * It's Governed. You need to understand what data is in your EDH and how it’s used, so an EDH must offer data discovery, data auditing, and data lineage. * It's Unified and Manageable. You need to be able to trust that your data is safe, so an EDH must provide not only native high availability, fault tolerance, and self-healing storage, but also automated replication and disaster recovery. It must also provide advanced systems management to enable distributed multi-tenant performance. * And it's Open. As an EDH makes it possible to cost-effectively retain data for decades, you need to ensure that the foundational infrastructure is based on open source software and an open platform for 3rd parties. Open source ensures that you are not locked in to any particular vendor’s license agreement; nobody can hold your data or applications hostage. An open platform ensures that you’re not locked into a particular vendor’s stack and that you have a choice of what tools to use with the EDH; for example, over 200 ISV products – such as Tableau Software – work with Cloudera today. With an enterprise data hub, our customers are able to store and drive real business impact from more data than they'd ever thought possible.
The expansive capabilities of Hadoop and an enterprise data hub – the ability to store, process, and analyze huge quantities of data with varying levels of sensitivity from many different sources – structured, semi-structured, and unstructured – require a robust security capability to manage the range of vulnerabilities that may arise. As data proliferates, many new users of different types require access, and many different types of tools will access the data, raising concerns about ongoing management and compliance. Organizations will need to anticipate how they will ensure data quality throughout the information pipeline, enforce controls that guarantee appropriate access and rights, and move from ungoverned data systems to ones with full administration, visibility, and security that allow them to discover, explore, and consume data with full confidence.
Enter Cloudera Navigator, the first fully integrated data management application for Apache Hadoop, designed to provide all of the capabilities required for administrators, data managers, and analysts to secure, govern, classify, and explore the large amounts of diverse data in their Hadoop clusters. Control – Navigator provides the system and data control necessary for compliance and risk management teams to ensure that their organization’s policies extend to critical and sensitive data within Hadoop. IT professionals benefit from the simple, centralized management functions offered by Cloudera Manager, so they gain both system and data control from an integrated end-to-end experience. Visibility – Navigator establishes a centralized system for verifying access permissions across all files and directories within Hadoop. Administrators and operations teams can validate their usage and data access policies by confirming individual and group rights and access. Productivity – Analysts, data scientists, and business users easily identify data sets of interest and familiarize themselves with the various structures and formats. As a result, they can more quickly generate insights that benefit the business. Reliability – Navigator’s lineage capabilities offer the ability to visually trace the progression of a data set from its original source(s) to its current state. This gives compliance officers, quality managers, executives, and anyone else concerned with data cleanliness a high degree of confidence in the reliability of the data they use for reporting or to make decisions.
Tableau’s mission is to help people see and understand their data. We have had this mission for over 10 years, and we remain completely committed to helping business users discover new insights.
Data discovery has evolved. It has always been part of business, but it was typically done on the desktop or in “business server” environments. Business analysts spend most of their time preparing data rather than doing the actual work. Governance was, and is, broken: business users print, email, duplicate, and extract data assets from all over the organization in an attempt to get their job done. The requirements process of traditional BI tools has failed organizations: 1) too slow; 2) requirements change; 3) reliance on a limited few; 4) too inflexible for the needs of the business; 5) costly; and 6) reactive.
We made it for everyone. We made it easy so that anyone would want to adopt it.