Cloudera SDX

CLOUDERA SDX
CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WORKLOADS
Wim Stoop | Senior Product Marketing Manager
Santosh Kumar | Senior Product Manager

© Cloudera, Inc. All rights reserved.
MULTI-DISCIPLINARY ANALYTICS

WE ALL HAVE BAGGAGE

TRADITIONAL APPLICATION SILOS
[Diagram: five application silos, each with its own storage and its own copy of the data context]
• Applications: Data Science, SQL Analytic Database, NoSQL & RT Database, ETL & Data Engineering, Data Warehouse/Mart
• Storage (one per silo): FS or RDBMS
• Context (repeated in every silo): Security, Governance, Lifecycle, Control, Catalog

A STRUGGLE AS OLD AS TIME: IT VS. BUSINESS
For IT infrastructure & ops
• Single use, inflexible data sources
• Redundancy and fragmentation
For users
• Can’t find data, waiting on IT
• Doing prep work, not finding insights
For head of data & analytics
• Administrative, not innovative
• Can’t meet business requirements

ON-PREMISES DEPLOYMENT
[Diagram: all applications share one storage layer and one data context]
• Applications: Data Warehouse/Mart, ETL & Data Engineering, Data Science, SQL Analytic Database, NoSQL & RT Database
• Storage: HDFS, Kudu
• Context (shared): Security, Governance, Lifecycle, Control, Catalog

CLOUD RE-INTRODUCES SILOS
[Diagram: in the cloud, each application again carries its own copy of the data context]
• Applications: Data Warehouse/Mart, ETL & Data Engineering, Data Science, SQL Analytic Database, NoSQL & RT Database
• Storage: Microsoft ADLS, Amazon S3, Google CP, HDFS, Kudu
• Context (repeated for every application): Security, Governance, Lifecycle, Control, Catalog

CHALLENGES: SECURITY & GOVERNANCE
• Sharing data across workloads
• Requires multiple copies of the data to be created
• Each with its own set of data context
• Burdensome admin effort
• Multiple clusters = multiple places to administer
• One missing permission in one copy of the data can lead to significant financial
and reputational risk
• Difficult to share data safely for new analyses
• Heavy new regulation such as GDPR makes the challenges even greater

NEGATIVE BUSINESS IMPACT
• Increased operational costs: many distinct environments to buy and build
• Increased staff overhead: many distinct tools to learn and support
• Increased security risks: many distinct frameworks to enforce
• Decreased business insights: narrow data sets and analytics rigidity
• Decreased business agility: outdated and limiting for applications
• Decreased governance capability: no common visibility across stores

DATA CONTEXT CHALLENGE
• Data: stateful
• Compute: stateless
• Context: stateless

ENABLING STATEFUL AND CONSISTENT CONTEXT
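The difference between stateless and stateful context can be illustrated with a small sketch. It is only a toy model under stated assumptions: `TransientCluster`, the `clicks` table, and the plain dictionary standing in for a persistent context store are all hypothetical names, not Cloudera APIs.

```python
class TransientCluster:
    """Hypothetical compute cluster; not a Cloudera API."""

    def __init__(self, context=None):
        # Without a shared store, the context is a private dict
        # that disappears together with the cluster.
        self.context = {} if context is None else context

    def create_table(self, name, columns):
        # Record a table definition (schema) in whatever context is in use.
        self.context[name] = list(columns)

    def tables(self):
        return sorted(self.context)


# Stateless context: definitions are lost when the cluster is terminated.
first = TransientCluster()
first.create_table("clicks", ["user", "url"])
del first
local_tables = TransientCluster().tables()

# Stateful context: the store outlives any single cluster.
persistent_context = {}  # assumption: stands in for a shared, persistent store
first = TransientCluster(persistent_context)
first.create_table("clicks", ["user", "url"])
del first
shared_tables = TransientCluster(persistent_context).tables()

print(local_tables)   # []
print(shared_tables)  # ['clicks']
```

In the first half the replacement cluster starts with no table definitions even though the underlying data may still exist in storage; in the second half it picks them up from the store that persisted.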

CLOUDERA ENTERPRISE WITH SDX
Benefits for IT infra & ops
● Central control and security
● Focus on curating not firefighting
Benefits for users
● Find value from one source of truth
● Bring the best tools for each job
[Diagram: workloads running on shared services and shared storage]
● Workloads: Data Engineering, Data Science, Data Warehouse, Operational Database, 3rd Party Services
● Shared services: Data Catalog, Security, Governance, Lifecycle Management, Control Plane, Common Services
● Storage: HDFS, Kudu, Amazon S3, Microsoft ADLS

SHARED DATA CONTEXT SERVICES
Built for multi-function analytics anywhere
• Data Catalog: a comprehensive catalog of all data sets, spanning on-premises, cloud object stores, structured, unstructured, and semi-structured. Includes technical schemas from the Hive metastore, as well as business glossary definitions, classifications, and usage guidance
• Security: role-based access control applied consistently across the platform using Apache Sentry. Also includes full stack encryption and key management
• Governance: enterprise-grade auditing, lineage, and other governance capabilities applied universally across the platform with rich extensibility for partner integrations
• Lifecycle Management: comprehensive ingest-to-purge management of data set lifecycle activities
• Control Plane: multi-environment cluster provisioning, deployment, management, and troubleshooting
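As a rough illustration of the idea behind these shared services, the sketch below models a single context store that holds schemas and role grants and is consulted by more than one workload. Everything in it is an assumption made for illustration: `SharedDataContext`, `Workload`, the `analyst` role, and the `sales.orders` table are hypothetical, and none of this is the Hive metastore or Apache Sentry API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDataContext:
    """Hypothetical shared store for schemas and role grants."""
    schemas: dict = field(default_factory=dict)  # table name -> column names
    grants: dict = field(default_factory=dict)   # role -> readable table names

    def register_table(self, table, columns):
        self.schemas[table] = list(columns)

    def grant_select(self, role, table):
        self.grants.setdefault(role, set()).add(table)

    def can_read(self, role, table):
        return table in self.grants.get(role, set())


@dataclass
class Workload:
    """Hypothetical compute cluster that keeps no context of its own."""
    name: str
    context: SharedDataContext

    def describe(self, role, table):
        # Every workload checks the same grants before exposing the schema.
        if not self.context.can_read(role, table):
            raise PermissionError(f"{role} may not read {table} via {self.name}")
        return self.context.schemas[table]


context = SharedDataContext()
context.register_table("sales.orders", ["order_id", "customer_id", "amount"])
context.grant_select("analyst", "sales.orders")  # assigned once

warehouse = Workload("data-warehouse", context)
science = Workload("data-science", context)

# Both workloads see the same schema under the same permission.
columns_seen = [w.describe("analyst", "sales.orders") for w in (warehouse, science)]
print(columns_seen)
```

The design point is that the grant is recorded once in the shared store rather than once per cluster, so a role without the grant is rejected by every workload in the same way.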

EXAMPLE CLOUDERA SDX USE CASES
Cloudera SDX makes it easy for administrators, BI users, and data scientists to work together on a common data set, with consistent data context.
Partner tools can use and enrich data context automatically.
DATA ENGINEERING + DATA WAREHOUSE
● Run ETL with Spark, MapReduce, or any number of partner tools
● Assign permissions and classifications once
● Data, along with all data context, is immediately available in the analytics database
DATA ENGINEERING + DATA ENGINEERING
● Run specialized transient workloads for security profiling, data preparation, ETL, etc.
● Partner tools can have dedicated clusters
● Data, along with all data context, is immediately available to all partner tools
DATA ENGINEERING + DATA SCIENCE
● Run ETL with Spark, MapReduce, or any number of partner tools
● Assign permissions and classifications once
● Data, along with all data context, is immediately available for data science and machine learning

BASED ON COMMON CLOUDERA COMPONENTS
Apache open source and Cloudera unique innovations
• Data Catalog: Hive Metastore
• Governance: Navigator
• Security: Sentry, Kerberos
• Lifecycle Management: BDR, Navigator
• Common Services and Control Plane: Hue, Altus, Manager, Director
• Also shown: Microsoft ADLS, Amazon S3, Impala

WITH YEARS OF EXPERIENCE
[Timeline, 2010-2018: Hive Metastore, Sentry, Hue, Kerberos, Altus, BDR, Director, Manager, Navigator]

CLOUDERA ALTUS PAAS
• Simple
• Self-service
• Auto-elastic
• Role specific
[Diagram; one element is labeled "soon"]
• Workloads: Data Engineering, Data Warehouse, Data Science
• Shared services: Data Catalog, Governance, Security, Control Plane, Lifecycle Management
• Storage: Amazon S3, Microsoft ADLS (beta)

CLOUDERA SDX
Available for all workloads that share data across clusters
• Configured SDX:
Self-managed clusters in the cloud - available as of C5.13
• Cloudera Altus SDX:
Altus PaaS clusters - available where Altus is

CLOUDERA SDX: MOTIVATION
1970-2010: OMIT
• Self-contained appliances with compute, data and data context
2010-2017: Big Data Analytics
• Cloudera EDH: Spark, Impala, Hive; Data; Data Context
• Unified Platform
• Multiple Engines
• Shared Storage
• Shared Data Context
2017-Onward: Big Data Analytics and Cloud
• Cloudera EDH: Hive, Impala, Spark; Data; Data Context
• Simplified Multi-Tenant Environment
• Multiple Compute Engines
• Shared Storage
• Shared and Persistent Data Context

Charles, SVP, Emerging Businesses:
"With increased focus on … business insights … dashboard … FAST …"
Mulyadi, Data Scientist:
"Of course! We have our internal EDH cluster. That would be easy!"

Mulyadi, Data Scientist:
"Pipelines! Workloads! Queries! More pipelines. More workloads! More queries! Even more…"
Alan, Internal EDH Data Platform Manager:
"Adding more workloads to Internal EDH clusters is risky and adds uncertainty to existing SLA-sensitive workloads."

ALAN’S PROBLEM
• Databases
• Tables
• Columns
• Partitions
• Views
• Data Size

BACK TO CLOUDERA’S WORLD...
• Sales (SFDC/386 tables)
• Support (Clusterstats/340 tables)

Mulyadi, Data Scientist:
"Maybe separate cluster with “required” data?"
Alan, Internal EDH Data Platform Manager:
"Why not!!"

OUR CUSTOMERS’ PROBLEMS
• Databases
• Tables
• Views
• Partitions
• Columns
• Data

ALAN AND MULYADI IN THE CLOUD WORLD
• Server Procurement
• Additional pipelines
• Data Migration Dev Scripts
• Data Migration Runtime
• Data Migration Cost only
• EC2 Hours for Data Migration only

DATA MIGRATION COSTS GROW EXPONENTIALLY
[Diagram: data copied between Internal EDH and separate clusters for Emerging Businesses Analytics, Sales Analytics, Support, Product, Training, and Finance; values shown: 37, 15, 47, 27, 27, 15]
• No single source of truth
• Synchronization overhead
• Stale data

EMBRACE UNIFICATION OF DATA & CONTEXT VIA SDX
[Diagram: Internal EDH shared by Emerging Businesses Analytics, Sales Analytics, Support, Product, Training, and Finance]

SDX RECAP
• A differentiated capability for sharing of data and data context persistently
• Enables sharing schema, security, governance, audit artifacts
• Akin to linear scalability of Apache Hadoop itself

SDX DEMO

CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY

DATA-DRIVEN JOURNEY
USE CASES
• Preventive & Proactive Maintenance
• IoT Hub for Industry 4.0
• Advanced Threat Detection
• Risk Modelling & Analysis
• Marketing Systems Integration
• Customer 360 Insights
• Exploratory Data Science
• Data Warehouse
• Applied Machine Learning
VISIBILITY
• GROW: Sales & Marketing
• CONNECT: Operations & Product
• PROTECT: Security & Compliance
• MODERNIZE: IT, Tech, Data Science & Analytics

CUSTOMER SUCCESSES FOR EDH & SDX
Couldn’t solve predictive maintenance goals
EDH delivers:
• Ingest telematics in real-time
• Machine learning to predict failures
• Analytics to minimize service downtime
• Protect sensitive and regulated data
• Consistent security and governance
• “SDX is the key to making that happen” - CIO
Drug R&D too slow and expensive
EDH delivers:
• Self-service analytics
• Meet HIPAA regulations
• >5 petabytes from 2100 silos
• Using Spark, Impala, & Search side-by-side
• With Anaconda, AtScale, Cloudwick, Kinetica,
StreamSets, Tamr, Trifacta, & Zoomdata

POSITIVE BUSINESS OUTCOMES
• Increased business insights: diverse data together with analytics flexibility
• Increased business agility: modern and nimble application innovation
• Increased governance capability: one common viewpoint and store
• Decreased operational costs: one environment for all needs
• Decreased staff overhead: one set of controls for everything
• Decreased security risks: comprehensive controls everywhere

YOUR OWN CONSISTENT DATA CONTEXT
Altus, powered by SDX
Free trial: https://cloudera.com/altus
Configured SDX
For C5.13+: http://bit.ly/2Ms5OPO
