CLOUDERA SDX
CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WORKLOADS
Wim Stoop | Senior Product Marketing Manager
Santosh Kumar | Senior Product Manager
2 © Cloudera, Inc. All rights reserved.
MULTI-DISCIPLINARY ANALYTICS
WE ALL HAVE BAGGAGE
TRADITIONAL APPLICATION SILOS
Each application sits on its own storage and carries its own context (security, governance, lifecycle, control, catalog):
• Data science: FS
• SQL analytic database: RDBMS
• NoSQL & RT database: FS
• ETL & data engineering: RDBMS
• Data warehouse/mart: RDBMS
A STRUGGLE AS OLD AS TIME: IT VS. BUSINESS
For IT infrastructure & ops
• Single use, inflexible data sources
• Redundancy and fragmentation
For users
• Can’t find data, waiting on IT
• Doing prep work, not finding insights
For head of data & analytics
• Administrative, not innovative
• Can’t meet business requirements
ON-PREMISES DEPLOYMENT
• Application: data warehouse/mart, ETL & data engineering, data science, SQL analytic database, NoSQL & RT database
• Storage: HDFS, Kudu
• Context: security, governance, lifecycle, control, catalog
CLOUD RE-INTRODUCES SILOS
• Application: data warehouse/mart, ETL & data engineering, data science, SQL analytic database, NoSQL & RT database
• Storage: Microsoft ADLS, Amazon S3, Google CP, HDFS, Kudu
• Context: a separate set of security, governance, lifecycle, control, and catalog for each of the five applications
CHALLENGES: SECURITY & GOVERNANCE
• Sharing data across workloads
• Requires multiple copies of data to be created
• Each with its own set of data context
• Burdensome admin effort
• Multiple clusters = multiple places to administer
• One missing permission in one copy of the data can lead to significant financial
and reputation risk
• Difficult to share data safely for new analyses
• Heavy new regulation such as GDPR makes the challenges even greater
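The risk named above, one permission differing in one copy of the data, can be made concrete with a minimal Python sketch. It assumes each cluster keeps its own set of (role, table, privilege) grants. The cluster names, the grant tuples, and the `find_permission_drift` helper are hypothetical illustrations, not part of any Cloudera or Apache Sentry API.

```python
from typing import Dict, Set, Tuple

# Hypothetical grant shape: (role, table, privilege)
Grant = Tuple[str, str, str]


def find_permission_drift(
    grants_by_cluster: Dict[str, Set[Grant]]
) -> Dict[str, Set[Grant]]:
    """Return, per cluster, the grants that exist on some other cluster but not here."""
    all_grants: Set[Grant] = set().union(*grants_by_cluster.values())
    drift: Dict[str, Set[Grant]] = {}
    for cluster, grants in grants_by_cluster.items():
        missing = all_grants - grants
        if missing:
            drift[cluster] = missing
    return drift


# Three copies of the same table, each administered separately (assumed data).
grants_by_cluster = {
    "etl": {
        ("analyst", "sales.orders", "SELECT"),
        ("engineer", "sales.orders", "INSERT"),
    },
    "bi": {
        ("analyst", "sales.orders", "SELECT"),
    },
    "science": {
        ("analyst", "sales.orders", "SELECT"),
        ("engineer", "sales.orders", "INSERT"),
    },
}

drift = find_permission_drift(grants_by_cluster)
print(drift)
```

In this sketch the "bi" copy has drifted from the other two. With more clusters and more copies, the number of places where such a mismatch can hide grows accordingly.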
NEGATIVE BUSINESS IMPACT
• Increased operational costs: many distinct environments to buy and build
• Increased staff overhead: many distinct tools to learn and support
• Increased security risks: many distinct frameworks to enforce
• Decreased business insights: narrow data sets and analytics rigidity
• Decreased business agility: outdated and limiting for applications
• Decreased governance capability: no common visibility across stores
DATA CONTEXT CHALLENGE
• Data: stateful
• Compute: stateless
• Context: stateless
ENABLING STATEFUL AND CONSISTENT CONTEXT
CLOUDERA ENTERPRISE WITH SDX
Benefits for IT infra & ops
● Central control and security
● Focus on curating not firefighting
Benefits for users
● Find value from one source of truth
● Bring the best tools for each job
Platform layers
● Workloads: data engineering, data science, data warehouse, operational database, 3rd party services
● Shared services: data catalog, governance, security, lifecycle management, common services, control plane
● Storage: HDFS, Kudu, Amazon S3, Microsoft ADLS
• Data Catalog: a comprehensive catalog of all data sets, spanning on-premises,
cloud object stores, structured, unstructured, and semi-structured. Includes
technical schemas from the Hive metastore, as well as business glossary
definitions, classifications, and usage guidance
• Security: role-based access control applied consistently across the platform
using Apache Sentry. Also includes full stack encryption and key management
• Governance: enterprise-grade auditing, lineage, and other governance
capabilities applied universally across the platform with rich extensibility for
partner integrations
• Lifecycle Management: comprehensive ingest-to-purge management of data
set lifecycle activities
• Control Plane: multi-environment cluster provisioning, deployment,
management, and troubleshooting
SHARED DATA CONTEXT SERVICES
Built for multi-function analytics anywhere
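The idea of a shared data context can be sketched in a few lines of Python: schema, classifications, grants, and audit records live in one persistent store, and each workload cluster consults that store instead of keeping its own copy. Everything here (`SharedContext`, `Cluster`, the data set name, the roles) is a hypothetical model for illustration. It does not reproduce the Hive metastore, Apache Sentry, or any other Cloudera interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class DatasetContext:
    """Context kept for one data set: schema, classifications, grants, audit trail."""
    schema: Dict[str, str]
    classifications: Set[str] = field(default_factory=set)
    grants: Set[Tuple[str, str]] = field(default_factory=set)  # (role, privilege)
    audit_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)


class SharedContext:
    """A single context store that outlives any individual cluster (hypothetical)."""

    def __init__(self) -> None:
        self._datasets: Dict[str, DatasetContext] = {}

    def register(self, name: str, schema: Dict[str, str],
                 classifications: Iterable[str] = ()) -> None:
        self._datasets[name] = DatasetContext(dict(schema), set(classifications))

    def grant(self, name: str, role: str, privilege: str) -> None:
        self._datasets[name].grants.add((role, privilege))

    def check(self, cluster: str, name: str, role: str, privilege: str) -> bool:
        ctx = self._datasets[name]
        allowed = (role, privilege) in ctx.grants
        # Every decision is recorded in one place, whichever cluster asked.
        ctx.audit_log.append((cluster, role, privilege, allowed))
        return allowed

    def audit(self, name: str) -> List[Tuple[str, str, str, bool]]:
        return list(self._datasets[name].audit_log)


class Cluster:
    """A workload cluster that holds no context of its own."""

    def __init__(self, name: str, context: SharedContext) -> None:
        self.name = name
        self.context = context

    def can_read(self, dataset: str, role: str) -> bool:
        return self.context.check(self.name, dataset, role, "SELECT")


context = SharedContext()
context.register("sales.orders", {"order_id": "bigint", "amount": "double"}, {"PII"})
context.grant("sales.orders", "analyst", "SELECT")  # assigned once

warehouse = Cluster("data-warehouse", context)
science = Cluster("data-science", context)

print(warehouse.can_read("sales.orders", "analyst"))
print(science.can_read("sales.orders", "analyst"))
print(science.can_read("sales.orders", "intern"))
print(len(context.audit("sales.orders")))
```

The design choice being illustrated is that a cluster is stateless with respect to context: creating or discarding a `Cluster` object changes nothing about who may read the data or what has been audited.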
EXAMPLE CLOUDERA SDX USE CASES
Cloudera SDX makes it easy for administrators, BI users, and data scientists to work together on a common data set, with consistent data context. Partner tools can use and enrich data context automatically.
DATA ENGINEERING + DATA WAREHOUSE
● Run ETL with Spark, MapReduce, or any number of partner tools
● Assign permissions and classifications once
● Data, along with all data context, is immediately available in the analytics database
DATA ENGINEERING + DATA ENGINEERING
● Run specialized transient workloads for security profiling, data preparation, ETL, etc.
● Partner tools can have dedicated clusters
● Data, along with all data context, is immediately available to all partner tools
DATA ENGINEERING + DATA SCIENCE
● Run ETL with Spark, MapReduce, or any number of partner tools
● Assign permissions and classifications once
● Data, along with all data context, is immediately available for data science and machine learning
BASED ON COMMON CLOUDERA COMPONENTS
Apache open source and Cloudera unique innovations
• Data catalog: Hive Metastore
• Governance: Navigator
• Security: Sentry, Kerberos
• Lifecycle management: BDR, Navigator
• Common services: Hue
• Control plane: Altus, Manager, Director
• Also shown: Microsoft ADLS, Amazon S3, Impala
WITH YEARS OF EXPERIENCE
Timeline, 2010 to 2018: Hive Metastore, Sentry, Hue, Kerberos, Altus, BDR, Director, Manager, Navigator
CLOUDERA ALTUS PAAS
• Simple
• Self-service
• Auto-elastic
• Role specific
Services: DATA ENGINEERING, DATA WAREHOUSE, DATA SCIENCE
Shared context: DATA CATALOG, GOVERNANCE, SECURITY, CONTROL PLANE, LIFECYCLE MANAGEMENT
soon
Storage: Amazon S3, Microsoft ADLS (beta)
CLOUDERA SDX
Available for all workloads that share data across clusters
• Configured SDX:
Self-managed clusters in the cloud - available as of C5.13
• Cloudera Altus SDX:
Altus PaaS clusters - available where Altus is
CLOUDERA SDX: MOTIVATION
• 1970-2010, OMIT: self-contained appliances with compute, data and data context
• 2010-2017, Big Data Analytics: Cloudera EDH (Spark, Impala, Hive over shared data and data context). Unified platform, multiple engines, shared storage, shared data context
• 2017-Onward, Big Data Analytics and Cloud: Cloudera EDH (Hive, Impala, Spark over shared data and data context). Simplified multi-tenant environment, multiple compute engines, shared storage, shared and persistent data context
Charles, SVP, Emerging Businesses: With increased focus on … business insights.. dashboard … FAST...
Mulyadi, Data Scientist: Of course! We have our internal EDH cluster. That would be easy!
Mulyadi, Data Scientist: Pipelines! Workloads! Queries! More pipelines. More workloads! More queries! Even more….
Alan, Internal EDH Data Platform Manager: Adding more workloads to Internal EDH clusters is risky and adds uncertainty to existing SLA-sensitive workloads.
ALAN’S PROBLEM
• Databases, tables, columns, partitions, views, data size
BACK TO CLOUDERA’S WORLD...
• Sales (SFDC): 386 tables
• Support (Clusterstats): 340 tables
Mulyadi, Data Scientist: Maybe separate cluster with “required” data?
Alan, Internal EDH Data Platform Manager: Why not!!
OUR CUSTOMERS’ PROBLEMS
• Databases, tables, views, partitions, data, columns
ALAN AND MULYADI IN THE CLOUD WORLD
• Server procurement
• Additional pipelines
• Data migration runtime
• Data migration dev scripts
• Data migration cost only
• EC2 hours for data migration only
DATA MIGRATION COSTS GROW EXPONENTIALLY
[Diagram: Internal EDH, Support, Emerging Businesses Analytics, Sales Analytics, Product, Training, Finance; values shown: 37, 15, 47, 27, 27, 15]
• No single source of truth
• Synchronization overhead
• Stale data
EMBRACE UNIFICATION OF DATA & CONTEXT VIA SDX
[Diagram: Internal EDH, Support, Emerging Businesses Analytics, Sales Analytics, Product, Training, Finance]
SDX RECAP
• A differentiated capability for sharing of data and data context persistently
• Enables sharing schema, security, governance, audit artifacts
• Akin to linear scalability of Apache Hadoop itself
SDX DEMO
CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY
DATA-DRIVEN JOURNEY
Diagram labels: USE CASES, VISIBILITY
Use cases shown: Preventive & Proactive Maintenance; IoT Hub for Industry 4.0; Advanced Threat Detection; Risk Modelling & Analysis; Marketing Systems Integration; Customer 360 Insights; Exploratory Data Science; Data Warehouse; Applied Machine Learning
• GROW: Sales & Marketing
• CONNECT: Operations & Product
• PROTECT: Security & Compliance
• MODERNIZE: IT, Tech, Data Science & Analytics
CUSTOMER SUCCESSES FOR EDH & SDX
Couldn’t solve predictive maintenance goals
EDH delivers:
• Ingest telematics in real-time
• Machine learning to predict failures
• Analytics to minimize service downtime
• Protect sensitive and regulated data
• Consistent security and governance
• “SDX is the key to making that happen” - CIO
Drug R&D too slow and expensive
EDH delivers:
• Self-service analytics
• Meet HIPAA regulations
• >5 petabytes from 2100 silos
• Using Spark, Impala, & Search side-by-side
• With Anaconda, AtScale, Cloudwick, Kinetica,
StreamSets, Tamr, Trifacta, & Zoomdata
POSITIVE BUSINESS OUTCOMES
• Increased business insights: diverse data together with analytics flexibility
• Increased business agility: modern and nimble application innovation
• Increased governance capability: one common viewpoint and store
• Decreased operational costs: one environment for all needs
• Decreased staff overhead: one set of controls for everything
• Decreased security risks: comprehensive controls everywhere
YOUR OWN CONSISTENT DATA CONTEXT
Altus, powered by SDX
Free trial: https://cloudera.com/altus
Configured SDX
For C5.13+: http://bit.ly/2Ms5OPO
THANK YOU

More Related Content

What's hot

Data Mesh
Data MeshData Mesh
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
Trivadis
 
Informatica Cloud Overview
Informatica Cloud OverviewInformatica Cloud Overview
Informatica Cloud Overview
Darren Cunningham
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at Scale
Adam Doyle
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
Databricks
 
Data Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best PracticesData Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best Practices
CitiusTech
 
adb.pdf
adb.pdfadb.pdf
Data Warehousing in the Cloud: Practical Migration Strategies
Data Warehousing in the Cloud: Practical Migration Strategies Data Warehousing in the Cloud: Practical Migration Strategies
Data Warehousing in the Cloud: Practical Migration Strategies
SnapLogic
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
Databricks
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
Amazon Web Services
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
Databricks
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
Vivek Aanand Ganesan
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
 
Cloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best PracticesCloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best Practices
QBurst
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Preparing a data migration plan: A practical guide
Preparing a data migration plan: A practical guidePreparing a data migration plan: A practical guide
Preparing a data migration plan: A practical guide
ETLSolutions
 

What's hot (20)

Data Mesh
Data MeshData Mesh
Data Mesh
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Informatica Cloud Overview
Informatica Cloud OverviewInformatica Cloud Overview
Informatica Cloud Overview
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at Scale
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Data Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best PracticesData Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best Practices
 
adb.pdf
adb.pdfadb.pdf
adb.pdf
 
Data Warehousing in the Cloud: Practical Migration Strategies
Data Warehousing in the Cloud: Practical Migration Strategies Data Warehousing in the Cloud: Practical Migration Strategies
Data Warehousing in the Cloud: Practical Migration Strategies
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Cloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best PracticesCloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best Practices
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Preparing a data migration plan: A practical guide
Preparing a data migration plan: A practical guidePreparing a data migration plan: A practical guide
Preparing a data migration plan: A practical guide
 

Similar to Cloudera SDX

Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Matt Stubbs
 
Hybrid is the New Normal
Hybrid is the New NormalHybrid is the New Normal
Hybrid is the New Normal
DataWorks Summit
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
Cloudera, Inc.
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Cloudera, Inc.
 
A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloud
Cloudera, Inc.
 
Cloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemachtCloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for Analytics
Cloudera, Inc.
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloudera, Inc.
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
Cloudera, Inc.
 
Self-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureSelf-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft Azure
Cloudera, Inc.
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the Cloud
GoDataDriven
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
DataWorks Summit
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera, Inc.
 
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Cloudera, Inc.
 
SDX Pitch Deck (201) - Apresentação SDP 2024
SDX Pitch Deck (201) - Apresentação SDP 2024SDX Pitch Deck (201) - Apresentação SDP 2024
SDX Pitch Deck (201) - Apresentação SDP 2024
PauloEduardoBitarJun
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
Cloudera, Inc.
 

Similar to Cloudera SDX (20)

Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
 
Hybrid is the New Normal
Hybrid is the New NormalHybrid is the New Normal
Hybrid is the New Normal
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloud
 
Cloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemachtCloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemacht
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for Analytics
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
 
Self-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureSelf-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft Azure
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the Cloud
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
 
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
 
SDX Pitch Deck (201) - Apresentação SDP 2024
SDX Pitch Deck (201) - Apresentação SDP 2024SDX Pitch Deck (201) - Apresentação SDP 2024
SDX Pitch Deck (201) - Apresentação SDP 2024
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.
 
Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18
Cloudera, Inc.
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
Cloudera, Inc.
 
Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18
Cloudera, Inc.
 
How Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR complianceHow Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR compliance
Cloudera, Inc.
 

Cloudera SDX

  • 1. CLOUDERA SDX CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WORKLOADS Wim Stoop | Senior Product Marketing Manager Santosh Kumar | Senior Product Manager
  • 2. © Cloudera, Inc. All rights reserved. MULTI-DISCIPLINARY ANALYTICS
  • 3. WE ALL HAVE BAGGAGE
  • 4. TRADITIONAL APPLICATION SILOS [Diagram: five applications (DATA SCIENCE, SQL ANALYTIC DATABASE, NOSQL & RT DATABASE, ETL & DATA ENGINEERING, DATA WAREHOUSE/MART), each with its own STORAGE (FS or RDBMS) and its own CONTEXT (SECURITY, GOVERNANCE, LIFECYCLE, CONTROL, CATALOG)]
  • 5. A STRUGGLE AS OLD AS TIME: IT VS. BUSINESS For IT infrastructure & ops • Single use, inflexible data sources • Redundancy and fragmentation For users • Can’t find data, waiting on IT • Doing prep work, not finding insights For head of data & analytics • Administrative, not innovative • Can’t meet business requirements
  • 6. ON-PREMISES DEPLOYMENT [Diagram: APPLICATION layer (DATA WAREHOUSE/MART, ETL & DATA ENGINEERING, DATA SCIENCE, SQL ANALYTIC DATABASE, NOSQL & RT DATABASE) over shared STORAGE (HDFS, KUDU) and one shared CONTEXT (SECURITY, GOVERNANCE, LIFECYCLE, CONTROL, CATALOG)]
  • 7. CLOUD RE-INTRODUCES SILOS [Diagram: APPLICATION layer (DATA WAREHOUSE/MART, ETL & DATA ENGINEERING, DATA SCIENCE, SQL ANALYTIC DATABASE, NOSQL & RT DATABASE), each application with its own CONTEXT (SECURITY, GOVERNANCE, LIFECYCLE, CONTROL, CATALOG), over STORAGE (Microsoft ADLS, Amazon S3, HDFS, KUDU, Google CP CLOUD)]
  • 8. CHALLENGES: SECURITY & GOVERNANCE • Sharing data across workloads • Requires multiple copies of data to be created • Each with its own set of data context • Burdensome admin effort • Multiple clusters = multiple places to administer • One missing permission in one copy of the data can lead to significant financial and reputational risk • Difficult to share data safely for new analyses • Heavy new regulation such as GDPR makes the challenges even greater
  • 9. NEGATIVE BUSINESS IMPACT • Increased operational costs: many distinct environments to buy and build • Increased staff overhead: many distinct tools to learn and support • Increased security risks: many distinct frameworks to enforce • Decreased business insights: narrow data sets and analytics rigidity • Decreased business agility: outdated and limiting for applications • Decreased governance capability: no common visibility across stores
  • 10. DATA CONTEXT CHALLENGE Data: stateful. Compute: stateless. Context: stateless.
  • 11. ENABLING STATEFUL AND CONSISTENT CONTEXT
  • 12. CLOUDERA ENTERPRISE WITH SDX Benefits for IT infra & ops ● Central control and security ● Focus on curating not firefighting Benefits for users ● Find value from one source of truth ● Bring the best tools for each job [Diagram: WORKLOADS (3RD PARTY SERVICES, DATA ENGINEERING, DATA SCIENCE, DATA WAREHOUSE, OPERATIONAL DATABASE) over COMMON SERVICES (DATA CATALOG, GOVERNANCE, SECURITY, LIFECYCLE MANAGEMENT, CONTROL PLANE) over STORAGE (Microsoft ADLS, HDFS, Amazon S3, KUDU)]
  • 13. SHARED DATA CONTEXT SERVICES Built for multi-function analytics anywhere • Data Catalog: a comprehensive catalog of all data sets, spanning on-premises, cloud object stores, structured, unstructured, and semi-structured. Includes technical schemas from the Hive metastore, as well as business glossary definitions, classifications, and usage guidance • Security: role-based access control applied consistently across the platform using Apache Sentry. Also includes full stack encryption and key management • Governance: enterprise-grade auditing, lineage, and other governance capabilities applied universally across the platform with rich extensibility for partner integrations • Lifecycle Management: comprehensive ingest-to-purge management of data set lifecycle activities • Control Plane: multi-environment cluster provisioning, deployment, management, and troubleshooting
  • 14. EXAMPLE CLOUDERA SDX USE CASES Cloudera SDX makes it easy for administrators, BI users, and data scientists to work together on a common data set, with consistent data context. Partner tools can use and enrich data context automatically. DATA ENGINEERING + DATA WAREHOUSE ● Run ETL with Spark, MapReduce, or any number of partner tools ● Assign permissions and classifications once ● Data, along with all data context, is immediately available in the analytics database DATA ENGINEERING + DATA ENGINEERING ● Run specialized transient workloads for security profiling, data preparation, ETL, etc. ● Partner tools can have dedicated clusters ● Data, along with all data context, is immediately available to all partner tools DATA ENGINEERING + DATA SCIENCE ● Run ETL with Spark, MapReduce, or any number of partner tools ● Assign permissions and classifications once ● Data, along with all data context, is immediately available for data science and machine learning
  • 15. BASED ON COMMON CLOUDERA COMPONENTS Apache open source and Cloudera unique innovations DATA CATALOG HIVE METASTORE GOVERNANCE NAVIGATOR SECURITY SENTRY KERBEROS LIFECYCLE MANAGEMENT BDR NAVIGATOR COMMON SERVICES CONTROL PLANE HUE ALTUS MANAGER DIRECTOR Microsoft ADLS Amazon S3 Impala
  • 16. WITH YEARS OF EXPERIENCE [Timeline, 2010 to 2018: HIVE METASTORE, SENTRY, HUE, KERBEROS, ALTUS, BDR, DIRECTOR, MANAGER, NAVIGATOR]
  • 17. CLOUDERA ALTUS PAAS • Simple • Self-service • Auto-elastic • Role specific DATA ENGINEERING DATA WAREHOUSE DATA SCIENCE DATA CATALOG GOVERNANCE SECURITY CONTROL PLANE LIFECYCLE MANAGEMENT soon Amazon S3 Microsoft ADLS beta
  • 18. CLOUDERA SDX Available for all workloads that share data across clusters • Configured SDX: Self-managed clusters in the cloud, available as of C5.13 • Cloudera Altus SDX: Altus PaaS clusters, available where Altus is
  • 19. CLOUDERA SDX: MOTIVATION 1970-2010 OMIT: self-contained appliances with compute, data and data context. 2010-2017 Big Data Analytics: Cloudera EDH (Spark, Impala, Hive; Data; Data Context), a unified platform with multiple engines, shared storage, and shared data context. 2017-Onward Big Data Analytics and Cloud: Cloudera EDH (Hive, Impala, Spark; Data; Data Context), a simplified multi-tenant environment with multiple compute engines, shared storage, and shared and persistent data context.
  • 20. Of course! We have our internal EDH cluster. That would be easy! With increased focus on … business insights.. dashboard … FAST... Charles, SVP, Emerging Businesses Mulyadi, Data Scientist
  • 21. Pipelines! Workloads! Queries! More pipelines. More workloads! More queries! Even more…. Mulyadi, Data Scientist Alan, Internal EDH Data Platform Manager Adding more workloads to Internal EDH clusters is risky and adds uncertainty to existing SLA-sensitive workloads.
  • 22. ALAN’S PROBLEM Databases Tables Columns Partitions Views Data Size
  • 23. BACK TO CLOUDERA’S WORLD... Sales (SFDC/386 tables) Support (Clusterstats/340) Tables
  • 24. Maybe separate cluster with “required” data? Mulyadi, Data Scientist Alan, Internal EDH Data Platform Manager Why not!!
  • 25. OUR CUSTOMERS’ PROBLEMS Databases Tables Views Partitions Data Columns
  • 26. ALAN AND MULYADI IN THE CLOUD WORLD Data Migration Runtime Server Procurement Additional pipelines Data Migration Cost only Data Migration Dev Scripts EC2 Hours for Data Migration only
  • 27. DATA MIGRATION COSTS GROW EXPONENTIALLY [Diagram: Internal EDH, Emerging Businesses Analytics, Sales Analytics, Support, Product, Training, Finance, with link labels 37, 15, 47, 27, 27, 15] • No single source of truth • Synchronization overhead • Stale data
  • 28. EMBRACE UNIFICATION OF DATA & CONTEXT VIA SDX [Diagram: Internal EDH, Emerging Businesses Analytics, Sales Analytics, Support, Product, Training, Finance]
  • 29. SDX RECAP • A differentiated capability for sharing of data and data context persistently • Enables sharing schema, security, governance, audit artifacts • Akin to linear scalability of Apache Hadoop itself
  • 30. SDX DEMO
  • 31. CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY
  • 32. DATA-DRIVEN JOURNEY USE CASES VISIBILITY Preventive & Proactive Maintenance IoT Hub for Industry 4.0 Advanced Threat Detection Risk Modelling & Analysis Marketing Systems Integration Customer 360 Insights Exploratory Data Science Data Warehouse Applied Machine Learning GROW Sales & Marketing CONNECT Operations & Product PROTECT Security & Compliance MODERNIZE IT, Tech, Data Science & Analytics
  • 33. CUSTOMER SUCCESSES FOR EDH & SDX Couldn’t solve predictive maintenance goals EDH delivers: • Ingest telematics in real-time • Machine learning to predict failures • Analytics to minimize service downtime • Protect sensitive and regulated data • Consistent security and governance • “SDX is the key to making that happen” - CIO Drug R&D too slow and expensive EDH delivers: • Self-service analytics • Meet HIPAA regulations • >5 petabytes from 2100 silos • Using Spark, Impala, & Search side-by-side • With Anaconda, AtScale, Cloudwick, Kinetica, StreamSets, Tamr, Trifacta, & Zoomdata
  • 34. POSITIVE BUSINESS OUTCOMES • Increased business insights: diverse data together with analytics flexibility • Increased business agility: modern and nimble application innovation • Increased governance capability: one common viewpoint and store • Decreased operational costs: one environment for all needs • Decreased staff overhead: one set of controls for everything • Decreased security risks: comprehensive controls everywhere
  • 35. YOUR OWN CONSISTENT DATA CONTEXT Altus, powered by SDX Free trial: https://cloudera.com/altus Configured SDX For C5.13+: http://bit.ly/2Ms5OPO
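
The slides on data context argue that context (schema, permissions, classifications) should be stateful and shared, so that permissions are assigned once and every workload sees them. A minimal Python sketch of that idea follows. It is an illustration only: the names `SharedDataContext`, `TransientCluster`, `grant`, and `can_query` are hypothetical and are not Cloudera or Apache Sentry APIs, and the table and role names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDataContext:
    """Hypothetical stand-in for a persistent context store shared by all clusters."""
    schemas: dict = field(default_factory=dict)          # table -> list of column names
    grants: set = field(default_factory=set)             # (role, table, privilege) tuples
    classifications: dict = field(default_factory=dict)  # table -> set of tags

    def register_table(self, table, columns, tags=()):
        self.schemas[table] = list(columns)
        self.classifications[table] = set(tags)

    def grant(self, role, table, privilege):
        self.grants.add((role, table, privilege))

    def is_allowed(self, role, table, privilege):
        return (role, table, privilege) in self.grants


@dataclass
class TransientCluster:
    """Stateless compute: it holds no context of its own, only a reference."""
    name: str
    context: SharedDataContext

    def can_query(self, role, table):
        known = table in self.context.schemas
        return known and self.context.is_allowed(role, table, "SELECT")


# Context is defined once (assumed example table and role).
context = SharedDataContext()
context.register_table("sales.orders", ["order_id", "customer_id", "amount"], tags=["PII"])
context.grant("analyst", "sales.orders", "SELECT")

# Every workload cluster, including ones created later, sees the same context.
etl = TransientCluster("data-engineering", context)
warehouse = TransientCluster("data-warehouse", context)
science = TransientCluster("data-science", context)

for cluster in (etl, warehouse, science):
    print(cluster.name, cluster.can_query("analyst", "sales.orders"))
```

Because the clusters keep only a reference to the context, tearing one down or spinning up a new one does not require re-creating schemas or re-granting permissions, which is the property the slides describe as persistent data context.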

Editor's Notes

  1. Multi-disciplinary analytics: the vehicle to go from insight to action. It is the foundation for all your analytics innovation. It determines how successfully you can meet both your business and operational goals, and it will enable you to ask bigger questions and solve complex analytics problems. Everyone is doing it (experience with our customers). We’ve previously done webinars on just that topic, so please reach out if you’d like to receive a link to a replay. It requires a platform that can: support multi-function analytics; minimize time to add workloads; support elastic workloads; enable self-service; provide a scalable model for sharing data; reduce cost; increase tenant isolation; secure the environment.
  2. We’ve all been doing analytics for a very long time, for a variety of reasons and in a range of departments. So we’ve accumulated some baggage.
  3. We’ve gathered typical legacy systems. Each application does its own thing, has its own data storage and also its own context. Many data silos, each requiring its own proprietary tools and infrastructure. Each application or workload has dedicated compute, data storage and context. Different vendors, products, and services. Throughout a project, they’re all needed, and in different ratios. You start off with ETL/DE, then DW/M, analytics in SQL, NoSQL/RT, and DS, each in its own siloed application. And you may have multiple of each kind, of each workload. A fragmented approach is difficult, expensive, and risky. Managing and guaranteeing security, compliance, etc. across the board is hard. Nothing’s shared. Context is reestablished each and every time: new application or workload, new context. Whole disciplines have grown around trying to keep this all in sync.
  4. A struggle or, for the Disney fans, a tale as old as time between business and IT. More beast than beauty, though. Different problems for different users. IT: each application or workload is single use, data sources are inflexible; lots of redundancy of data; an inefficient operating environment. Users: can’t find the data they need; waiting on IT; finding out-of-date data; doing prep, not finding insights. Head of data and analytics: administrative, not innovating, choosing databases rather than operationalizing. Users resort to shadow IT to work around their challenges.
  5. Organisation vs business unit
  6. So then use a big data platform on premises. A single physical cluster. Everything is in one place, so the shared data experience for multiple workloads and tenants is a given. Strong: multi-function support, shared data experience, information security model. OK-ish, moderate: cost management, tenant isolation, workload elasticity. And it’s OK to be on premises, even though it is weak on self-service and speed of deployment. To address those two challenges, organisations are branching out and looking at alternatives to meet the business needs in those areas. Time to insight and action needs to be ever shorter: self-service. New capacity is needed in the next 10 minutes and the need is gone by tomorrow. That sounds like cloud, right?
  7. Well, certainly right on those fronts. Cloud deployments are strong where on-premises is weak. Strong: tenant isolation, workload elasticity, self-service, speed of deployment. But if you haven’t carefully looked at your cloud strategy, you can find yourself re-introducing context silos. Weak: shared data experience, information security model. It’s compounded by the fact that we’re now dealing with a mix of permanent and transient workloads. Back to square one in terms of challenges. The biggest ones are around security and governance.
  8. Irrespective of the audience, the impact on the business is detrimental.
  9. The crux is the data context. To have a workload, you need data, compute power AND context. Compute and data are becoming further separated. Compute is stateless: cloud-based or on-prem, either transient or long-running. Data is stateful: cloud-based or on-prem in HDFS, Kudu, S3, ADLS, Isilon, etc. What about data context? Schema definitions, permissions, encryption keys, governance. Data context should be stateful, but currently is stateless. This creates synchronization and usability challenges for admins and end users alike.
  10. So we introduced Cloudera SDX, or shared data experience, the foundation of Cloudera Enterprise. SDX makes it possible for companies to run dozens, even hundreds, of analytic applications against a common pool of data. One logical cluster provides a shared data experience to multiple workloads and tenants. SDX applies a centralized, consistent framework for catalog, security, governance, management, data ingest and more. It makes it faster, easier, and safer for organizations, teams, and people to develop and deploy high-value, multi-function use cases like customer next best offer, clinical prediction, and risk modeling. SDX cuts through silos to unify data, analytics, management, security, and governance, and empowers self-service. It combines the strengths of on-premises and cloud-only deployments: * multi-function support * shared data experience * information security model * cost management * tenant isolation * workload elasticity * self service * speed of deployment
  11. SDX is a set of open platform services built for multi-functional or multi-disciplinary analytics that have been optimized for the cloud. This means a shared catalog that helps to define and preserve the structure and the business context of all your data, regardless of where it happens to reside. It means that we offer a unified security model that helps protect sensitive data with a consistent set of controls, and a consistent governance model that enables self-service, secure access to all of your relevant data. Not just one type of data, but really all of it, increasing your ability to be compliant, particularly in a regulatory environment. Next, flexible data ingest and replication: we have a number of core partners that we work with in this arena that help you aggregate a single copy of all of your data, providing easier disaster recovery and easing migration of data from one place to another. Finally, easy workload management that increases user productivity and boosts job predictability. So SDX is really a core piece of how we at Cloudera separate ourselves from the competition.
  12. A couple of examples of how SDX helps. The list is endless.
  13. Santosh
  14. Santosh
  15. Santosh: speak about how, over time, this will all become one thing.
  16. Santosh: prefix with a couple of slides illustrating the challenge without SDX.
  17. Impact of consistency on the journey: enables, simplifies.
  18. Example of a large European financial org: reduced fraud, improved accuracy from 64 to 98%, and 80% faster customer service. Cluster spin-up with full context, deployed to three different clouds in <1h. Three admins vs 15. Concept to production in 6 months.
  19. How to get yours
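
The synchronization overhead raised on slide 27 can be illustrated with a back-of-the-envelope count. The sketch below assumes, purely for illustration, that every cluster copies data to every other cluster (a full mesh); the slides do not state the actual topology, and the function names are hypothetical. It compares the number of one-way copy pipelines with the number of connections needed when all clusters read one shared store of data and context.

```python
def full_mesh_pipelines(n_clusters: int) -> int:
    """One-way copy pipelines if every cluster copies to every other cluster."""
    return n_clusters * (n_clusters - 1)


def shared_store_links(n_clusters: int) -> int:
    """Connections needed if every cluster reads one shared data and context store."""
    return n_clusters


# Slide 27 names seven environments; other sizes are shown for comparison.
for n in (2, 4, 7, 10):
    print(f"{n} clusters: {full_mesh_pipelines(n)} copy pipelines "
          f"vs {shared_store_links(n)} shared-store links")
```

Under this assumption, the seven environments on slide 27 would need 42 copy pipelines, against seven connections to a shared store. Each pipeline is also a place where data can go stale or a permission can be missed.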