CLOUDERA SDX
[NAME] | [ROLE]
© 2022 Cloudera, Inc. All rights reserved. 2
DATA MANAGEMENT IS SPREAD ALL OVER
On-premises | Single cloud | Multi cloud | Hybrid cloud | Private cloud
47% | 21% | 24% | 26% | 32%
Gartner recently warned that “Data and analytics leaders must
prepare for the complexities of multi cloud and intercloud
deployments to avoid potential performance issues… unplanned
cost overruns and ... difficulties with integration efforts.”
HBR June 2019
“You should be able to select the type of workload,
select the capacity, run the job, and meet
security requirements”
- Kaushik Deka, Novantas partner and CTO
“Enterprise IT doesn’t operate
at the speed of business.
Your IT group needs to perform
better than shadow IT.”
CIO Magazine
STATUS QUO
• 40%: Shadow IT as part of overall IT budget spend
• 84%: Organizations with a hybrid or multi-cloud strategy
• $31M: Annual compliance cost in financial service firms
CONVENTIONAL WISDOM: “SEPARATE STORAGE & COMPUTE”
Monocluster
Architecture
Cloud-based Cluster
Architecture
• Agility
• Elasticity
• Isolation
• Transience
Physical Servers
COMPUTE
STORAGE
Virtual Machines
COMPUTE
Object Storage
STORAGE
CONTEXT IS KEY BUT WHERE DOES IT GO?
Monocluster
Architecture
Cloud-based Cluster
Architecture*
* Only works for simple use cases
Metadata, metrics, audit, lineage, permissions, filters, attributes
Physical Servers
COMPUTE
CONTEXT
STORAGE
Virtual Machines
COMPUTE
Object Storage
STORAGE
• Single tenant
• Single workload
• Non-sensitive data
• Non-regulated use case
DATA LAKES NEED SEPARATE COMPUTE, STORAGE & CONTEXT
Context is required for all but the simplest use cases
• Multi-user
• Multi-workload
• Sensitive data
• Regulated use case
• Long Running Apps
Identities | Schema | Policy | Profiling & Tags | Replication | Workloads | Audits | Encryption
In order to provide:
• Discoverability
• Trust
• Compliance
• Reuse
• Sharing
• Security
NO CONTEXT, NO SOLUTION, EVEN IN NATIVE CLOUD
Monocluster
Architecture
Cloud-based Cluster
Architecture
Physical Servers
COMPUTE
CONTEXT
STORAGE
Virtual Machines
COMPUTE
Object Storage
STORAGE
CDP SEPARATES STORAGE, COMPUTE & CONTEXT
Monocluster
Architecture
Cloud-based Cluster
Architecture
Cloud-native Hybrid Data
Architecture
Physical Servers
COMPUTE
CONTEXT
STORAGE
Virtual Machines
COMPUTE
Object Storage
STORAGE
Object Storage
STORAGE
Virtual Machines
COMPUTE
Always-on-Service
CONTEXT
ENABLING CONSISTENT
DATA CONTEXT
MANAGES AND SECURES THE ENTIRE DATA LIFECYCLE
In any cloud or datacenter
01 STREAMING & DATA FLOW: Collect
02 DATA ENGINEERING: Enrich
03 DATA WAREHOUSE: Report
04 OPERATIONAL DATABASE: Serve
05 MACHINE LEARNING & AI: Predict
SECURITY | GOVERNANCE | CATALOG | METADATA | INTELLIGENCE
CDP HYBRID DATA PLATFORM
Delivers the best of CDP Public and Private cloud - the power of “and”
• Faster time to value and increased IT controls
• High performance and low cost
• Massive scale and bursty jobs
• Made for data engineers and data scientists
• Tools to build complex apps and simple dashboards
SDX - CONSISTENT SECURITY AND GOVERNANCE
Built for multi-functional analytics anywhere
• Metadata: establish information assets for increased usability, trust and value
leveraging all metadata (structural, operational, business and social)
• Security: granular, dynamic, role- and attribute-based security policies. Prevent
and audit unauthorized access to sensitive or restricted data across platform
• Encryption: ultimate protection through automatic configuration of Kerberos-backed
authentication, and strong cryptography for data in motion and at rest
• Control: move data and workloads between deployments for optimum
performance, cost and resilience, meeting ever changing business needs
• Governance: enterprise-grade auditing, lineage, and governance capabilities
applied across the platform with rich extensibility for partner integrations
DATA CATALOG: WINDOW ONTO SDX
Discover data and audit usage
• For data stewards and end-users
• Discover, curate, tag fabric data
• Establish trust through lineage and
context through business glossary
• Create and manage data access
policies
● ABAC, RBAC; by file, table,
column, row, etc.
• Audit and identify what data a user
has accessed
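The last bullet, auditing what data a user has accessed, can be illustrated with a minimal sketch. This is not the Data Catalog or Ranger API: the `AuditEvent` record, its fields, and the helper `assets_accessed_by` are hypothetical names chosen for illustration, and a real audit trail would carry far more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    """One access record. The field layout is an assumption for this sketch."""
    user: str
    resource: str   # e.g. "sales.customers.email"; the naming scheme is hypothetical
    action: str     # e.g. "select"
    allowed: bool   # whether the policy engine granted the request


def assets_accessed_by(user, events):
    """Return the distinct resources that `user` was actually allowed to access."""
    return sorted({e.resource for e in events if e.user == user and e.allowed})


def denied_attempts(user, events):
    """Return the events where `user` was refused, in their original order."""
    return [e for e in events if e.user == user and not e.allowed]


events = [
    AuditEvent("ana", "sales.customers.email", "select", True),
    AuditEvent("ana", "sales.customers.ssn", "select", False),
    AuditEvent("ben", "sales.orders", "select", True),
    AuditEvent("ana", "sales.orders", "select", True),
]

print(assets_accessed_by("ana", events))
print(len(denied_attempts("ana", events)))
```

Keeping denied requests in the same trail as granted ones is a deliberate choice in this sketch: a steward can then answer both "what did this user see" and "what did this user try to see".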
SDX USE CASES
FAST DATA
ONBOARDING
• Examine data as it comes in
● Automatic data profiling
• Discover data make-up
● Sensitive data recognition
• Classify and characterize data
● Attribute based access control
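The onboarding flow above (profile incoming data, recognize sensitive fields, classify them) can be sketched roughly as follows. This is an illustration only, not how the Data Catalog profilers work: the regular expressions, the tag names `EMAIL` and `US_SSN`, the 80% threshold, and the function names are all assumptions.

```python
import re

# Hypothetical detectors; a production profiler would use far more robust rules.
PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def profile_column(values, threshold=0.8):
    """Return the tags whose pattern matches at least `threshold` of the non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return set()
    tags = set()
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            tags.add(tag)
    return tags


def profile_table(columns):
    """Map each column name to the set of sensitivity tags detected in its sample values."""
    return {name: profile_column(values) for name, values in columns.items()}


sample = {
    "contact": ["a@example.com", "b@example.org"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "note": ["hello", "x"],
}
print(profile_table(sample))
```

The resulting tags are what an attribute based access policy would later key on, so the profiler only classifies; it does not decide who may see the data.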
MULTI-USER
DATA ACCESS
• Determine how data is seen
● Attribute based access control
• Track (derived) data usage
● Data lineage
• Sticky data characterization
● Classification propagation
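"Determine how data is seen" can be read as: the same row looks different depending on the attributes of the person asking. A minimal sketch of that idea, assuming column tags and a per-user set of clearances; the names `view_row`, `mask`, and the `clearances` attribute are hypothetical and do not reflect Apache Ranger's policy model.

```python
def mask(value):
    """Hide all but the last four characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def view_row(row, column_tags, user_attrs):
    """Return the row as this user should see it.

    A column is shown in clear only if the user holds a clearance for every
    tag on that column; otherwise the value is masked.
    """
    clearances = user_attrs.get("clearances", set())
    visible = {}
    for column, value in row.items():
        tags = column_tags.get(column, set())
        visible[column] = value if tags <= clearances else mask(value)
    return visible


row = {"name": "Ada", "card": "4111111111111111"}
column_tags = {"card": {"PCI"}}

analyst = {"clearances": set()}
auditor = {"clearances": {"PCI"}}

print(view_row(row, column_tags, analyst))
print(view_row(row, column_tags, auditor))
```

Because the decision depends on tags rather than on column names, a newly onboarded table with a `PCI`-tagged column is covered without writing a new rule.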
SAFE DATA
ACCESS
EXPANSION
• Classification propagation for
derived data along lineage
● Attribute based access control
remains applicable and enforced
• Consistent security and
governance
● All analytics, all deployments
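Classification propagation along lineage can be pictured as pushing tags downstream through a graph of derived datasets. The sketch below is a simplified model under stated assumptions: lineage is a plain dictionary from a dataset to the datasets derived from it, and `propagate_tags` is a hypothetical helper, not an Apache Atlas call.

```python
from collections import deque


def propagate_tags(lineage, tags):
    """Push classification tags from source datasets to everything derived from them.

    lineage: dict mapping a dataset to the list of datasets derived from it.
    tags:    dict mapping a dataset to the set of tags assigned to it directly.
    Returns a dict mapping each tagged dataset to its effective tags
    (direct plus inherited).
    """
    effective = {dataset: set(t) for dataset, t in tags.items()}
    queue = deque(tags)
    while queue:
        source = queue.popleft()
        for child in lineage.get(source, []):
            child_tags = effective.setdefault(child, set())
            missing = effective[source] - child_tags
            if missing:
                # Re-queue the child only when it gained tags, so cycles terminate.
                child_tags |= missing
                queue.append(child)
    return effective


lineage = {
    "raw.customers": ["stg.customers"],
    "stg.customers": ["mart.revenue"],
    "raw.orders": ["mart.revenue"],
}
tags = {"raw.customers": {"PII"}}

effective = propagate_tags(lineage, tags)
print(effective["mart.revenue"])
```

Once `mart.revenue` inherits the tag, any tag-based access policy that already exists for it applies to the derived table with no further configuration, which is the point the slide makes about access control remaining applicable and enforced.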
BONUS
ACHIEVEMENT
• Good governance
• Leads to compliance
● Industry regulation
● Data privacy regulations
SDX TECHNICAL DETAIL
BASED ON COMMUNITY OPEN SOURCE COMPONENTS
Ensuring consistent security and governance
Hive Metastore - Schema
Ranger - Security
Atlas - Governance
Knox - Single Sign On
UNDER THE HOOD: SECURITY & METADATA
Identities | Schema | Policy | Audits
• User management: Synchronize users and groups defined in the corporate domain to the CDP control plane through SAML
• Network security: Kerberos, DNS, and TLS enabled by default using a managed FreeIPA instance
• Encryption: Support for cloud storage native encryption at rest, and encryption in transit using TLS
• SSO: Enable SSO access to all web UIs through Apache Knox
• Storage security: Easily map corporate user identities to cloud IAM roles using IDBroker and Ranger’s upcoming RAZ
• Schema: Manage structured metadata using Hive Metastore & Schema Registry
UNDER THE HOOD: ACTIVE MANAGEMENT
Data Profiling & Stewardship | Replication | Workload Management | Encryption
• Authorization: Restrict column / row / field access based on user, role, tag, or user attribute using Apache Ranger
• Governance: Capture audit trail and lineage graphs (from ingest to model serving) using Apache Atlas
• Data stewardship: Discover, manage, and curate data assets with enhanced business metadata & workflows using Data Catalog
• Data profiling: Automatically scan incoming data for sensitive fields and tag appropriately using Data Catalog
• Workload management: Track workloads and jobs for easy troubleshooting and optimal performance using Workload Manager
• Replication: Move data along with all context across hybrid and multi-clouds using Replication Manager
SDX ARCHITECTURE IN CDP CONTEXT
Environment
• Hive Metastore - Schema
• Ranger - Security
• Atlas - Governance
• Knox - Single Sign On
• Managed user directory, authentication & certificate services
• Persistent Disks
• Relational Datastores
• Self-Serve Experiences: DW, ML, DE, …
• Data Hub: Virtual Private Clusters
Management Console
• Data Catalog
• Replication Manager
• Workload Manager
• Identity Management
• Data Lake Management
DATA CATALOG ARCHITECTURE
Global Data Catalog across multiple Data Lakes
Control Plane (Cloudera Managed): CDP Management Console
• DATA CATALOG
• REPLICATION MANAGER
• WORKLOAD MANAGER
• IDENTITY MANAGEMENT
• DATA LAKE MANAGEMENT
Compute Plane (Customer Managed)
• EU Environment: Data Lake (Ranger, Atlas, DC Profilers); Workload 01, Workload 02, Workload 03, …
• US Environment: Data Lake (Ranger, Atlas, DC Profilers); Workload 11, Workload 12, Workload 13, …
• APAC Environment: Data Lake (Ranger, Atlas, DC Profilers); Workload 21, Workload 22, Workload 23, …
COMPETITION
POINT SOLUTIONS HAVE AN INTEGRATION TAX
Security and governance is an afterthought
01 STREAMING & DATA FLOW
02 DATA ENGINEERING
03 DATA WAREHOUSE
04 OPERATIONAL DATABASE
05 MACHINE LEARNING & AI
SECURITY | GOVERNANCE | CATALOG | METADATA | INTELLIGENCE
POINT SOLUTIONS FROM AWS
Basic security, limited integration and zero automation
01 STREAMING & DATA FLOW
02 DATA ENGINEERING
03 DATA WAREHOUSE
04 OPERATIONAL DATABASE
05 MACHINE LEARNING & AI
AWS services shown: Kinesis, EMR, Glue, Redshift, DynamoDB, SageMaker, Lake Formation, CloudTrail, CloudWatch
SECURITY | GOVERNANCE | CATALOG | METADATA | INTELLIGENCE
CONSISTENT SECURITY AND
GOVERNANCE
SDX DELIVERS THE HYBRID DATA PLATFORM PROMISE
• “Write once, run anywhere” data analytics portability: DATA ENGINEERING, DATA WAREHOUSE, MACHINE LEARNING, OPERATIONAL DATABASE, DATA FLOW
• Unified security & governance with open cloud-native storage formats
• Open data fabrics, lakehouses and data meshes with data anywhere at scale: Data Lakehouse, Data Fabric, Data Mesh
• Multi-cloud & on-premises data management and analytics
CDP with SDX vs OTHER DATA LAKE PLATFORMS
• Easier - point-click-repeat, automatically secure your data lake without scripting
• Faster - create a secure data lake in minutes not weeks
• Economical - allows for secure & governed workloads and services
• Comprehensive - all functions are covered, no compromises
Platforms compared: CDP, Pre-CDP, EMR, HDI, DataProc
Rating legend: Easy, fast to deploy | Complex, slow to deploy | Not supported
Row groups: Prereqs to secure a cluster | Critical functions needed once a cluster is secured
Category | Category Definition
Identities | Adding users and groups to multi-user clusters
Network - DNS | Configure DNS for forward and reverse resolution
Network - Kerberos | Configure Kerberos based authentication for multi-user clusters
Network - TLS | Configure TLS for encryption on the wire
Network - Web Proxy | Configure proxies for services' Web UIs
Storage encryption | Data is encrypted at rest
SSO | LDAP based authentication & authorization for services' Web UIs
Storage authorization | Cloud storage access policies on multiple user clusters
Authorization | Authorizing users on multi user clusters with tag based policies
Audit, Lineage | Unified persistent audit and lineage across transient or persistent and single or multi-user clusters
Data stewardship | Discover, curate and tag datasets with business context
Replication w/context | Move data and workloads along with schema, tags, lineage
Workload optimizations | Troubleshoot, optimize workloads for max utilization
SDX: USE YOUR DATA AND USE IT PROPERLY
• Increased business agility: deploy consistently secure data lakes faster
• Increased business insights: deploy more use cases faster
• Increased governance capability: achieve and maintain regulatory compliance with ease
• Decreased operational costs: one environment for all needs
• Decreased staff overhead: one set of controls for everything
• Decreased security risks: comprehensive controls for multi-user data access
COMPLIANCE THROUGH CONSISTENT DATA CONTEXT
CONSUMER CREDIT
Strict enterprise data security, governance and
control to achieve regulatory compliance
PHARMACEUTICAL
Meet stringent compliance and global
regulatory requirements
CUSTOMER SUCCESS THROUGH SHARED DATA EXPERIENCE
“SDX is foundational in how we track, govern, and protect our data.”
- Navistar
THANK YOU

SDX Pitch Deck (201) - Apresentação SDP 2024

  • 2. © 2022 Cloudera, Inc. All rights reserved. 2 DATA MANAGEMENT IS SPREAD ALL OVER 47% 21% 24% 26% 32% On-premises Single cloud Multi cloud Hybrid cloud Private cloud Gartner recently warned that “Data and analytics leaders must prepare for the complexities of multi cloud and intercloud deployments to avoid potential performance issues… unplanned cost overruns and ... difficulties with integration efforts.” HBR June 2019
  • 3. © 2022 Cloudera, Inc. All rights reserved. 3 “You should be able to select the type of workload, select the capacity, run the job, and meet security requirements” - Kaushik Deka, Novantas partner and CTO
  • 4. © 2022 Cloudera, Inc. All rights reserved. 4 “Enterprise IT doesn’t operate at the speed of business. Your IT group needs to perform better than shadow IT.” CIO Magazine
  • 5. © 2022 Cloudera, Inc. All rights reserved. 5 STATUS QUO Shadow IT as part of overall IT budget spend Organizations with a hybrid or multi-cloud strategy Annual compliance cost in financial service firms 40% 84% $31M
  • 6. CONVENTIONAL WISDOM: “SEPARATE STORAGE & COMPUTE” Monocluster Architecture Cloud-based Cluster Architecture • Agility • Elasticity • Isolation • Transience Physical Servers COMPUTE STORAGE Virtual Machines COMPUTE Object Storage STORAGE
  • 7. CONTEXT IS KEY BUT WHERE DOES IT GO? Monocluster Architecture Cloud-based Cluster Architecture* * Only works for simple use cases Metadata, metrics, audit, lineage permissions, filters, attributes Physical Servers COMPUTE CONTEXT STORAGE Virtual Machines COMPUTE Object Storage STORAGE • Single tenant • Single workload • Non-sensitive data • Non-regulated use case
  • 8. DATA LAKES NEED SEPARATE COMPUTE, STORAGE & CONTEXT • Multi-user • Multi-workload • Sensitive data • Regulated use case Context is required for all but the simplest use cases • Long Running Apps • Discoverability • Trust • Compliance • Reuse • Sharing • Security Identities Schema Policy Profiling & Tags Replication Workloads Audits Encryption In order to provide:
  • 9. NO CONTEXT, NO SOLUTION, EVEN IN NATIVE CLOUD Monocluster Architecture Cloud-based Cluster Architecture Physical Servers COMPUTE CONTEXT STORAGE Virtual Machines COMPUTE Object Storage STORAGE
  • 10. CDP SEPARATES STORAGE, COMPUTE & CONTEXT Monocluster Architecture Cloud-based Cluster Architecture Cloud-native Hybrid Data Architecture Physical Servers COMPUTE CONTEXT STORAGE Virtual Machines COMPUTE Object Storage STORAGE Object Storage STORAGE Virtual Machines COMPUTE Always-on-Service CONTEXT
  • 12. MANAGES AND SECURES THE ENTIRE DATA LIFECYCLE In any cloud or datacenter 01 STREAMING & DATA FLOW (Collect) 02 DATA ENGINEERING (Enrich) 03 DATA WAREHOUSE (Report) 04 OPERATIONAL DATABASE (Serve) 05 MACHINE LEARNING & AI (Predict) SECURITY | GOVERNANCE | CATALOG | METADATA | INTELLIGENCE
  • 13. CDP HYBRID DATA PLATFORM Delivers the best of CDP Public and Private cloud - the power of “and” • Faster time to value and increased IT controls • High performance and low cost • Massive scale and bursty jobs • Made for data engineers and data scientists • Tools to build complex apps and simple dashboards
  • 14. SDX - CONSISTENT SECURITY AND GOVERNANCE Built for multi-functional analytics anywhere • Metadata: establish information assets for increased usability, trust and value leveraging all metadata (structural, operational, business and social) • Security: granular, dynamic, role- and attribute-based security policies. Prevent and audit unauthorized access to sensitive or restricted data across platform • Encryption: ultimate protection through automatic configuration of Kerberos backed authentication, and strong cryptography for data in motion and rest • Control: move data and workloads between deployments for optimum performance, cost and resilience, meeting ever changing business needs • Governance: enterprise-grade auditing, lineage, and governance capabilities applied across the platform with rich extensibility for partner integrations
  • 15. DATA CATALOG: WINDOW ONTO SDX Discover data and audit usage • For data stewards and end-users • Discover, curate, tag fabric data • Establish trust through lineage and context through business glossary • Create and manage data access policies ● ABAC, RBAC; by file, table, column, row, etc. • Audit and identify what data a user has accessed
  • 17. FAST DATA ONBOARDING • Examine data as it comes in ● Automatic data profiling • Discover data make-up ● Sensitive data recognition • Classify and characterize data ● Attribute based access control
  • 18. MULTI-USER DATA ACCESS • Determine how data is seen ● Attribute based access control • Track (derived) data usage ● Data lineage • Sticky data characterization ● Classification propagation
  • 19. SAFE DATA ACCESS EXPANSION • Classification propagation for derived data along lineage ● Attribute based access control remains applicable and enforced • Consistent security and governance ● All analytics, all deployments
  • 20. BONUS ACHIEVEMENT • Good governance • Leads to compliance ● Industry regulation ● Data privacy regulations
  • 22. BASED ON COMMUNITY OPEN SOURCE COMPONENTS Ensuring consistent security and governance Metastore - Schema - Security - Governance - Single Sign On
  • 23. UNDER THE HOOD: SECURITY & METADATA User management: Synchronize users and groups defined in corporate domain to CDP control plane through SAML. Network security: Kerberos, DNS, TLS enabled by default using a managed FreeIPA instance. Encryption: Support for cloud storage native encryption at rest and in transit using TLS. SSO: Enable SSO access to all web UIs through Apache Knox. Storage security: Easily map corporate user identities to cloud IAM roles using IDBroker and Ranger’s upcoming RAZ. Schema: Manage structured metadata using Hive Metastore & Schema Registry. Identities Schema Policy Audits
  • 24. UNDER THE HOOD: ACTIVE MANAGEMENT Authorization: Restrict col / row / field access based on user, role, tag, or user attribute using Apache Ranger. Governance: Capture audit trail and lineage graphs (from ingest to model serving) using Apache Atlas. Data stewardship: Discover, manage, curate data assets with enhanced business metadata & workflows with Data Catalog. Data profiling: Automatically scan incoming data for sensitive fields and tag appropriately using Data Catalog. Workload management: Track workloads and jobs for easy troubleshooting, optimal performance using Workload Manager. Replication: Move data along with all context across hybrid and multi-clouds using Replication Manager. Data Profiling & Stewardship Replication Workload Management Encryption
  • 25. SDX ARCHITECTURE IN CDP CONTEXT Hive Metastore - Schema Ranger - Security Atlas - Governance Knox - Single Sign On Environment Managed user directory, authentication & certificate services DW, ML, DE, … Self-Serve Experiences Persistent Disks Data Hub Virtual Private Clusters Relational Datastores Management Console Data Catalog Replication Manager Workload Manager Identity Management Data Lake Management
  • 26. DATA CATALOG ARCHITECTURE Global Data Catalog across multiple Data Lakes. Control Plane (Cloudera Managed): CDP Management Console with DATA CATALOG, REPLICATION MANAGER, WORKLOAD MANAGER, IDENTITY MANAGEMENT, DATA LAKE MANAGEMENT. Compute Plane (Customer Managed): EU Environment (Data Lake: Ranger, Atlas, DC Profilers; Workload 01, Workload 02, Workload 03, …); US Environment (Data Lake: Ranger, Atlas, DC Profilers; Workload 11, Workload 12, Workload 13, …); APAC Environment (Data Lake: Ranger, Atlas, DC Profilers; Workload 21, Workload 22, Workload 23, …)
  • 28. POINT SOLUTIONS HAVE AN INTEGRATION TAX Security and governance is an afterthought 01 STREAMING & DATA FLOW 02 DATA ENGINEERING 03 DATA WAREHOUSE 04 OPERATIONAL DATABASE 05 MACHINE LEARNING & AI SECURITY | GOVERNANCE | CATALOG | METADATA | INTELLIGENCE
  • 29. POINT SOLUTIONS FROM AWS Basic security, limited integration and zero automation 01 STREAMING & DATA FLOW 02 DATA ENGINEERING 03 DATA WAREHOUSE 04 OPERATIONAL DATABASE 05 MACHINE LEARNING & AI Kinesis EMR Redshift Sagemaker DynamoDB Glue Lake Formation Lake Formation Lake Formation CloudTrail CloudWatch CloudTrail CloudWatch Lake Formation SECURITY | GOVERNANCE | CATALOG | METADATA | INTELLIGENCE
  • 31. SDX DELIVERS THE HYBRID DATA PLATFORM PROMISE “Write once, run anywhere” data analytics portability DATA ENGINEERING DATA WAREHOUSE MACHINE LEARNING OPERATIONAL DATABASE DATA FLOW Unified security & governance with open cloud-native storage formats Open data fabrics, lakehouses and data meshes with data anywhere at scale Data Lakehouse Data Fabric Data Mesh Multi-cloud & on-premises data management and analytics
  • 32. CDP with vs OTHER DATA LAKE PLATFORMS Easier - point-click-repeat, automatically secure your data lake without scripting. Faster - create a secure data lake in minutes not weeks. Economical - allows for secure & governed workloads and services. Comprehensive - all functions are covered, no compromises. Legend: Easy, fast to deploy; Complex, slow to deploy; Not supported. Columns: Category, CDP, Pre-CDP, EMR, HDI, DataProc, Category Definition. Identities: Adding users and groups to multi-user clusters; Network - DNS: Configure DNS for forward and reverse resolution; Network - Kerberos: Configure Kerberos based authentication for multi-user clusters; Network - TLS: Configure TLS for encryption on the wire; Network - Web Proxy: Configure proxies for services' Web UIs; Storage encryption: Data is encrypted at rest; SSO: LDAP based authentication & authorization for services' Web UIs; Storage authorization: Cloud storage access policies on multiple user clusters; Authorization: Authorizing users on multi user clusters with tag based policies; Audit, Lineage: Unified persistent audit and lineage across transient or persistent and single or multi-user clusters; Data stewardship: Discover, curate and tag datasets with business context; Replication w/context: Move data and workloads along with schema, tags, lineage; Workload optimizations: Troubleshoot, optimize workloads for max utilization. Row groups: Prereqs to secure a cluster; Critical functions needed once a cluster is secured
  • 33. SDX: USE YOUR DATA AND USE IT PROPERLY • Increased business agility: deploy consistently secure data lakes faster • Increased business insights: deploy more use cases faster • Increased governance capability: achieve and maintain regulatory compliance with ease • Decreased operational costs: one environment for all needs • Decreased staff overhead: one set of controls for everything • Decreased security risks: comprehensive controls for multi-user data access
  • 34. COMPLIANCE THROUGH CONSISTENT DATA CONTEXT CONSUMER CREDIT Strict enterprise data security, governance and control to achieve regulatory compliance PHARMACEUTICAL Comply with stringent compliance and global regulatory requirements
  • 35. CUSTOMER SUCCESS THROUGH SHARED DATA EXPERIENCE “SDX is foundational in how we track, govern, and protect our data.” - Navistar
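
Slides 18 and 19 describe classification propagation along lineage, with attribute-based access control still enforced on derived data. The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Apache Atlas or Apache Ranger API: the dataset names, the LINEAGE and TAGS structures, and the functions propagate_tags and can_read are all hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets derived from it.
LINEAGE = {
    "raw.customers": ["staging.customers_clean"],
    "staging.customers_clean": ["mart.customer_spend"],
    "raw.orders": ["mart.customer_spend"],
}

# Hypothetical classifications, e.g. assigned by a profiler at onboarding.
TAGS = {"raw.customers": {"PII"}}


def propagate_tags(lineage, tags):
    """Return a new tag map in which derived datasets inherit ancestor tags."""
    result = {name: set(values) for name, values in tags.items()}
    queue = deque(tags)
    while queue:
        source = queue.popleft()
        for derived in lineage.get(source, []):
            current = result.setdefault(derived, set())
            missing = result[source] - current
            if missing:
                # Only revisit a dataset when it gained a tag, so cycles terminate.
                current |= missing
                queue.append(derived)
    return result


def can_read(user_attributes, dataset, tags):
    """Allow access only if the user holds a clearance for every tag on the dataset."""
    clearances = user_attributes.get("clearances", set())
    return tags.get(dataset, set()) <= clearances


if __name__ == "__main__":
    effective = propagate_tags(LINEAGE, TAGS)
    steward = {"clearances": {"PII"}}
    analyst = {"clearances": set()}
    print(can_read(steward, "mart.customer_spend", effective))
    print(can_read(analyst, "mart.customer_spend", effective))
```

Because the policy is written against the tag rather than against each table, a dataset derived from tagged data is covered as soon as the tag propagates, without writing a new rule for it.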