SlideShare a Scribd company logo
1 of 28
Download to read offline
Apache Ozone
Evolution of HDFS Scalability & built-in GDPR compliance
Hadoop,	Ozone	&	Apache	are	trademarks	of	the	Apache	Software	Foundation.
Dinesh Chitlangia, Cloudera
Ajay Kumar, Google
Agenda
• Why, When, What
• Notions, Architecture,
Deployment
• Ozone for Enterprise
Ozone
• Ozone – Delete Path
• Ozone & GDPR
GDPR
Q & A
HDFS scalability
limits
400M+
Future
Make your HDFS
healthy day
Why
Object Store for Big
Data
•Scale both Objects & IOPS
Set of Micro-services
- Divide, Conquer,
Scale
Seamless transition
for Yarn, MapReduce,
Hive, Spark apps.
Supports K8s, CSI and
ability to run on K8s
natively.
Ozone
Scale beyond HDFS
Large Data Store /
Dedicated Storage
Clusters
Cloud like presence
on-prem
First class citizen
on K8
When
Notions
Volumes ~
user accounts
Buckets ~
directories (no
sub-buckets)
Keys ~ files
HDDS Notions
Containers
[Collection of
Blocks]
Pipeline
Architecture
Ozone’s Microservices - Divide, Conquer, Scale
• Ozone Manager - namespace [~Namenodes]
• Storage Container Managers - blockspace [~BlockServer]
• Recon Server - Control Plane
• S3 Gateway
• Datanodes
Deployment
Variants
Ozone - Write Path
Similar to DFS Write, Blocks are written directly to Datanodes
Ozone - Read Path
Similar to DFS Read, Blocks are read directly from Datanodes
Using Ozone: Is it as painful as HDFS?
We hear you and we have to setup Ozone every time we test.
• Docker
• docker-compose up -d
• runs it on local machine
• K8s
• helm install ozone
• Traditional tarball
• Untar
• Run genconfig
• Update the configurations
• If you are familiar with HDFS commands
• dfs -ls hdfs://user
• with ozone, it will become
• dfs -ls o3fs://user
• If you are familiar with S3 commands like
• aws s3 ls -endpoint=us-west1. /bucketName
• with Ozone s3 it becomes
• aws s3 ls -endpoint=s3g.local. /bucketName
Setup Usage
Ozone for Enterprise
Scale
Consistency
Security
Ozone for Enterprise
• 10 Billion Keys will be supported in first official release
• Scale OM/SCM independently, without any disruption
• Evenly distribute metadata across the cluster including Datanodes
• RAFT Consensus Protocol via Apache RATIS
• Tested with industry recognized off-the-shelf components
• Blockade Tests - Tests to inject errors/failures in the clusters
• Tested Apache Spark, YARN, Hive workloads
• K8s based clusters, long running clusters, ephemeral clusters
• Freon - custom load generator
Ozone for Enterprise
Simplified Security
• Similar to HDFS, relies on Kerberos / Delegation Token / Block Token
• SCM comes with its own Certificate Authority and users DO NOT need to know
about it.
• Kerberos is only needed for OM/SCM, not for datanodes
• Security is on by default, not an afterthought
• Transparent Data Encryption
• Selectively audit READ or WRITE events, switch configs without the need to
restart.
Ozone for Enterprise
High Availability
• Built-in HA
• Single HA Configuration mode
• Regular HA Configuration mode [3 instances of OM/SCM]
ENFORCEMENTTRACKER.COM
British Airways £183.39M
Marriott International £100M
Swedish School for facial tracking
Dutch Hospital for unsecured patient
data
GENERAL DATA PROTECTION REGULATION (GDPR)
• Law for handling personal data
• Imposes responsibility on Data Controllers
• Enforces Accountability for Compliance
• Grants rights to Data Entity
• European Law: Spills outside of EU in Digital Era
STORAGE SYSTEMS & GDPR
Territorial Scope
Personal Data
Right to Erasure
(Right to be Forgotten)
Notification Obligatan
of the Controller
Delete Path - Overview
Delete Path – Under the hood
OZONE & GDPR
• GDPR Enabled Bucket
• During Ozone Key creation, generate Simple Encryption Key(SEK)
• Client writes data to blocks, encoded by SEK under the hood
• During read, the data is decoded using same SEK.
• During delete, OM moves the KeyInfo to Deleted Keys Section.
• SEK is irrevocable lost, Data cannot be decoded even if the actual blocks are
deleted much later
• Notification of Obligation is achieved
OZONE & GDPR -Limitations
• Backups & Restore
• Rapid Key Create/Delete cycles – false positives
• Existing Buckets need manual copy
• Network Topology
• HA Support
• Disk Scanner
• In-place upgrades for HDFS Clusters
• Erasure Coding
• Consistent Reads from Standby OM/SCM
• Stability & Scale testing
• TPC-DS, Chaos Monkey, Scale testing with Partners
Road ahead
Interested in Ozone?
https://hadoop.apache.org/ozone/
https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Road+Map
Q & A
THANK YOU

More Related Content

What's hot

Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Data Lake: A simple introduction
Data Lake: A simple introductionData Lake: A simple introduction
Data Lake: A simple introductionIBM Analytics
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceDataWorks Summit
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to sparkDuyhai Doan
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Scaling Data Quality @ Netflix
Scaling Data Quality @ NetflixScaling Data Quality @ Netflix
Scaling Data Quality @ NetflixMichelle Ufford
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshConfluentInc1
 

What's hot (20)

The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Data Lake: A simple introduction
Data Lake: A simple introductionData Lake: A simple introduction
Data Lake: A simple introduction
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Scaling Data Quality @ Netflix
Scaling Data Quality @ NetflixScaling Data Quality @ Netflix
Scaling Data Quality @ Netflix
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
 

Similar to Ozone: Evolution of HDFS scalability & built-in GDPR compliance

Building a Hybrid Cloud Solution
Building a Hybrid Cloud Solution Building a Hybrid Cloud Solution
Building a Hybrid Cloud Solution Cloudian
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - PresentationÉric Dusablon
 
Webinar: Cloud Storage: The 5 Reasons IT Can Do it Better
Webinar: Cloud Storage: The 5 Reasons IT Can Do it BetterWebinar: Cloud Storage: The 5 Reasons IT Can Do it Better
Webinar: Cloud Storage: The 5 Reasons IT Can Do it BetterStorage Switzerland
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Community
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Tom Laszewski
 
Big Data Approaches to Cloud Security
Big Data Approaches to Cloud SecurityBig Data Approaches to Cloud Security
Big Data Approaches to Cloud SecurityPaul Morse
 
Managing storage on Prem and in Cloud
Managing storage on Prem and in CloudManaging storage on Prem and in Cloud
Managing storage on Prem and in CloudHoward Marks
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneErik Krogen
 
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloudA1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloudDr. Wilfred Lin (Ph.D.)
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAlluxio, Inc.
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inRahulBhole12
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesKamesh Pemmaraju
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Red_Hat_Storage
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWSTom Laszewski
 

Similar to Ozone: Evolution of HDFS scalability & built-in GDPR compliance (20)

Building a Hybrid Cloud Solution
Building a Hybrid Cloud Solution Building a Hybrid Cloud Solution
Building a Hybrid Cloud Solution
 
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
 
TenT-Day01.pptx
TenT-Day01.pptxTenT-Day01.pptx
TenT-Day01.pptx
 
TenT-Day01.pptx
TenT-Day01.pptxTenT-Day01.pptx
TenT-Day01.pptx
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - Presentation
 
Webinar: Cloud Storage: The 5 Reasons IT Can Do it Better
Webinar: Cloud Storage: The 5 Reasons IT Can Do it BetterWebinar: Cloud Storage: The 5 Reasons IT Can Do it Better
Webinar: Cloud Storage: The 5 Reasons IT Can Do it Better
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
DAOS Middleware overview
DAOS Middleware overviewDAOS Middleware overview
DAOS Middleware overview
 
Big Data Approaches to Cloud Security
Big Data Approaches to Cloud SecurityBig Data Approaches to Cloud Security
Big Data Approaches to Cloud Security
 
Managing storage on Prem and in Cloud
Managing storage on Prem and in CloudManaging storage on Prem and in Cloud
Managing storage on Prem and in Cloud
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of Ozone
 
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloudA1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
 
Iaas storage-170302090824
Iaas storage-170302090824Iaas storage-170302090824
Iaas storage-170302090824
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
Azure Databases with IaaS
Azure Databases with IaaSAzure Databases with IaaS
Azure Databases with IaaS
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference Architectures
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 

Recently uploaded

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 

Recently uploaded (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 

Ozone: Evolution of HDFS scalability & built-in GDPR compliance

  • 1. Apache Ozone Evolution of HDFS Scalability & built-in GDPR compliance Hadoop, Ozone & Apache are trademarks of the Apache Software Foundation. Dinesh Chitlangia, Cloudera Ajay Kumar, Google
  • 2. Agenda • Why, When, What • Notions, Architecture, Deployment • Ozone for Enterprise Ozone • Ozone – Delete Path • Ozone & GDPR GDPR Q & A
  • 4. Object Store for Big Data •Scale both Objects & IOPS Set of Micro-services - Divide, Conquer, Scale Seamless transition for Yarn, MapReduce, Hive, Spark apps. Supports K8s, CSI and ability to run on K8s natively. Ozone
  • 5. Scale beyond HDFS Large Data Store / Dedicated Storage Clusters Cloud like presence on-prem First class citizen on K8 When
  • 6. Notions Volumes ~ user accounts Buckets ~ directories (no sub-buckets) Keys ~ files HDDS Notions Containers [Collection of Blocks] Pipeline
  • 7. Architecture Ozone’s Microservices - Divide, Conquer, Scale • Ozone Manager - namespace [~Namenodes] • Storage Container Managers - blockspace [~BlockServer] • Recon Server - Control Plane • S3 Gateway • Datanodes
  • 8.
  • 10. Ozone - Write Path Similar to DFS Write, Blocks are written directly to Datanodes
  • 11. Ozone - Read Path Similar to DFS Read, Blocks are read directly from Datanodes
  • 12. Using Ozone: Is it as painful as HDFS? We hear you and we have to setup Ozone every time we test. • Docker • docker-compose up -d • runs it on local machine • K8s • helm install ozone • Traditional tarball • Untar • Run genconfig • Update the configurations • If you are familiar with HDFS commands • dfs -ls hdfs://user • with ozone, it will become • dfs -ls o3fs://user • If you are familiar with S3 commands like • aws s3 ls -endpoint=us-west1. /bucketName • with Ozone s3 it becomes • aws s3 ls -endpoint=s3g.local. /bucketName Setup Usage
  • 14. Ozone for Enterprise • 10 Billion Keys will be supported in first official release • Scale OM/SCM independently, without any disruption • Evenly distribute metadata across the cluster including Datanodes • RAFT Consensus Protocol via Apache RATIS • Tested with industry recognized off-the-shelf components • Blockade Tests - Tests to inject errors/failures in the clusters • Tested Apache Spark, YARN, Hive workloads • K8s based clusters, long running clusters, ephemeral clusters • Freon - custom load generator
  • 15. Ozone for Enterprise Simplified Security • Similar to HDFS, relies on Kerberos / Delegation Token / Block Token • SCM comes with its own Certificate Authority and users DO NOT need to know about it. • Kerberos is only needed for OM/SCM, not for datanodes • Security is on by default, not an afterthought • Transparent Data Encryption • Selectively audit READ or WRITE events, switch configs without the need to restart.
  • 16. Ozone for Enterprise High Availability • Built-in HA • Single HA Configuration mode • Regular HA Configuration mode [3 instances of OM/SCM]
  • 17. ENFORCEMENTTRACKER.COM British Airways £183.39M Marriott International £100M Swedish School for facial tracking Dutch Hospital for unsecured patient data
  • 18. GENERAL DATA PROTECTION REGULATION (GDPR) • Law for handling personal data • Imposes responsibility on Data Controllers • Enforces Accountability for Compliance • Grants rights to Data Entity • European Law: Spills outside of EU in Digital Era
  • 19. STORAGE SYSTEMS & GDPR Territorial Scope Personal Data Right to Erasure (Right to be Forgotten) Notification Obligatan of the Controller
  • 20. Delete Path - Overview
  • 21. Delete Path – Under the hood
  • 22.
  • 23.
  • 24. OZONE & GDPR • GDPR Enabled Bucket • During Ozone Key creation, generate Simple Encryption Key(SEK) • Client writes data to blocks, encoded by SEK under the hood • During read, the data is decoded using same SEK. • During delete, OM moves the KeyInfo to Deleted Keys Section. • SEK is irrevocable lost, Data cannot be decoded even if the actual blocks are deleted much later • Notification of Obligation is achieved
  • 25. OZONE & GDPR -Limitations • Backups & Restore • Rapid Key Create/Delete cycles – false positives • Existing Buckets need manual copy
  • 26. • Network Topology • HA Support • Disk Scanner • In-place upgrades for HDFS Clusters • Erasure Coding • Consistent Reads from Standby OM/SCM • Stability & Scale testing • TPC-DS, Chaos Monkey, Scale testing with Partners Road ahead
  • 28. Q & A THANK YOU