This talk was delivered at ApacheCON, Las Vegas USA, September 2019.
Audio Recording: https://feathercast.apache.org/2019/09/12/ozone-evolving-hdfs-scalability-to-new-heights-built-in-gdpr-compliance-dinesh-chitlangia/
Speakers:
Dinesh Chitlangia: https://www.linkedin.com/in/dineshchitlangia/
Ajay Kumar aka Ajay Yadav: https://www.linkedin.com/in/ajayydv/
Abstract:
https://www.apachecon.com/acna19/s/#/scheduledEvent/1176
Apache Hadoop Ozone is a robust, distributed key-value object store for Hadoop with a layered architecture and strong consistency. It separates namespace management from the block and node management layer, which allows users to scale independently on both axes. Ozone is interoperable with the Hadoop ecosystem: it provides OzoneFS (a Hadoop-compatible file system API), data locality, and plug-and-play deployment alongside HDFS, since it can be installed in an existing Hadoop cluster and can share storage disks with HDFS. Ozone solves the scalability challenges of HDFS by being size agnostic. Consequently, it allows users to store trillions of files in Ozone and access them as if they were on HDFS. Ozone plugs into existing Hadoop deployments seamlessly, and programs like Yarn, MapReduce, Spark and Hive work without any modifications. In an era of increasing need for data privacy and regulation, Ozone also aims to provide built-in support for GDPR compliance, with a strong focus on the Right to be Forgotten, i.e., data erasure.
At the end of this presentation the audience will be able to understand: 1. Overview of current challenges with HDFS scalability; 2. How Ozone’s architecture solves these challenges; 3. Overview of GDPR; 4. Built-in support for GDPR in Ozone.
4. Ozone
• Object Store for Big Data
• Scale both Objects & IOPS
• Set of Micro-services - Divide, Conquer, Scale
• Seamless transition for Yarn, MapReduce, Hive, Spark apps.
• Supports K8s, CSI and ability to run on K8s natively.
5. When
• Scale beyond HDFS
• Large Data Store / Dedicated Storage Clusters
• Cloud-like presence on-prem
• First class citizen on K8s
10. Ozone - Write Path
Similar to DFS Write, Blocks are written directly to Datanodes
11. Ozone - Read Path
Similar to DFS Read, Blocks are read directly from Datanodes
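The two path slides above share one idea: data bytes travel directly between the client and the Datanodes. A minimal toy sketch of that idea follows. Every name in it (`Datanode`, `MetadataService`, `write_key`, `read_key`, the round-robin placement, the tiny block size) is hypothetical and is not Ozone's actual API; replication and failure handling are not modeled.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Datanode:
    """Hypothetical datanode: stores raw block bytes keyed by block ID."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class MetadataService:
    """Hypothetical stand-in for the metadata layer.

    It only hands out and remembers block locations; no data bytes
    pass through it.
    """
    datanodes: list
    key_to_blocks: dict = field(default_factory=dict)

    def allocate_block(self, key):
        block_id = uuid.uuid4().hex
        # Round-robin placement per key; real placement is not modeled here.
        index = len(self.key_to_blocks.get(key, [])) % len(self.datanodes)
        node = self.datanodes[index]
        self.key_to_blocks.setdefault(key, []).append((block_id, node))
        return block_id, node

    def lookup(self, key):
        return list(self.key_to_blocks[key])


def write_key(meta, key, data, block_size=4):
    """Ask for a block location, then write the bytes straight to the datanode."""
    for offset in range(0, len(data), block_size):
        block_id, node = meta.allocate_block(key)
        node.blocks[block_id] = data[offset:offset + block_size]


def read_key(meta, key):
    """Look up block locations, then read the bytes straight from the datanodes."""
    return b"".join(node.blocks[block_id] for block_id, node in meta.lookup(key))
```

In this sketch, writing an 11-byte value with the default block size spreads three blocks over the datanodes, and `read_key` reassembles them in order.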
12. Using Ozone: Is it as painful as HDFS?
We hear you, and we have to set up Ozone every time we test.
Setup
• Docker
  • docker-compose up -d
  • runs it on local machine
• K8s
  • helm install ozone
• Traditional tarball
  • Untar
  • Run genconfig
  • Update the configurations
Usage
• If you are familiar with HDFS commands
  • dfs -ls hdfs://user
• with Ozone, it will become
  • dfs -ls o3fs://user
• If you are familiar with S3 commands like
  • aws s3 ls -endpoint=us-west1. /bucketName
• with Ozone S3 it becomes
  • aws s3 ls -endpoint=s3g.local. /bucketName
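The HDFS-to-Ozone change shown on this slide amounts to swapping the URI scheme. A small sketch of that rewrite, using only the Python standard library, is given here. The helper name `to_ozone_uri` is hypothetical, and the authority part is copied verbatim, which matches the slide's example but may not match how a real cluster addresses its data.

```python
from urllib.parse import urlsplit


def to_ozone_uri(uri, ozone_scheme="o3fs"):
    """Rewrite an hdfs:// URI to the Ozone scheme, leaving the rest untouched."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {uri!r}")
    return parts._replace(scheme=ozone_scheme).geturl()
```

For the slide's own example, `to_ozone_uri("hdfs://user")` yields `o3fs://user`.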
14. Ozone for Enterprise
• 10 Billion Keys will be supported in first official release
• Scale OM/SCM independently, without any disruption
• Evenly distribute metadata across the cluster including Datanodes
• RAFT Consensus Protocol via Apache RATIS
• Tested with industry recognized off-the-shelf components
• Blockade Tests - Tests to inject errors/failures in the clusters
• Tested Apache Spark, YARN, Hive workloads
• K8s based clusters, long running clusters, ephemeral clusters
• Freon - custom load generator
15. Ozone for Enterprise
Simplified Security
• Similar to HDFS, relies on Kerberos / Delegation Token / Block Token
• SCM comes with its own Certificate Authority and users DO NOT need to know about it.
• Kerberos is only needed for OM/SCM, not for datanodes
• Security is on by default, not an afterthought
• Transparent Data Encryption
• Selectively audit READ or WRITE events, switch configs without the need to restart.
16. Ozone for Enterprise
High Availability
• Built-in HA
• Single HA Configuration mode
• Regular HA Configuration mode [3 instances of OM/SCM]
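The three-instance HA mode pairs naturally with the Raft protocol mentioned on an earlier slide. Assuming the standard Raft majority rule applies to an OM or SCM group (the deck does not spell this out), the arithmetic can be sketched as follows; the function names are illustrative only.

```python
def quorum_size(instances):
    """Smallest majority of a group with the given number of instances."""
    if instances < 1:
        raise ValueError("a group needs at least one instance")
    return instances // 2 + 1


def tolerated_failures(instances):
    """How many instances can fail while a majority is still reachable."""
    return instances - quorum_size(instances)
```

Under that assumption, a group of 3 needs 2 instances to agree and keeps working with 1 instance down, while a single instance tolerates no failures.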
18. GENERAL DATA PROTECTION REGULATION (GDPR)
• Law for handling personal data
• Imposes responsibility on Data Controllers
• Enforces Accountability for Compliance
• Grants rights to Data Subjects
• European Law: Spills outside of EU in Digital Era
19. STORAGE SYSTEMS & GDPR
Territorial Scope
Personal Data
Right to Erasure (Right to be Forgotten)
Notification Obligation of the Controller
24. OZONE & GDPR
• GDPR Enabled Bucket
• During Ozone Key creation, generate a Simple Encryption Key (SEK)
• Client writes data to blocks, encoded by the SEK under the hood
• During read, the data is decoded using the same SEK.
• During delete, OM moves the KeyInfo to the Deleted Keys section.
• The SEK is irrevocably lost; the data cannot be decoded even if the actual blocks are deleted much later
• Notification of Obligation is achieved
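The flow on this slide can be illustrated with a small in-memory model: one key per object, data stored only in encoded form, and the key dropped at delete time. This is a sketch under loud assumptions. The class `GdprBucket` and its fields are hypothetical, and the cipher is a toy SHA-256 keystream XOR chosen only so the example runs on the standard library; it is not the cipher Ozone uses and is not suitable for real data.

```python
import hashlib
import secrets


def _keystream_xor(key, data):
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.

    Illustration only; a real system would use a vetted cipher.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class GdprBucket:
    """Hypothetical in-memory model of a GDPR-enabled bucket."""

    def __init__(self):
        self.key_info = {}      # key name -> per-key secret (metadata)
        self.deleted_keys = []  # names moved here on delete, secret dropped
        self.blocks = {}        # key name -> encoded bytes held in blocks

    def put(self, name, data):
        sek = secrets.token_bytes(32)  # fresh secret generated at key creation
        self.key_info[name] = sek
        self.blocks[name] = _keystream_xor(sek, data)

    def get(self, name):
        return _keystream_xor(self.key_info[name], self.blocks[name])

    def delete(self, name):
        # The metadata entry moves to the deleted section without its secret;
        # the encoded blocks may linger until they are cleaned up later.
        del self.key_info[name]
        self.deleted_keys.append(name)
```

After `delete`, the encoded bytes are still present in `blocks`, but `get` can no longer decode them because the secret is gone, which is the erasure property the slide describes.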