Ozone: Evolution of HDFS Scalability to trillions of file system objects
Dinesh Chitlangia
2 © Cloudera, Inc. All rights reserved.
Credits
● Apache Hadoop community
● Cloudera
● ApacheCON Chicago
Agenda
● Ozone - Why, When, What
● Notions
● Architecture
● Deployment
● Ozone - Write & Read path
● Using Ozone
● Ozone for Enterprise
● Q & A
Why?
Challenges with HDFS
● Regular users ~ 200M files
● Heavy users ~ 400M+ files
● Make your HDFS healthy day
● Limited scalability of Namespace/Blockspace/Client/RPC
● Future
○ Cloud
○ Streaming
○ Small files are inevitable
When?
When you need scale or cloud-style storage and HDFS holds you back.
● Scale - Files/Throughput
● Archival Store / Large Data Store / Dedicated Storage Clusters
● S3 is the new NFS
● Cloud like presence on-prem
● Cannot control small files
● Adopting K8s and need a big-data-capable file system
What?
Spiritual successor to HDFS
● Object Store for Big Data
● Set of Microservices - Divide, Conquer, Scale
● Scale both Objects & IOPS
● Namenode bottleneck is history
● Seamless transition for Yarn, MapReduce, Hive, Spark apps.
● Supports K8s, CSI and ability to run on K8s natively.
Notions
A few fundamentals
● Volumes ~ user accounts
● Buckets ~ directories (no sub-buckets)
● Keys ~ files
● Volume can have many buckets
● Buckets can have many keys
● Key is composed of Blocks, Blocks are further divided into Chunks
● HDDS Notions
○ Containers [Collection of Blocks]
○ Pipeline
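The hierarchy above can be sketched as a small data model. This is an illustrative sketch only, not Ozone's actual classes: every class name, field, and the `split_into_blocks` helper are hypothetical, and the block and chunk sizes are caller-supplied rather than Ozone defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    data: bytes


@dataclass
class Block:
    container_id: int  # hypothetical: the HDDS container holding this block
    chunks: List[Chunk] = field(default_factory=list)


@dataclass
class Key:
    name: str
    blocks: List[Block] = field(default_factory=list)

    def size(self) -> int:
        # A key's size is the sum of all chunk payloads in all of its blocks.
        return sum(len(c.data) for b in self.blocks for c in b.chunks)


@dataclass
class Bucket:
    name: str
    keys: Dict[str, Key] = field(default_factory=dict)  # no sub-buckets


@dataclass
class Volume:
    name: str
    buckets: Dict[str, Bucket] = field(default_factory=dict)


def split_into_blocks(name: str, payload: bytes, block_size: int,
                      chunk_size: int, container_id: int = 1) -> Key:
    """Cut a payload into blocks, and each block into chunks."""
    key = Key(name)
    for b in range(0, len(payload), block_size):
        block_bytes = payload[b:b + block_size]
        chunks = [Chunk(block_bytes[c:c + chunk_size])
                  for c in range(0, len(block_bytes), chunk_size)]
        key.blocks.append(Block(container_id, chunks))
    return key


# Usage: one volume, one bucket, one key made of three small blocks.
vol = Volume("user1")
vol.buckets["logs"] = Bucket("logs")
vol.buckets["logs"].keys["app.log"] = split_into_blocks(
    "app.log", b"x" * 10, block_size=4, chunk_size=2)
```

Buckets hold keys in a flat dictionary, which mirrors the "no sub-buckets" rule: any directory-like structure would live in the key name itself.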
Architecture
Ozone’s Microservices - Divide, Conquer, Scale
● Ozone Manager - namespace [~Namenodes]
● Storage Container Managers - blockspace [~BlockServer]
● Recon Server - Control Plane
● S3 Gateway
● Datanodes
Architecture
The Big Picture
Deployment
Variants
Ozone - Write Path
Similar to DFS Write, Blocks are written directly to Datanodes
Ozone - Read Path
Similar to DFS Read, Blocks are read directly from Datanodes
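The two paths can be sketched as a toy in-memory simulation. It is a deliberately simplified sketch under stated assumptions: `MetadataService` stands in for the metadata side (Ozone Manager and SCM combined), `Datanode` is a plain dictionary of blocks, and all names, the round-robin placement, and the absence of replication are hypothetical. The point it illustrates is only that block data moves between the client and the datanodes, while the metadata service records locations.

```python
from typing import Dict, List, Tuple


class Datanode:
    """Hypothetical datanode: stores block bytes by block id."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}

    def write_block(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]


class MetadataService:
    """Hypothetical stand-in for the metadata services: key -> block locations."""

    def __init__(self, datanodes: List[Datanode]) -> None:
        self.datanodes = datanodes
        self.keys: Dict[str, List[Tuple[int, Datanode]]] = {}
        self.next_block_id = 0

    def allocate_block(self) -> Tuple[int, Datanode]:
        # Round-robin placement is an assumption made for this sketch.
        block_id = self.next_block_id
        self.next_block_id += 1
        return block_id, self.datanodes[block_id % len(self.datanodes)]

    def commit_key(self, key: str, locations: List[Tuple[int, Datanode]]) -> None:
        self.keys[key] = locations

    def lookup_key(self, key: str) -> List[Tuple[int, Datanode]]:
        return self.keys[key]


def write_key(meta: MetadataService, key: str, data: bytes, block_size: int) -> None:
    locations = []
    for off in range(0, len(data), block_size):
        block_id, dn = meta.allocate_block()                   # metadata call
        dn.write_block(block_id, data[off:off + block_size])  # data goes straight to the datanode
        locations.append((block_id, dn))
    meta.commit_key(key, locations)                            # only locations are recorded


def read_key(meta: MetadataService, key: str) -> bytes:
    # Look up locations once, then read each block directly from its datanode.
    return b"".join(dn.read_block(bid) for bid, dn in meta.lookup_key(key))
```

For example, writing `b"hello ozone"` with `block_size=4` across two datanodes produces three blocks, and `read_key` reassembles them in order.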
Using Ozone: Is it as painful as HDFS?
We hear you, and we have to set up Ozone every time we test.
Setup
● Docker
○ docker-compose up -d
○ runs it on local machine
● K8s
○ helm install ozone
● Traditional tarball
○ Untar
○ Run genconfig
○ Update the configurations
Usage
● If you are familiar with HDFS commands
○ dfs -ls hdfs://user
● with Ozone, it becomes
○ dfs -ls o3fs://user
● If you are familiar with S3 commands like
○ aws s3 ls -endpoint=us-west1. /bucketName
● with Ozone S3 it becomes
○ aws s3 ls -endpoint=s3g.local. /bucketName
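The usage examples make one point: the command stays the same and only the URI scheme or the endpoint changes. A minimal sketch of that idea follows. The helper names are hypothetical. `--endpoint-url` is the AWS CLI's global option for overriding the service endpoint; the `http://s3g.local:9878` address is an assumption built from the slide's `s3g.local` host, so substitute your own gateway address. The scheme swap mirrors the slide literally; in practice an `o3fs://` authority typically also names the bucket and volume.

```python
from typing import List, Optional


def swap_scheme(uri: str, new_scheme: str = "o3fs") -> str:
    """Replace the scheme of a URI, leaving the rest untouched."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not a URI with a scheme: {uri!r}")
    return f"{new_scheme}://{rest}"


def s3_ls_command(bucket: str, endpoint_url: Optional[str] = None) -> List[str]:
    """Build an `aws s3 ls` argument list, optionally pointed at another endpoint."""
    cmd = ["aws", "s3", "ls"]
    if endpoint_url is not None:
        cmd += ["--endpoint-url", endpoint_url]
    cmd.append(f"s3://{bucket}")
    return cmd


# Same listing, two targets: AWS itself, then a (hypothetical) Ozone S3 Gateway.
aws_cmd = s3_ls_command("bucketName")
ozone_cmd = s3_ls_command("bucketName", "http://s3g.local:9878")
```

The resulting lists can be passed to `subprocess.run` on a machine where the AWS CLI is installed and the gateway is reachable.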
Ozone for Enterprise
Designed for Scale
● 10 Billion Keys will be supported in first official release
● Partial Namespace in memory
● Off heap memory usage
● Scale OM/SCM independently, without any disruption
● Create large aggregations of metadata ~ Storage Containers
● Evenly distribute metadata across the cluster including Datanodes
Ozone for Enterprise
Ensuring Correctness & Consistency
● RAFT Consensus Protocol via Apache RATIS
● RocksDB for metadata storage
● Tested with industry recognised off-the-shelf components
○ Blockade Tests - Tests to inject errors/failures in the clusters
○ Tested Apache Spark, YARN, Hive workloads
○ Real world workloads in Apache Spark
○ K8s based clusters, long running clusters, ephemeral clusters
○ S3AFileSystem & similar open source test suites to test S3 Gateway
○ Freon - custom load generator
Ozone for Enterprise
Simplified Security
● Similar to HDFS, relies on Kerberos / Delegation Token / Block Token
● SCM comes with its own Certificate Authority and users DO NOT need to know
about it.
● Kerberos is only needed for OM/SCM, not for datanodes
● Security is on by default, not an afterthought
● Transparent Data Encryption
● Selectively audit READ or WRITE events, switch configs without the need to
restart.
Ozone for Enterprise
High Availability
● Built-in HA
● Single HA Configuration mode
● Regular HA Configuration mode [3 instances of OM/SCM]
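Why three instances? The deck names Raft (via Apache Ratis) as the consensus protocol, and Raft makes progress only while a majority of the group is reachable. The arithmetic below is a general property of majority quorums, not an Ozone-specific API; the function names are illustrative.

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a majority of an n-member group."""
    if n < 1:
        raise ValueError("group must have at least one member")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can fail while a majority is still available."""
    return n - majority(n)


for n in (1, 3, 5):
    print(f"{n} instances: quorum {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

With three instances the quorum is two, so the group survives the loss of one member. A single instance tolerates none.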
Ozone for Enterprise
Road ahead
● Stability & Scale testing
○ TPC-DS, Chaos Monkey, Scale testing with Partners
● Network Topology
● HA Support
● Disk Scanner
● In-place upgrades for HDFS Clusters
● Erasure Coding
● GDPR Compliance
● Consistent Reads from Standby OM/SCM
● Apache Ranger - Ozone Plugin
Ozone for Enterprise
References
https://hadoop.apache.org/ozone/
https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Road+Map
Q & A
THANK YOU
