Enabling Native Integration of NoSQL HBase with Cloud Providers
1.
Enabling Native Integration of NoSQL HBase with Cloud Providers
Ankit Singhal, Wellington Chevreuil, Stephen Wu
2.
© 2023 Cloudera, Inc. All rights reserved.
Agenda
● TCO
  ○ Instance Sizing
  ○ Automatic scaling
  ○ Storage Optimization
● Storage Options
● Security Simplification
● Availability and Fault Tolerance
● Benchmark
● Cost Case Study
3.
Considering Cloud Native *
● Elasticity with increased efficiency
● Flexible and configurable Availability
● Cost Reduction
* https://www.redhat.com/en/topics/cloud-native-apps
4.
TCO: Instance sizing
● Per role type: Masters, RegionServers
● Per load (with RS Grouping)
  ○ System tables (meta, namespace, phoenix)
  ○ Use case specific load
5.
TCO: Metrics based automatic scale up/down
● Read/Write latency/throughput based
● RPC latency based
● Compaction queue size
● Overall request load
● Available cache space (Cloud Storage deployments)
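The metrics listed on this slide could feed a scaling decision along the following lines. This is only a minimal sketch: the `RegionServerMetrics` fields, the threshold values, and the one-node-at-a-time step are all assumptions made for illustration, not values or logic taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class RegionServerMetrics:
    """Cluster-wide averages; field names are hypothetical."""
    rpc_latency_p99_ms: float
    compaction_queue_size: int
    requests_per_second: float
    cache_free_fraction: float  # free share of the bucket cache, 0.0 to 1.0


# Hypothetical thresholds, chosen only to make the example concrete.
HIGH_LATENCY_MS = 50.0
HIGH_COMPACTION_QUEUE = 20
HIGH_REQUESTS_PER_SECOND = 40_000.0
LOW_LATENCY_MS = 10.0
LOW_COMPACTION_QUEUE = 2
LOW_REQUESTS_PER_SECOND = 10_000.0
MIN_CACHE_FREE = 0.10           # below this, the cache is considered full
COMFORTABLE_CACHE_FREE = 0.50   # above this, losing a node's cache is tolerable


def desired_region_servers(m: RegionServerMetrics, current: int,
                           minimum: int = 3, maximum: int = 100) -> int:
    """Return the target RegionServer count, moving one node at a time."""
    overloaded = (
        m.rpc_latency_p99_ms > HIGH_LATENCY_MS
        or m.compaction_queue_size > HIGH_COMPACTION_QUEUE
        or m.requests_per_second > HIGH_REQUESTS_PER_SECOND
        or m.cache_free_fraction < MIN_CACHE_FREE
    )
    idle = (
        m.rpc_latency_p99_ms < LOW_LATENCY_MS
        and m.compaction_queue_size < LOW_COMPACTION_QUEUE
        and m.requests_per_second < LOW_REQUESTS_PER_SECOND
        and m.cache_free_fraction > COMFORTABLE_CACHE_FREE
    )
    if overloaded:
        return min(current + 1, maximum)
    if idle:
        return max(current - 1, minimum)
    return current
```

Any single pressure signal triggers a scale up, while a scale down requires every signal to be quiet, which is one conservative way to avoid oscillation.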
6.
TCO: Storage Optimizations
● Block vs Cloud Storage
● Ephemeral Storage for BucketCache
7.
Storage Options: Cloud Storage
● Benefits
  ○ Cost efficient for large volumes
  ○ Available on major cloud providers
  ○ Decoupled compute from storage
● Challenges
  ○ Increased latency
  ○ HBase native compatibility
    ■ S3 lacks atomic rename (for now)
8.
Storage Options: Cloud Storage
● Mitigating latency
  ○ File based bucket cache over ephemeral storage
    ■ Initial cache warmup required
    ■ Cache must be resilient to RS crashes/restarts
      ● HBASE-27686, HBASE-27743
    ■ Region balancer must consider impacts to the cache
      ● HBASE-27389
  ○ Cloud provider specific tunings
    ■ Throttling, connection/thread pooling configs
    ■ Request hints (random/sequential)
9.
Storage Options: Cloud Storage
● HBase S3 Integration
  ○ HBase was originally designed for hierarchical file systems
    ■ HBase internal write operations are two phased:
      ● New files are created in a temp dir
      ● New files are renamed to the final dir at commit time
    ■ Amazon S3 lacks atomic renames
      ● Requires locking the whole dir content
      ● Individual rename calls to S3 for each file (suboptimal)
10.
Storage Options: Cloud Storage
Original Flow (diagram)
11.
Storage Options: Cloud Storage
● HBase S3 Integration
  ○ Redesign HBase to not (only) rely on renames
    ■ Store File Tracking: HBASE-26067
      ● Additional layer for tracking store files
      ● Write operations delegate file path decision to the tracking layer
      ● Configurable tracker implementations
        ○ Default: still uses renames
        ○ File Based Tracker: keeps list of valid files in meta files
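The idea behind a file based tracker can be sketched with a toy model. Everything below is an assumption made for illustration: the `ObjectStore` class is an in-memory stand-in offering only put/get/list (no rename), and the `FileBasedTracker` class, its `.filelist/manifest-N` naming, and its JSON format are hypothetical and do not reflect HBase's actual on-disk layout. The point it shows is that new store files are written directly to their final location and only become visible once a new manifest listing them is written.

```python
import json


class ObjectStore:
    """In-memory stand-in for an object store: put/get/list only, no rename."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))


class FileBasedTracker:
    """Toy tracker: the list of valid store files lives in manifest objects."""

    def __init__(self, store, store_dir):
        self.store = store
        self.store_dir = store_dir.rstrip("/")
        self.manifest_prefix = self.store_dir + "/.filelist/manifest-"

    def _latest(self):
        """Return (sequence number, file list) of the newest manifest."""
        keys = self.store.list(self.manifest_prefix)
        if not keys:
            return 0, []
        newest = max(keys, key=lambda k: int(k[len(self.manifest_prefix):]))
        seq = int(newest[len(self.manifest_prefix):])
        return seq, json.loads(self.store.get(newest))

    def valid_files(self):
        """Files readers should consider; untracked files are ignored."""
        return self._latest()[1]

    def write_file(self, name, data):
        """Write a store file straight to its final key (no temp dir)."""
        key = f"{self.store_dir}/{name}"
        self.store.put(key, data)
        return key

    def commit(self, added, removed=()):
        """Publish a new manifest; a single put makes the change visible."""
        seq, files = self._latest()
        new_list = [f for f in files if f not in removed] + list(added)
        self.store.put(f"{self.manifest_prefix}{seq + 1}",
                       json.dumps(new_list).encode("utf-8"))
```

For example, a flush writes one file and commits it, while a file left behind by a writer that crashed before `commit` never appears in `valid_files()`. A compaction commits its output and removes its inputs in the same manifest.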
12.
Storage Options: Cloud Storage
SFT Flow (diagram)
13.
Storage Options: Cloud Storage
SFT Flow, detailed (diagram)
14.
Storage Options: Cloud Storage
● Limitations
  ○ No durable syncs (fsync/hflush)
    ■ WAL files require durable file sync to survive crashes
    ■ Low latency is critical for write performance
● Deploy HDFS for WAL
  ○ Separate WAL from data directory
  ○ WALs are temporary, so limited space usage
  ○ Use the instances' attached disks for HDFS data
15.
Storage Options: Block Storage
● Benefits
  ○ Higher performance *
● Challenges
  ○ More costly
  ○ Compute/Storage tightly coupled
16.
Storage Options: Block Storage
● Block storage volumes attached to the cluster nodes
● HDFS deployed over the nodes
  ○ Both WALs and HBase data on HDFS
● Less cost efficient
  ○ Requires additional/larger volumes attached to instances
  ○ HDFS replication factor requires more storage
  ○ Compute and Storage scale together
● Higher Performance
  ○ Avg 5x better latency/throughput
17.
Storage Options: Cloud vs Block Storage
Type | Pros | Cons
Cloud Storage | More cost efficient. Separates compute from storage: optimal scaling; data management and availability | Lower performance (mitigated with bucket cache)
Block Storage | Better overall performance | Higher costs. Couples compute and storage: can't scale independently
18.
Security Simplification: Kerberos vs JWT
● Kerberos
  ○ HBase support since early versions
  ○ Strong client/server authentication (KDC based)
  ○ SASL for the RPC authentication
  ○ Requires clients to be known by the KDC
  ○ Complex to integrate with other services
19.
Security Simplification: Kerberos vs JWT
● JWT
  ○ Token based authentication
    ■ Custom SASL auth provider (HBASE-23347)
    ■ HBase built-in support: work ongoing (HBASE-26553)
  ○ TLS for RPC (encrypts the token and further info)
    ■ HBASE-26666 recently added TLS for RPC support
  ○ Centralised authentication service
    ■ Easily reusable by clients/other services
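To make the token based flow concrete, here is a minimal sketch of issuing and verifying a JWT using only the Python standard library. It is not HBase's implementation: it assumes a shared-secret HS256 signature purely for brevity (a centralised authentication service would more commonly sign with an asymmetric key), and the function names `sign_jwt` and `verify_jwt` are hypothetical. Because the token is a bearer credential, it only makes sense when the RPC channel is encrypted, which is what the TLS bullet above addresses.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Issue an HS256-signed token (what the auth service would do)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_jwt(token: str, secret: bytes, now: int):
    """Return the claims if signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = hmac.new(secret,
                            (header_b64 + "." + payload_b64).encode("ascii"),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        # Malformed token: wrong number of parts, bad base64, or bad JSON.
        return None
    if claims.get("exp", 0) <= now:
        return None  # expired or missing expiry
    return claims
```

The server side only needs the verification key and the current time, so clients do not have to be registered with each service individually.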
20.
Availability and Fault Tolerance
● HBase built-in capabilities
  ○ Master HA
  ○ Table sharding (Regions)
  ○ Meta region replica
21.
Availability and Fault Tolerance (cont.)
● Multiple Availability Zones (Multi AZ) *
  ○ Master instances on different zones
  ○ RSes spread evenly throughout zones
* https://blog.cloudera.com/high-availability-multi-az-for-cdp-operational-database/
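One simple way to get an even spread is round-robin placement. The sketch below is an illustration only; the helper name `spread_across_zones` and the host and zone names are hypothetical, and a real deployment would rely on the cloud provider or orchestration layer for placement.

```python
from collections import Counter


def spread_across_zones(hosts, zones):
    """Assign hosts to zones round-robin so zone sizes differ by at most one."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    return {host: zones[i % len(zones)] for i, host in enumerate(hosts)}


zones = ["az-a", "az-b", "az-c"]
region_servers = [f"rs-{n}" for n in range(20)]
placement = spread_across_zones(region_servers, zones)
per_zone = Counter(placement.values())
```

With 20 RegionServers and three zones, `per_zone` holds two zones with seven servers and one with six, so losing any single zone removes roughly a third of the serving capacity.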
22.
Availability and Fault Tolerance (cont.)
● Limitations
  ○ Cache reload (Cloud Storage deployments) when RSes crash
    ■ HBASE-27313 Persist list of HFiles after prefetch
    ■ HBASE-27743 Enhancement for persistent cache
    ■ HBASE-27389 Cost function that considers bucket cache as part of the plan before region movement
  ○ Multi AZ increases latency (Block Storage deployments)
23.
Benchmark: YCSB Workloads
● Dataset size: 20 billion rows = ~20TB
● Workload C: 100% read
● Workload A: 50% read, 50% write
● Workload F: 50% read, 25% update, 25% read-modify-update
● Two common deployments, HDFS vs Cloud storage
24.
Benchmark: Clusters definitions
● AWS
  ○ HDFS
    ■ 20 RSes: m5.2xlarge, storage with HDD
    ■ 2 Masters: m5.8xlarge
  ○ S3
    ■ 20 RSes: i3.2xlarge, storage as S3
    ■ 2 Masters: m5.2xlarge
● Azure
  ○ HDFS
    ■ 20 RSes: Standard_D8_V3
    ■ 2 Masters: Standard_D32_V3
  ○ ABFS
    ■ 20 RSes: Standard_L8s_V2
    ■ 2 Masters: Standard_D8a_V4
25.
Benchmark: AWS with HDFS vs S3 (Higher is better) (chart)
26.
Benchmark: AWS with HDFS vs S3 (Lower is better) (chart)
There is an outlier on P95+, where the P95 latency is 28.3 ms vs 3.3 ms
27.
Benchmark: Azure with HDFS vs ABFS (Higher is better) (chart)
28.
Benchmark: Azure with HDFS vs ABFS (Lower is better) (chart)
29.
Benchmark: Average comparison against HDFS on Block Storage
Workload Type | Latency S3 vs HDFS | Throughput S3 vs HDFS | Latency ABFS vs HDFS | Throughput ABFS vs HDFS
Read only workload | 47% lower | 86% higher | 48% lower | 91% higher
50% read, 50% write | R: 52% lower, W: 28% higher | 36% higher | R: 46% lower, W: 35% lower | 66% higher
50% read, 25% update, 25% read-modify-update | R: 87% lower, W: 29% lower | 66% higher | R: 48% lower, W: 35% lower | 66% lower
30.
Cost Case Study
Sample comparison for a 50TB read/write workload on AWS:
50TB read/write workload | S3 with ephemeral | HDFS in block storage
Number of instances | 43 | 43
Instance types | 3 x m5.2xlarge, 40 x i3.2xlarge | 3 x m5.8xlarge, 40 x m5.2xlarge
Initial capacity with performance parity | 64TB ephemeral storage for bucket cache | 190TB EBS volumes for HDFS
Cost of instances | $10,778.58 | $7,467.51
Cost of storage | $1,428.60 * | $9,216.00
Total cost (monthly) | $12,207.18 (26.8% lower) | $16,683.51
* We estimated the number of list/put/get requests to S3 at 50 million per month; the actual cost may be slightly higher.
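The totals and the 26.8% figure on this slide follow directly from the instance and storage costs listed. The short check below reproduces the arithmetic with exact decimals; the helper name `monthly_total` is just for illustration.

```python
from decimal import Decimal


def monthly_total(instances: str, storage: str) -> Decimal:
    """Sum instance and storage cost using exact decimal arithmetic."""
    return Decimal(instances) + Decimal(storage)


# Figures taken from the slide (USD per month).
s3_total = monthly_total("10778.58", "1428.60")
hdfs_total = monthly_total("7467.51", "9216.00")

# Relative saving of the S3 deployment against HDFS on block storage.
saving_pct = ((hdfs_total - s3_total) / hdfs_total * 100).quantize(Decimal("0.1"))

print(f"S3 total:   ${s3_total}")
print(f"HDFS total: ${hdfs_total}")
print(f"Saving:     {saving_pct}%")
```

The instances are more expensive in the S3 deployment, so the saving comes entirely from storage: $1,428.60 versus $9,216.00 per month.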
31.
Q/A
32.
Speaker information
● Wellington Chevreuil
  ○ SW Engineer at Cloudera, HBase PMC
● Stephen Wu
  ○ SW Engineer at Cloudera, HBase PMC