Data Storage Evolution in Uber
Jing Zhao, Uber
Data informs every decision at Uber
Marketplace
Pricing
Community
Operations
Growth Marketing Data Science
Compliance
Eats
Uber’s Batch Data Stack
● Total Data Footprint: 1.5+ EB
● Presto® Queries Daily: 500K+
● Apache Spark® Apps Daily: 370K+
Apache Hadoop®/HDFS @ Uber
● 30 Clusters
● 2 Regions
● 1.5 EB Data Footprint
● 11K Nodes
Scalability and Modernization
● HDFS Router-based Federation (2019 ~ 2020)
● Containerization and Automation (2020 ~ 2023)
HDFS Router-based Federation
● R/W routers + Read-only Routers
● Rolled out to Uber’s production since 2019
● Greatly improved HDFS scalability
● Distributing traffic to 30 HDFS clusters
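The routing idea behind router-based federation can be sketched as a longest-prefix lookup in a mount table. This is a minimal illustration, not Uber's or Hadoop's implementation; the mount points, cluster names, and the `resolve` helper are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical mount table: logical path prefix -> (cluster, path prefix on that cluster).
MOUNT_TABLE: Dict[str, Tuple[str, str]] = {
    "/": ("cluster-default", "/"),
    "/warehouse": ("cluster-a", "/warehouse"),
    "/warehouse/logs": ("cluster-b", "/data/logs"),
}


def resolve(path: str, table: Dict[str, Tuple[str, str]] = MOUNT_TABLE) -> Tuple[str, str]:
    """Return (cluster, physical path) using the longest matching mount point."""
    best = None
    for mount in table:
        # A mount matches the path itself or any path below it.
        prefix = mount.rstrip("/") + "/"
        if path == mount or path.startswith(prefix):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise KeyError(f"no mount point for {path}")
    cluster, target = table[best]
    remainder = path[len(best):].lstrip("/")
    physical = target.rstrip("/") + "/" + remainder if remainder else target
    return cluster, physical


print(resolve("/warehouse/logs/2024/a.orc"))  # prints: ('cluster-b', '/data/logs/2024/a.orc')
```

Clients only see the logical path; the router decides which of the underlying clusters serves it, so clusters can be added or rebalanced by editing the table.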
Containerization and Automation
● Containerized across data plane and control plane
○ Including NameNode with 300+ GB heap size
● Fully automated for cluster management
○ Managing 11K nodes
○ NameNode (NN) + JournalNode (JN)
Data Storage Efficiency
● Erasure Coding: reduce storage overhead (2020 ~ 2022)
● High-Density HDD: reduce storage unit cost (2022 ~ 2023)
HDFS Erasure Coding
[Diagram: Clients (Hadoop 2.x), HDFS Router, EC Access Proxy, HDFS Hot Clusters, HDFS EC Clusters (Hadoop 3), Replicated Data Detector, Offline EC Converter, Data Correctness Scanner; connections labeled RPC and Data Transfer]
● 50% storage saving with Reed–Solomon(6, 3)
● EC access proxy
○ Seamless access for Hadoop 2.x clients
○ Avoid Hadoop version upgrade
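The 50% figure can be checked with a short calculation. The sketch below assumes the baseline is 3-way replication (the HDFS default); the slide does not name the baseline, and the helper names are illustrative only.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per logical byte under n-way replication."""
    return Fraction(replicas)


def rs_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per logical byte under Reed-Solomon(data, parity)."""
    return Fraction(data_units + parity_units, data_units)


def saving(before: Fraction, after: Fraction) -> Fraction:
    """Fraction of raw storage saved when moving from `before` to `after`."""
    return 1 - after / before


# Assumption: the baseline is 3-way replication.
baseline = replication_overhead(3)  # 3 raw bytes per logical byte
ec = rs_overhead(6, 3)              # 9/6 = 1.5 raw bytes per logical byte
print(float(ec), float(saving(baseline, ec)))  # prints: 1.5 0.5
```

Reed–Solomon(6, 3) stores 6 data units plus 3 parity units and can rebuild the data after losing any 3 of the 9, while 3-way replication survives the loss of 2 copies at twice the raw footprint.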
Adopting High-Density HDD in HDFS
● Capacity per Host: 4TB * 24 → 16TB * 35
● Efficiency: >50% HW cost reduction
● Challenges
○ DataNode IO performance
○ HDFS reliability (blast radius)
● Opportunities
○ Traffic focuses on a small group of extremely hot blocks
○ Top 10K blocks attracted >90% read traffic
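A small sketch of the numbers on this slide, reading "4TB * 24" and "16TB * 35" as disk size times disks per host (an interpretation, not stated on the slide). The `top_k_read_share` helper and the sample read counts are made up to show how a figure like "top 10K blocks, >90% of reads" could be measured.

```python
from typing import Dict


def host_capacity_tb(disk_tb: int, disks: int) -> int:
    """Raw capacity of one host in TB."""
    return disk_tb * disks


old = host_capacity_tb(4, 24)    # 96 TB per host
new = host_capacity_tb(16, 35)   # 560 TB per host
growth = new / old               # capacity behind each DataNode grows by this factor


def top_k_read_share(reads_per_block: Dict[str, int], k: int) -> float:
    """Share of all reads that go to the k most-read blocks."""
    total = sum(reads_per_block.values())
    if total == 0:
        return 0.0
    hottest = sorted(reads_per_block.values(), reverse=True)[:k]
    return sum(hottest) / total


# Made-up read counts, for illustration only.
sample = {"blk_1": 900, "blk_2": 50, "blk_3": 30, "blk_4": 20}
print(old, new, round(growth, 2))   # prints: 96 560 5.83
print(top_k_read_share(sample, 1))  # prints: 0.9
```

Each host now holds several times more data, which is why per-DataNode IO and blast radius become the main concerns; the skew toward a few hot blocks is what makes a small cache effective.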
DataNode Local Cache
● Build a local cache within DataNode
○ 4TB NVMe SSD disk
○ Based on DataNode local traffic
● Leverage Alluxio for cache management
○ Page-level cache
○ 1MB default page size matches traffic pattern
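To make "page-level cache" concrete, here is a minimal sketch of a cache that splits block reads into 1 MB pages. It is not Alluxio's API or Uber's code: the `PageCache` class, the `load_page` callback, and the LRU eviction policy are assumptions chosen for illustration.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

PAGE_SIZE = 1 << 20  # 1 MB pages, matching the default page size on the slide


def pages_for_read(offset: int, length: int, page_size: int = PAGE_SIZE) -> List[int]:
    """Indices of the pages touched by a read of `length` bytes at `offset`."""
    if length <= 0:
        return []
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return list(range(first, last + 1))


class PageCache:
    """Tiny LRU cache keyed by (block id, page index)."""

    def __init__(self, capacity_pages: int, load_page: Callable[[str, int], bytes]):
        self.capacity_pages = capacity_pages
        self.load_page = load_page  # hypothetical reader for the slow tier (HDD)
        self.pages: "OrderedDict[Tuple[str, int], bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_page(self, block_id: str, index: int) -> bytes:
        key = (block_id, index)
        if key in self.pages:
            self.hits += 1
            self.pages.move_to_end(key)  # mark as most recently used
            return self.pages[key]
        self.misses += 1
        data = self.load_page(block_id, index)
        self.pages[key] = data
        if len(self.pages) > self.capacity_pages:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data

    def read(self, block_id: str, offset: int, length: int) -> bytes:
        """Serve a byte range by stitching together the cached pages."""
        if length <= 0:
            return b""
        indices = pages_for_read(offset, length)
        buf = b"".join(self.get_page(block_id, i) for i in indices)
        start = offset - indices[0] * PAGE_SIZE
        return buf[start:start + length]
```

Caching pages rather than whole blocks means only the byte ranges that are actually read take up SSD space, which suits a workload where a small set of hot ranges dominates traffic.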
Cloud Migration
2023 ~ Present
● Replacing HDFS with Cloud Object Storages
● Hybrid Cloud and Multi-Cloud Architectures
Cloud Object Storage
● Migrating Batch Data Processing Stack to Google® Cloud Platform (GCP)
● Replace HDFS with Google® Cloud Storage (GCS)
● Logical namespace to abstract out internal bucket layout
● Performance optimizations
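One way a logical namespace can hide the bucket layout is a stable mapping from a logical path to a bucket. The sketch below is an assumption-laden illustration, not Uber's design: the bucket names, the hashing scheme, and the `bucket_for` / `to_physical` helpers are hypothetical.

```python
import hashlib
from typing import List

# Hypothetical pool of buckets behind the logical namespace.
BUCKETS: List[str] = [
    "datalake-bucket-00",
    "datalake-bucket-01",
    "datalake-bucket-02",
    "datalake-bucket-03",
]


def bucket_for(logical_path: str, depth: int = 2) -> str:
    """Pick a bucket from a stable hash of the first `depth` path components."""
    parts = [p for p in logical_path.split("/") if p]
    key = "/".join(parts[:depth])
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return BUCKETS[int.from_bytes(digest[:8], "big") % len(BUCKETS)]


def to_physical(logical_path: str) -> str:
    """Map a logical path to a gs:// URI; callers never handle bucket names."""
    return f"gs://{bucket_for(logical_path)}/{logical_path.lstrip('/')}"
```

Because the key uses only the leading path components, every file under the same table directory resolves to the same bucket, and the layout can change without touching client code.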
Performance optimizations
Perf/Func areas and their optimizations:
● IO capacity limits
○ Traffic balancing and bucket pre-splits
● Write throughput
○ gVNIC adoption for aggregated throughput improvement: 20 Gbps → 32 Gbps
○ Parallel composite uploads for single writer throughput improvement
● Read/Listing latencies
○ gRPC APIs for better performance consistency
○ Presto: local SSD cache
○ Hive/Spark parallel listing for partitioning data
○ Hudi: performance improvements from 0.14 features
● Rename
○ Failure handling and Python library enhancement
○ Spark optimized file output committer
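Parallel listing can be sketched with a thread pool that lists each partition prefix concurrently instead of one after another. This is a generic illustration, not the Hive or Spark implementation; `list_partitions_parallel`, `fake_list`, and the in-memory `FAKE_STORE` are hypothetical stand-ins for a real object-store listing call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

# In-memory stand-in for an object store: partition prefix -> object names.
FAKE_STORE: Dict[str, List[str]] = {
    "tbl/dt=2024-01-01/": ["part-0.parquet", "part-1.parquet"],
    "tbl/dt=2024-01-02/": ["part-0.parquet"],
}


def fake_list(prefix: str) -> List[str]:
    """Pretend listing call; a real one would issue a network request."""
    return [prefix + name for name in FAKE_STORE.get(prefix, [])]


def list_partitions_parallel(
    prefixes: Iterable[str],
    list_fn: Callable[[str], List[str]],
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """List every partition prefix concurrently and collect results per prefix."""
    ordered = list(prefixes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        results = list(pool.map(list_fn, ordered))
    return dict(zip(ordered, results))


listing = list_partitions_parallel(FAKE_STORE.keys(), fake_list)
print(len(listing["tbl/dt=2024-01-01/"]))  # prints: 2
```

Listing is dominated by request latency rather than CPU, so threads are a reasonable fit: total time approaches that of the slowest prefix instead of the sum over all prefixes.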
Hybrid Cloud Architecture (WIP)
● One logical DataLake on unified data storage
○ Across on-prem HDFS and Cloud object storage
○ Logical paths to abstract out internal details
● Optimizations for
○ Ingress/Egress traffic cost
○ Data storage cost
Tables and Blobs: Unified Multi-Cloud Storage (Future)
● Tables and Blobs
● Multi-Cloud architecture
○ Google Cloud Platform (GCP)
○ Oracle® Cloud Infrastructure (OCI)
● Data orchestration and caching
"Apache®, Apache Hadoop®, Hadoop®, and Apache Spark® are either registered trademarks or trademarks of the Apache
Software Foundation® in the United States and/or other countries. No endorsement by The Apache Software Foundation® is
implied by the use of these marks."
"Google®, Google Cloud Platform®, and Google Cloud Storage® are either registered trademarks or trademarks of Google LLC in
the United States and/or other countries. No endorsement by Google LLC is implied by the use of these marks."
"Oracle® is a registered trademark of Oracle Corporation. No endorsement by Oracle Corporation is implied by the use of the mark."
"Presto® is a registered trademark of LF Projects, LLC. No endorsement by LF Projects, LLC is implied by the use of the mark."

Data Infra Meetup | Uber's Data Storage Evolution
