Alluxio Confidential
Alluxio + S3:
A Tiered Architecture for Latency-Critical,
Semantically-Rich Workloads
Jingwen Ouyang
Senior Product Manager @Alluxio
Oct 28th, 2025
Challenge In The New Era
Data… Lots of data… Lots of fast Data!
400+ trillion objects stored
150+ million requests per second
11 nines durability (99.999999999%)
~$23/TB/month cost-effective pricing
But Modern Workloads Demand More...
As AI training, inference, and real-time analytics evolve, S3, like object storage in general, shows strain under
latency-critical and semantically rich operations.
Amazon S3 Becomes the Hard Disk for Cloud
What an Architect is asking for today:
- Sub-millisecond SLAs for online queries, feature stores, agentic memory, etc.
- Efficient write-ahead logs and checkpointing for large objects
- High-performance metadata operations across millions of objects
- All while maintaining S3's pricing, scalability, and operational simplicity
Modern AI & Real Time Workload Requirements
- Latency: GetObject TTFB typically 30-200ms — acceptable for batch,
painful for inference and real-time access
- Limited Semantics: Rename operations require copy + delete; append
writes not supported in Standard S3
- Metadata Operations: Standard S3 directories are prefixes, making
large-scale listing expensive and slow
Where S3 Shows Strain
The Reality: S3 excels as a capacity store, but it strains under real-time and
latency-critical workloads that require sub-millisecond response times.
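To make the "rename requires copy + delete" point concrete, here is a minimal sketch using a toy in-memory object store. `ToyObjectStore` and `rename_prefix` are hypothetical names for illustration, not an S3 or Alluxio API, and each method call stands in for one request.

```python
class ToyObjectStore:
    """Flat key space: "directories" exist only as key prefixes."""

    def __init__(self):
        self.objects = {}
        self.requests = 0  # number of simulated API calls

    def put(self, key, data):
        self.requests += 1
        self.objects[key] = data

    def copy(self, src, dst):
        self.requests += 1
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        self.requests += 1
        del self.objects[key]

    def list_prefix(self, prefix):
        self.requests += 1
        return sorted(k for k in self.objects if k.startswith(prefix))


def rename_prefix(store, old_prefix, new_prefix):
    """Emulate a directory rename: one copy and one delete per object."""
    for key in store.list_prefix(old_prefix):
        new_key = new_prefix + key[len(old_prefix):]
        store.copy(key, new_key)
        store.delete(key)


store = ToyObjectStore()
for i in range(3):
    store.put(f"tmp/part-{i}", b"x")

store.requests = 0
rename_prefix(store, "tmp/", "final/")
print(store.requests)  # 1 list + 3 copies + 3 deletes = 7
```

The request count grows with the number of objects under the prefix, which is why a rename that is a single metadata operation on a file system becomes expensive on an object store.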
Add a transparent, distributed caching and augmentation layer on top of S3,
combining the best of multiple worlds:
- Mountable experience of AWS FSx for Lustre
- Ultra-low latency of AWS S3 Express One Zone
- Cost efficiency of standard S3 buckets
- No data migration required
Our Solution: Augment, Don't Replace
What is Alluxio, indeed?
A shim layer on S3 (or other cloud storage) that
provides sub-ms read latency, single-digit ms
write latency, and enhanced semantics, driven
by modern data-intensive workloads
Journey of Alluxio Since Inception
Eras: Big Data Analytics, Cloud Adoption, Generative AI
Years marked on the timeline: 2014, 2019, 2023, 2024
Milestones:
● Alluxio open source project founded at UC Berkeley AMPLab
● Baidu deploys 1000+ node cluster
● Alluxio scales to 1 billion files
● 7/10 top internet brands accelerated by Alluxio
● AliPay accelerates model training
● 1000+ OSS contributors
● Meta accelerates Presto workloads
● 9/10 top internet brands accelerated by Alluxio
● Alluxio scales to 10+ billion files
● Leading ecommerce brand accelerates model training
● Fortune 5 brand accelerates model training
● Zhihu accelerates LLM model training
Alluxio in AI & Analytics Ecosystem
Access interfaces: S3, POSIX, FSSPEC, HDFS
Example paths: s3://bucket/text.txt or /S3/text.txt
Alluxio for Low Latency Caching
Alluxio is the industry-leading
sub-ms time to first byte (TTFB) solution on S3-class storage
How much better is Alluxio? (Details next slide)
➔ 45x Lower Latency than S3 Standard
➔ 5x Lower Latency than S3 Express One Zone
➔ Unlimited, linear scalability
Test environment references
Alluxio EE
● Version/Spec: Alluxio Enterprise AI 3.6 (50TB cache)
● Test env: 1 FUSE (C5n.metal, 100Gbps network) and 1 Worker (i3en.metal)
AWS S3
● Version/Spec: AWS S3 bucket (Standard Class)
● Test env: 1 FUSE (C5n.metal, 100Gbps network)
AWS S3 Express One Zone
● Version/Spec: AWS bucket (S3 Express One Zone Class)
● Test env: 1 FUSE (C5n.metal, 100Gbps network)
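For readers who want to probe time to first byte (TTFB) on their own mount points, a rough sketch follows. It is not the benchmark harness behind the figures above; `measure_ttfb_ms` is a hypothetical helper, and the demo reads a temporary local file, so the printed number only reflects the local page cache.

```python
import os
import statistics
import tempfile
import time


def measure_ttfb_ms(path, trials=20):
    """Return the median time, in ms, from open() until the first byte is read."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        with open(path, "rb", buffering=0) as f:  # unbuffered, so the raw read is timed
            f.read(1)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Demo against a temporary local file; substitute a mounted path to probe a real system.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

ttfb = measure_ttfb_ms(demo_path)
os.remove(demo_path)
print(f"median TTFB: {ttfb:.3f} ms")
```

Pointing the same function at a FUSE-mounted path would time the whole stack behind that mount, which is the kind of end-to-end number the comparison above is about.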
Core Feature 1: Distributed Caching
Clients: Big Data Query, Big Data ETL, Model Training
Cache tier: Alluxio Worker 1, Alluxio Worker 2, ..., Alluxio Worker n
● Worker selection based on consistent hashing
● Fine-grained chunks
● Can cache the more important parts
[Diagram: chunks A, B, and C of s3://bucket/file1 and s3://bucket/file2 are spread across the Alluxio workers]
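Worker selection by consistent hashing over fine-grained chunks can be sketched as below. This is an illustrative hash ring, not Alluxio's implementation: `HashRing`, the virtual-node count, and the use of MD5 are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, workers, vnodes=100):
        # Each worker is placed on the ring many times to even out the load.
        self._ring = sorted(
            (_hash(f"{w}#{i}"), w) for w in workers for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def worker_for(self, path, chunk_index):
        # The cache unit is a (file, chunk) pair, not the whole file.
        point = _hash(f"{path}:{chunk_index}")
        idx = bisect.bisect_right(self._points, point) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["worker-1", "worker-2", "worker-3"])
for chunk in range(4):
    print(chunk, ring.worker_for("s3://bucket/file2", chunk))
```

Because every client computes the same mapping locally, no central lookup is needed to find a chunk, and adding a worker only moves the chunks that land on the new worker's ring positions.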
Core Feature 2: Filesystem Namespace Virtualization
● Alluxio can be viewed as a logical file system
○ Multiple different storage services can be mounted into the same logical Alluxio namespace
● An Alluxio path is backed by a persistent storage address
○ alluxio://ip:port/bucket/Users/ <-> s3://bucket/Users
● Easy mount command
[Diagram: the Alluxio namespace root / contains Users (Alice, Bob), backed by s3://bucket/Users in AWS us-east-1, and Data (Reports, Sales), backed by hdfs://service/salesdata in an on-prem data warehouse]
$ bin/alluxio mount add \
    --path /s3/bucket/Users \
    --ufs-uri s3://bucket/Users
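The path translation that a mount entry implies can be sketched as a longest-prefix lookup. `MountTable` is a hypothetical class for illustration only; the `/Data` mount point is an assumption read off the namespace diagram, not a command shown on the slide.

```python
class MountTable:
    def __init__(self):
        self._mounts = {}  # logical path prefix -> backing storage URI prefix

    def add(self, path, ufs_uri):
        self._mounts[path.rstrip("/")] = ufs_uri.rstrip("/")

    def resolve(self, path):
        """Translate a logical path to its backing URI using the longest matching mount."""
        candidates = [
            m for m in self._mounts
            if path == m or path.startswith(m + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount point covers {path}")
        mount = max(candidates, key=len)
        return self._mounts[mount] + path[len(mount):]


table = MountTable()
table.add("/s3/bucket/Users", "s3://bucket/Users")
table.add("/Data", "hdfs://service/salesdata")  # assumed mount point

print(table.resolve("/s3/bucket/Users/Alice"))  # s3://bucket/Users/Alice
print(table.resolve("/Data/Reports"))           # hdfs://service/salesdata/Reports
```

Applications only see the logical paths, so the backing store behind a mount can differ per subtree without any change on the client side.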
Low Latency Read Accelerator on S3
Common theme:
● Use Apache Parquet format for fast point-query lookups into structured data
○ Industry standard today for data lakes
● Store PB-scale Parquet files on S3
● Reading directly from S3 gives poor tail latency
[Diagram: SQL/Pandas/Polars clients on AWS read from the distributed cache in ~1 ms, versus 30 ms - 200 ms from the data lake on S3]
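The read path can be illustrated with a minimal read-through cache that tallies simulated latency. `ReadThroughCache` is a hypothetical class; the 1 ms hit cost and 100 ms miss cost are assumptions picked from the ranges quoted on the slide, not measurements.

```python
class ReadThroughCache:
    def __init__(self, backing_read, cache_ms=1.0, backing_ms=100.0):
        self._backing_read = backing_read   # function: key -> bytes
        self._cache = {}
        self.cache_ms = cache_ms            # assumed cache-hit latency
        self.backing_ms = backing_ms        # assumed object-store latency
        self.simulated_ms = 0.0

    def read(self, key):
        if key in self._cache:
            self.simulated_ms += self.cache_ms
            return self._cache[key]
        data = self._backing_read(key)      # cold read goes to the object store
        self.simulated_ms += self.backing_ms
        self._cache[key] = data
        return data


lake = {"part-0.parquet": b"rows..."}
cache = ReadThroughCache(lake.__getitem__)
for _ in range(10):
    cache.read("part-0.parquet")

print(cache.simulated_ms)  # 1 cold read (100 ms) + 9 hits (1 ms each) = 109.0
```

Only the first read pays the object-store latency; repeated point lookups into the same data are served at cache speed.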
Low-latency & “Reliable” Write Buffer on S3
Common theme:
● Writes can be overwrites or appends
● Data is either kept replicated in Alluxio space or uploaded asynchronously to S3
[Diagram: RocksDB/S3 clients on AWS append to the distributed cache in ~5 ms; data is uploaded to the data lake on S3 in the background]
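The append-then-upload pattern can be sketched with a background thread. `WriteBuffer` is a hypothetical class, a plain dict stands in for S3, and the sketch leaves out the replication that the "reliable" part of the slide title refers to.

```python
import queue
import threading


class WriteBuffer:
    def __init__(self, upload):
        self._upload = upload            # function: (key, bytes) -> None
        self._local = {}                 # stands in for the low-latency cache tier
        self._pending = queue.Queue()
        self._lock = threading.Lock()
        threading.Thread(target=self._drain, daemon=True).start()

    def append(self, key, data):
        # Return once the bytes are in the buffer; the upload happens later.
        with self._lock:
            self._local[key] = self._local.get(key, b"") + data
        self._pending.put(key)

    def _drain(self):
        while True:
            key = self._pending.get()
            with self._lock:
                snapshot = self._local[key]
            self._upload(key, snapshot)  # overwrite the object with the latest contents
            self._pending.task_done()

    def flush(self):
        self._pending.join()             # block until every queued upload has finished


remote = {}
buf = WriteBuffer(remote.__setitem__)
buf.append("wal/000001.log", b"put k1 v1\n")
buf.append("wal/000001.log", b"put k2 v2\n")
buf.flush()
print(remote["wal/000001.log"])
```

The caller's append latency depends only on the buffer, while the object store still ends up with the complete log after the background upload.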
Alluxio: Bringing Performance and Semantics to S3
A software layer that transparently sits between applications and S3 (or any object store),
offering both POSIX and S3-compatible APIs.
Benefits (on top of S3) | Capability
Zero-migration | Mount existing S3 buckets as-is; no data move required
Low-latency accelerator | Achieves sub-ms latency for S3 objects
Semantic bridge | Lets users access data through S3 or POSIX APIs
Minimal-hardware requirement | Pool local SSDs/NVMe drives for intelligent, cost-efficient caching
Flexible write modes | Enable append, async writes, and cache-only updates
Kubernetes-native | Deploy via Operator; integrated metrics, tracing, and observability
Alternatives on AWS: Side-by-Side Comparison
Feature | S3 Standard | S3 Express One Zone | FSx Lustre + S3 | Alluxio + S3
Latency (TTFB) | 100+ ms | 1–10 ms | 1 ms | 1 ms
Multi-cloud | ❌ | ❌ | ❌ | ✅
POSIX API | ❌ | ❌ | ✅ | ✅
S3 API | ✅ | ✅ | ❌ | ✅
Support WALs (Append) | ❌ | ✅ | ✅ | ✅ (via POSIX)
Data Migration Required | No | High (creation-time choice) | No | No
Cost ($/TB/mo) | ~$23 [1] | ~$110 [2] | ~$143 [3] | ~$23 [4] to ~$41 [5]
[1] Assumes S3 Standard is the source of truth, holding full data
[2] Assumes S3 Express One Zone holds full data, as this needs to be decided at bucket creation time
[3] Assumes the 1,000 MB/s/TiB class, with FSx Lustre holding 20% hot data while S3 keeps full data
[4] Assumes Alluxio is deployed on spare GPU-node disks holding 20% hot data, with no additional hardware cost, while S3 keeps full data
[5] Assumes a separate Alluxio cluster holding 20% hot data on i3en.6xlarge instances (1 yr reserved), while S3 keeps full data
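The footnotes describe a blended cost: a full copy in S3 Standard plus a faster tier that holds the 20% hot subset. A small sketch of that arithmetic follows; the function names are hypothetical, and the per-TB cache-tier prices it prints are back-calculated from the table totals, not vendor list prices.

```python
def blended_cost(base_per_tb, hot_fraction, cache_per_tb):
    """Monthly $/TB of logical data: full copy in S3 plus a cache tier for the hot subset."""
    return base_per_tb + hot_fraction * cache_per_tb


def implied_cache_price(total_per_tb, base_per_tb, hot_fraction):
    """Back out the cache-tier $/TB implied by a blended total."""
    return (total_per_tb - base_per_tb) / hot_fraction


S3_STANDARD = 23.0   # $/TB/month, from the table
HOT = 0.20           # 20% hot data, from the footnotes

fsx_tier = implied_cache_price(143.0, S3_STANDARD, HOT)
alluxio_tier = implied_cache_price(41.0, S3_STANDARD, HOT)
print(round(fsx_tier), round(alluxio_tier))         # 600 90
print(round(blended_cost(S3_STANDARD, HOT, 0.0)))   # 23, cache on spare disks at no extra cost
```

Under these assumptions the difference between the options comes almost entirely from what the hot tier costs per TB, since every option except S3 Express One Zone pays the same S3 Standard base.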
Real World Use Cases
● Blackout Power Trading
● Salesforce
CASE STUDY 1: Blackout Power Trading
Achieving Double-Digit Millisecond Offline Feature Stores with Alluxio
Market Window Pressure Cooker
30 minutes between last forecast and market close
= 15 minutes to run inference w/ thousands of small ML models
+ 15 minutes to review, manually adjust for risk profiles and
human insights, and submit
S3 Becoming Inefficient
● Currently 5k models (~6/sec) -> growing to 100k (~110/sec)
● Queries averaged 3.8 seconds for inference
● Reading model artifacts and saving results took 3+ seconds
● ~80% of time spent waiting on I/O
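The per-second rates quoted for this case study follow from dividing the model count by the 15-minute inference window. A quick check, with `required_rate` as a hypothetical helper name:

```python
WINDOW_S = 15 * 60  # the 15-minute inference window, in seconds


def required_rate(models, window_s=WINDOW_S):
    """Models per second needed to finish every model inside the window."""
    return models / window_s


print(f"{required_rate(5_000):.1f}")    # 5.6
print(f"{required_rate(100_000):.1f}")  # 111.1
```

These match the rounded ~6/sec and ~110/sec figures, and show that the target is a roughly 20x increase in sustained throughput within the same window.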
Performance
Query Type* | Without Alluxio: Query (ms) | With Alluxio: Cold Query (ms) | With Alluxio: Hot Query (ms)
Inference (24 rows) | 3,727 | 99 | 45
Training (70k rows) | 3,841 | 171 | 104
* 20 table join, 4 columns per table, 1 primary key, 81 column result
Operational Payoff
● ~60x reduction in latency for inference
● ~30x reduction in latency for training
● Scaling from 5,000 to 100,000+ models in the same 15-minute window
● No online feature store necessary
● Low latency training data
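The speedups implied by the performance table can be recomputed directly from its numbers. The snippet below is just that arithmetic; the dictionary layout is an illustrative choice.

```python
latency_ms = {
    "inference": {"without": 3727, "cold": 99, "hot": 45},
    "training": {"without": 3841, "cold": 171, "hot": 104},
}


def speedup(row, mode):
    """Ratio of the latency without Alluxio to the latency with it."""
    return row["without"] / row[mode]


for name, row in latency_ms.items():
    print(name, round(speedup(row, "cold"), 1), round(speedup(row, "hot"), 1))
# inference 37.6 82.8
# training 22.5 36.9
```

The quoted ~60x and ~30x reductions fall between the cold-cache and hot-cache ratios for inference and training respectively.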
Talk from Greg Lindstrom @ Blackout Power Trading
https://www.alluxio.io/videos/ai-ml-infra-meetup-achieving-double-digit-millisecond-offline-feature-stores-with-alluxio
CASE STUDY 2: Salesforce
Ultra Low Latency Access for Parquet on S3
Challenges: Agentforce requires ultra-low latency data access to PBs of Parquet files.
● S3 P99 latency is ~100 ms
● Data lake is huge (PB+)
● Agents make multiple (10 to 100+) fine-grained requests to the data lake
● Same function needed across 50 AZs
[Diagram: Before, agents read the Iceberg data lake directly at 50 ms - 200 ms; After, agents go through query offloading in front of the Iceberg data lake at ~1 ms]
With query offloading (for example, a single RPC to a Parquet reader), the request is
optimized and much of the filtering logic runs closer to the data, reducing chatter
and latency.
Result: 411 ms → 0.3 ms
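The difference between chatty client-side filtering and an offloaded query can be sketched by counting round trips. `ToyDataLake` is a hypothetical stand-in in which rows play the role of Parquet row groups; it does not model the actual Alluxio or Salesforce implementation.

```python
class ToyDataLake:
    """Counts round trips; rows stand in for Parquet row groups."""

    def __init__(self, rows):
        self._rows = rows
        self.round_trips = 0

    def get_row(self, index):
        self.round_trips += 1
        return self._rows[index]

    def query(self, predicate):
        # Offloaded path: filtering runs next to the data, one request in total.
        self.round_trips += 1
        return [r for r in self._rows if predicate(r)]


rows = [{"id": i, "region": "us" if i % 2 else "eu"} for i in range(100)]

chatty = ToyDataLake(rows)
client_side = [r for r in (chatty.get_row(i) for i in range(100)) if r["region"] == "us"]

offloaded = ToyDataLake(rows)
server_side = offloaded.query(lambda r: r["region"] == "us")

print(chatty.round_trips, offloaded.round_trips)  # 100 1
```

Both paths return the same rows, but the offloaded one pays the per-request latency once instead of once per fine-grained read.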
A joint engineering collaboration between Alluxio and Salesforce. For more details:
https://www.alluxio.io/whitepaper/meet-in-the-middle-for-a-1-000x-performance-boost-querying-parquet-files-on-petabyte-scale-data-lakes
Takeaway
● S3 remains essential but wasn't designed for latency-critical, semantically rich
workloads
● You don't need to migrate away from S3; augment it with intelligent caching and
performance layers
● Alluxio bridges the gap between S3's cost-effectiveness and the performance
demands of modern AI/ML workloads
● Proven results: sub-millisecond latency and real-world production deployments
● Maximize ROI: get premium performance without premium costs or operational
complexity
BOOK A DEMO WITH US

Alluxio Webinar | Alluxio + S3 A Tiered Architecture for Latency-Critical, Semantically-Rich Workloads
