Greg Lindstrom, VP ML Trading
Alluxio Meetup | September 30, 2025
Achieving Double-Digit Millisecond
Offline Feature Stores with Alluxio
What if…
● What if S3 was 10x faster? 30x faster? (bandwidth and latency)
● What if S3 data was served at in-memory latency?
● What complexity could be reduced, how much money and time
could be saved?
But First… Power Trading 101
10k+ Tradeable Locations
Source: Yes Energy
Locational Marginal Price (LMP)
● Prices are the result of a huge math model solving optimal
dispatch (minimize cost)
● Marginal generation unit sets price
Day-Ahead Versus Real Time Power
● Day-ahead (DA) power scheduled one day in advance
● Real-time (RT) power is scheduled every 5 minutes
● Trading the spread between DA and RT power
○ Why are these markets different?
● Think air traffic controller scheduling planes one day in
advance
Day-Ahead Power Trading Primer
● Daily blind auction
● Who participates:
○ Utilities
○ Asset owners
○ Speculators (optimizers)
● Speculators physically change generator dispatch
Competitive Market
● Using the latest data
○ Weather forecasts
○ Renewables forecasts
○ Outages
○ Pricing
○ Etc
● Latest renewables forecast comes out 30 minutes before
market close
Market Window Pressure Cooker
● 30 minutes between last forecast and market close
=
● 15 minutes to run inference on thousands of small ML models
+
● 15 minutes to review, manually adjust for risk profiles and
human insights, and submit
The Feature Join Problem
Benchmark Join:
● 20 feature tables, 4 columns from each table, 1 primary key
resulting in 81 columns
● 24 rows for inference and 70k rows for training
Join Example in SQL
SELECT
t1.id,
t1.col1, t1.col2, t1.col3, t1.col4,
t2.col1, t2.col2, t2.col3, t2.col4,
...
t20.col1, t20.col2, t20.col3, t20.col4
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
JOIN table3 t3 ON t1.id = t3.id
JOIN table4 t4 ON t1.id = t4.id
...
JOIN table20 t20 ON t1.id = t20.id;
S3 Becoming Inefficient
● Currently 5k models (~6/sec) -> growing to 100k (~110/sec)
● Queries averaged 3.8 seconds for inference
● Reading model artifacts and saving results takes 3+ seconds
● ~80% of time is spent waiting on I/O
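(These rates follow from the 15-minute inference window: 5,000 models / 900 s ≈ 5.6/s today; 100,000 / 900 s ≈ 111/s at target scale.)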
Why Have a Feature Store?
● Simplicity
○ Ingest
○ Query
● Metadata
● Feature Views
● Versioning
Standard Feature Store Pipeline
Source: https://www.tecton.ai/blog/what-is-a-feature-store/
Online Feature Stores
Pros
● Very high performance
inference
● Ingesting streaming data
is simple
Cons
● Limited Data
● Complexity
○ Data lifecycle
○ Split sources of truth
● Expensive
● Training data serving is
still very slow
● Volatile
Offline Feature Stores
Pros
● Relatively cheap
● Low complexity
● Durable
● All data
Cons
● Slow
● Ingesting streaming data
is more complex
What if it wasn’t slow?
What Makes Offline Slow?
1. Storage latency / bandwidth
2. Storage format
3. Feature joining
4. Post-join data transfer
What if we optimized for speed?
1. Solving Storage Latency / Bandwidth
[Alluxio enters the chat]
● Cache files on NVMe drives for low latency (see the read sketch below)
● Bandwidth now constrained by EC2 instance types (how does 10 GB/s sound?)
● High availability + maintains S3 durability
● Scales linearly
● Lots of tuning options depending on workload
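As a rough illustration (my sketch, not the deck's actual deployment), a read through an Alluxio FUSE mount looks like a local read; the mount point /mnt/alluxio and the paths below are assumptions:

import polars as pl

# Hypothetical paths: the same Parquet object addressed via S3 versus an
# Alluxio FUSE mount backed by the local NVMe cache.
S3_PATH = "s3://feature-store/features/table1.parquet"
ALLUXIO_PATH = "/mnt/alluxio/feature-store/features/table1.parquet"

# After the first access warms the cache, the same read runs at
# local-disk latency instead of S3 latency.
df = pl.read_parquet(ALLUXIO_PATH)
print(df.shape)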
2. Solving Storage Format
● Contenders: Parquet, Delta Lake, Iceberg, Hudi, Avro
● Parquet is easily the winner for speed; however:
○ Can't handle concurrent writes
○ Queries may see partial results (partitioned tables)
○ No version history
Can we still make Parquet work?
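One way to keep Parquet despite these caveats is to make each write atomic. A minimal sketch, assuming writes land on a POSIX-like filesystem (e.g. the Alluxio FUSE mount) where rename is atomic; atomic_write_parquet is a hypothetical helper, not the production code:

import os
import tempfile
import polars as pl

def atomic_write_parquet(df: pl.DataFrame, dest: str) -> None:
    # Write to a temp file in the destination directory, then rename it
    # into place; concurrent readers see either the old file or the
    # complete new one, never a partial write.
    fd, tmp_path = tempfile.mkstemp(
        dir=os.path.dirname(dest), suffix=".parquet.tmp"
    )
    os.close(fd)
    try:
        df.write_parquet(tmp_path)
        os.replace(tmp_path, dest)  # atomic on POSIX filesystems
    except BaseException:
        os.remove(tmp_path)
        raise

This covers single-file swaps; partitioned tables spanning many files still need coordination (or a table format) for cross-file consistency.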
3. Solving Multi-Join Performance
● Contenders: Spark, Trino, Flink, Dask, DuckDB, Pandas, Polars
● Polars and DuckDB are the fastest, with Polars the clear winner
(sketched below)
● Ideally zero-copy for post-join data transfer
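A minimal sketch of the benchmark join in Polars; the file paths and the shared key column "id" are placeholders. Lazy scans let Polars push the 4-column projection into each Parquet read and parallelize the joins:

import polars as pl

TABLES = [f"/mnt/alluxio/feature-store/table{i}.parquet" for i in range(1, 21)]
COLS = ["col1", "col2", "col3", "col4"]

# Scan lazily so only the key plus 4 feature columns are read per table.
joined = pl.scan_parquet(TABLES[0]).select(["id", *COLS])
for i, path in enumerate(TABLES[1:], start=2):
    right = pl.scan_parquet(path).select(
        ["id"] + [pl.col(c).alias(f"t{i}_{c}") for c in COLS]
    )
    joined = joined.join(right, on="id", how="inner")

df = joined.collect()  # 81 columns: 1 key + 20 tables x 4 features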
4. Solving Post-Join Data Transfer
● Format contenders: JSON, Parquet, Arrow, others
● Arrow streaming over gRPC (HTTP/2) outperforms all others, which is
a huge time savings thanks to…
● Arrow Flight server
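A minimal Arrow Flight sketch in Python using pyarrow; FeatureServer, the port, and the tiny in-memory table are illustrative stand-ins, not the actual service:

import threading
import pyarrow as pa
import pyarrow.flight as flight

class FeatureServer(flight.FlightServerBase):
    # Serves pre-joined feature tables as Arrow record batches over gRPC.
    def __init__(self, location="grpc://0.0.0.0:8815"):
        super().__init__(location)
        # Stand-in for the joined 81-column feature table.
        self.table = pa.table({"id": [1, 2, 3], "col1": [0.1, 0.2, 0.3]})

    def do_get(self, context, ticket):
        # Stream Arrow batches straight over HTTP/2; no JSON or Parquet
        # round-trip on the wire.
        return flight.RecordBatchStream(self.table)

server = FeatureServer()
threading.Thread(target=server.serve, daemon=True).start()

# Client: one do_get call pulls the whole result as Arrow batches that
# Polars or pandas can wrap with little to no copying.
client = flight.connect("grpc://localhost:8815")
reader = client.do_get(flight.Ticket(b"inference-features"))
print(reader.read_all())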
Putting It Together
● Kubernetes
● High Availability
● Offline Low Latency
● Scalable
● Durable
Performance
Query Type*             Without Alluxio     With Alluxio        With Alluxio
                        Query (ms)          Cold Query (ms)     Hot Query (ms)
Inference (24 rows)     3,727               99                  45
Training (70k rows)     3,841               171                 104
* 20-table join, 4 columns per table, 1 primary key, 81-column result
Operational Payoff
● ~60× reduction in latency for inference
● ~30× reduction in latency for training
● Scaling from 5,000 to 100,000+ models in the same 15-minute
window
● No online feature store necessary
● Low latency training data
Q&A
● Discussion
● Blog Post:
https://www.alluxio.io/blog/blackout-power-trading-achieved-low-latency-offline-feature-store-performance-with-alluxio-caching
● Get in touch: linkedin.com/in/greg-lindstrom
