Greg Lindstrom, VP ML Trading
Alluxio Meetup | September 30, 2025
Achieving Double-Digit Millisecond
Offline Feature Stores with Alluxio
What if…
● What if S3 was 10x faster? 30x faster? (bandwidth and latency)
● What if S3 data was served at in-memory latency?
● What complexity could be reduced, how much money and time
could be saved?
But First… Power Trading 101
10k+ Tradeable Locations
Source: Yes Energy
Locational Marginal Price (LMP)
● Prices are the result of a huge math model solving optimal
dispatch (minimize cost)
● Marginal generation unit sets price
Day-Ahead Versus Real Time Power
● Day-ahead (DA) power scheduled one day in advance
● Real-time (RT) power is scheduled every 5 minutes
● Trading the spread between DA and RT power
○ Why are these markets different?
● Think air traffic controller scheduling planes one day in
advance
Day-Ahead Power Trading Primer
● Daily blind auction
● Who participates:
○ Utilities
○ Asset owners
○ Speculators (optimizers)
● Speculators physically change generator dispatch
Competitive Market
● Using the latest data
○ Weather forecasts
○ Renewables forecasts
○ Outages
○ Pricing
○ Etc
● Latest renewables forecast comes out 30 minutes before
market close
Market Window Pressure Cooker
● 30 minutes between last forecast and market close
=
● 15 minutes to run inference on thousands of small ML models
+
● 15 minutes to review, manually adjust for risk profiles and
human insights, and submit
The Feature Join Problem
Benchmark Join:
● 20 feature tables, 4 columns from each table, 1 primary key
resulting in 81 columns
● 24 rows for inference and 70k rows for training
Join Example in SQL
SELECT
t1.id,
t1.col1, t1.col2, t1.col3, t1.col4,
t2.col1, t2.col2, t2.col3, t2.col4,
...
t20.col1, t20.col2, t20.col3, t20.col4
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
JOIN table3 t3 ON t1.id = t3.id
JOIN table4 t4 ON t1.id = t4.id
...
JOIN table20 t20 ON t1.id = t20.id;
S3 Becoming Inefficient
● Currently 5k models (~6/sec) -> growing to 100k (~110/sec)
● Queries averaged 3.8 seconds for inference
● Reading model artifacts and saving results takes 3+ seconds
● ~80% of time is spent waiting on I/O
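(These rates follow from the 15-minute inference window: 5,000 models / 900 s ≈ 5.6/s today; 100,000 / 900 s ≈ 111/s at target scale.)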
Why Have a Feature Store?
● Simplicity
○ Ingest
○ Query
● Metadata
● Feature Views
● Versioning
Standard Feature Store Pipeline
Source: https://www.tecton.ai/blog/what-is-a-feature-store/
Online Feature Stores
Pros
● Very high performance
inference
● Ingesting streaming data
is simple
Cons
● Limited Data
● Complexity
○ Data lifecycle
○ Split sources of truth
● Expensive
● Training data serving is
still very slow
● Volatile
Offline Feature Stores
Pros
● Relatively cheap
● Low complexity
● Durable
● All data
Cons
● Slow
● Ingesting streaming data
is more complex
What if it wasn’t slow?
What Makes Offline Slow?
1. Storage latency / bandwidth
2. Storage format
3. Feature joining
4. Post-join data transfer
What if we optimized for speed?
1. Solving Storage Latency / Bandwidth
[Alluxio enters the chat]
● Cache files on NVMe drives for low latency (see the read sketch below)
● Bandwidth now constrained by EC2 instance types (how does 10 GB/s sound?)
● High availability + maintains S3 durability
● Scales linearly
● Lots of tuning options depending on workload
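As a rough illustration (my sketch, not the deck's actual deployment), a read through an Alluxio FUSE mount looks like a local read; the mount point /mnt/alluxio and the paths below are assumptions:

import polars as pl

# Hypothetical paths: the same Parquet object addressed via S3 versus an
# Alluxio FUSE mount backed by the local NVMe cache.
S3_PATH = "s3://feature-store/features/table1.parquet"
ALLUXIO_PATH = "/mnt/alluxio/feature-store/features/table1.parquet"

# After the first access warms the cache, the same read runs at
# local-disk latency instead of S3 latency.
df = pl.read_parquet(ALLUXIO_PATH)
print(df.shape)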
2. Solving Storage Format
● Contenders: Parquet, Delta Lake, Iceberg, Hudi, Avro
● Parquet is easily the winner for speed; however:
○ Can't handle concurrent writes
○ Queries may see partial results (partitioned tables)
○ No version history
Can we still make Parquet work?
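One way to keep Parquet despite these caveats is to make each write atomic. A minimal sketch, assuming writes land on a POSIX-like filesystem (e.g. the Alluxio FUSE mount) where rename is atomic; atomic_write_parquet is a hypothetical helper, not the production code:

import os
import tempfile
import polars as pl

def atomic_write_parquet(df: pl.DataFrame, dest: str) -> None:
    # Write to a temp file in the destination directory, then rename it
    # into place; concurrent readers see either the old file or the
    # complete new one, never a partial write.
    fd, tmp_path = tempfile.mkstemp(
        dir=os.path.dirname(dest), suffix=".parquet.tmp"
    )
    os.close(fd)
    try:
        df.write_parquet(tmp_path)
        os.replace(tmp_path, dest)  # atomic on POSIX filesystems
    except BaseException:
        os.remove(tmp_path)
        raise

This covers single-file swaps; partitioned tables spanning many files still need coordination (or a table format) for cross-file consistency.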
3. Solving Multi-Join Performance
● Contenders: Spark, Trino, Flink, Dask, DuckDB, Pandas, Polars
● Polars and DuckDB are the fastest, with Polars the clear winner
(sketched below)
● Ideally zero-copy for post-join data transfer
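A minimal sketch of the benchmark join in Polars; the file paths and the shared key column "id" are placeholders. Lazy scans let Polars push the 4-column projection into each Parquet read and parallelize the joins:

import polars as pl

TABLES = [f"/mnt/alluxio/feature-store/table{i}.parquet" for i in range(1, 21)]
COLS = ["col1", "col2", "col3", "col4"]

# Scan lazily so only the key plus 4 feature columns are read per table.
joined = pl.scan_parquet(TABLES[0]).select(["id", *COLS])
for i, path in enumerate(TABLES[1:], start=2):
    right = pl.scan_parquet(path).select(
        ["id"] + [pl.col(c).alias(f"t{i}_{c}") for c in COLS]
    )
    joined = joined.join(right, on="id", how="inner")

df = joined.collect()  # 81 columns: 1 key + 20 tables x 4 features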
4. Solving Post-Join Data Transfer
● Format contenders: JSON, Parquet, Arrow, others
● Arrow streaming over gRPC (HTTP/2) outperforms all others, which is
a huge time savings thanks to…
● Arrow Flight server
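A minimal Arrow Flight sketch in Python using pyarrow; FeatureServer, the port, and the tiny in-memory table are illustrative stand-ins, not the actual service:

import threading
import pyarrow as pa
import pyarrow.flight as flight

class FeatureServer(flight.FlightServerBase):
    # Serves pre-joined feature tables as Arrow record batches over gRPC.
    def __init__(self, location="grpc://0.0.0.0:8815"):
        super().__init__(location)
        # Stand-in for the joined 81-column feature table.
        self.table = pa.table({"id": [1, 2, 3], "col1": [0.1, 0.2, 0.3]})

    def do_get(self, context, ticket):
        # Stream Arrow batches straight over HTTP/2; no JSON or Parquet
        # round-trip on the wire.
        return flight.RecordBatchStream(self.table)

server = FeatureServer()
threading.Thread(target=server.serve, daemon=True).start()

# Client: one do_get call pulls the whole result as Arrow batches that
# Polars or pandas can wrap with little to no copying.
client = flight.connect("grpc://localhost:8815")
reader = client.do_get(flight.Ticket(b"inference-features"))
print(reader.read_all())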
Putting It Together
● Kubernetes
● High Availability
● Offline Low Latency
● Scalable
● Durable
Performance
Query Type*             Without Alluxio     With Alluxio        With Alluxio
                        Query (ms)          Cold Query (ms)     Hot Query (ms)
Inference (24 rows)     3,727               99                  45
Training (70k rows)     3,841               171                 104
* 20-table join, 4 columns per table, 1 primary key, 81-column result
Operational Payoff
● ~60× reduction in latency for inference
● ~30× reduction in latency for training
● Scaling from 5,000 to 100,000+ models in the same 15-minute
window
● No online feature store necessary
● Low latency training data
Q&A
● Discussion
● Blog Post:
https://www.alluxio.io/blog/blackout-power-trading-achieved-low-latency-offline-feature-store-performance-with-alluxio-caching
● Get in touch: linkedin.com/in/greg-lindstrom
