Smarter RAG Pipelines: Scaling Search with Milvus and Feast

Francisco Javier Arceo
Senior Principal Software Engineer, Red Hat
Kubeflow Steering Committee Member
Feast Maintainer
Feast, RAG, and Milvus

Hello! 👋
A little about me
Led Data Science, Data Engineering, and ML Infra teams at different companies
Somehow stumbled into maintaining Feast, the Open Source feature store
Get to work on a mixture of distributed training, pipelines, feature store, RAG, and agents!
In my ample free time I like to write code
1. I've spent 12+ years building AI/ML solutions for banks and fintechs
2. Joined Red Hat to work on Open Source AI
3. My wife, our 2 children, and I call NJ home 🤠

What is RAG?
Retrieval-Augmented Generation (RAG)
Published at NeurIPS 2020 by the Meta AI Research team
Query encoding with a pretrained encoder
Retriever + Generator
In the seminal paper, the authors fine-tuned the retriever's query encoder and the generator jointly with end-to-end backpropagation (the document encoder stayed frozen)
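For readers who want the math behind "Retriever + Generator", here is a sketch of the RAG-Sequence formulation from the NeurIPS 2020 paper. Notation: q(x) is the query encoder, d(z) the document encoder, p_η the retriever, and p_θ the generator. The retriever scores each document by an inner product of the two encodings, and the generator's output is marginalized over the top-k retrieved documents. Because p_η(z | x) is differentiable with respect to the query encoder, the training loss (the negative log of this marginal likelihood) can be backpropagated into both the retriever and the generator.

```latex
\begin{aligned}
p_\eta(z \mid x) &\propto \exp\!\left(\mathbf{d}(z)^\top \mathbf{q}(x)\right) \\
p(y \mid x) &\approx \sum_{z \in \text{top-}k\left(p_\eta(\cdot \mid x)\right)} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta\left(y_i \mid x, z, y_{1:i-1}\right)
\end{aligned}
```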

Why did RAG become so popular?
Published at NeurIPS 2020 by the Meta AI Research team
OpenAI's ChatGPT took flight in Nov 2022
Google Trends shows interest in RAG taking off
They suggested using RAG 🤯
Easier to do than fine-tuning!
Most RAG applications only use inference 😅

How does RAG work?
The simplest RAG:
1. Embed data: take documents/text and convert them into a numeric (vector) representation
2. Insert data into a datastore: insert all of that data (often in batch)
3. Embed the user query: in real time, embed the user's query
4. Retrieve documents with vector similarity search: compute the cosine similarity between the query and every stored vector representation, and return the top k
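The four steps can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, the "datastore" is an in-memory list rather than a vector database such as Milvus, and retrieval is a brute-force scan (production vector databases typically use approximate nearest-neighbor indexes instead). The sample documents are made up for illustration.

```python
import hashlib
import math
import re

DIM = 256  # size of the toy embedding space (arbitrary choice)


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: a hashed bag-of-words count vector.

    Each lowercase word is hashed (deterministically, via MD5) into one of DIM
    buckets. A real pipeline would call a trained embedding model here.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Steps 1 and 2: embed every document and insert it, in one batch, into a
# "datastore" (here just an in-memory list of (doc_id, text, vector) tuples).
documents = {
    "d1": "Feast is an open source feature store for machine learning.",
    "d2": "Milvus is a vector database built for similarity search.",
    "d3": "Docling parses PDFs and other documents into structured text.",
}
datastore = [(doc_id, text, embed(text)) for doc_id, text in documents.items()]


def retrieve(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Steps 3 and 4: embed the query, score every stored vector, return the top k."""
    query_vec = embed(query)  # step 3: done in real time, per request
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, _, vec in datastore]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # brute-force ranking
    return scored[:k]
```

Calling `retrieve("What is a vector database for similarity search?")` should rank `d2` first, since it shares the most words with the query. Swapping `embed` for a real model and `datastore` for a vector database changes the quality and scale, not the shape of the pipeline.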

How can Feast help with RAG?
Empowers MLEs to do what they do best: harness the power of their data!
Easy to ship RAG to production!
Battle-tested support for real-time, batch, and streaming
Built to scale for distributed computing and ingestion
Fine-tuning as a first-class citizen
Fully Open Source!

Feast in Production
Feast treats inference and fine-tuning as first-class citizens
Online infrastructure: for model inference / RAG
Offline infrastructure: for model fine-tuning
Scale: Kubernetes (Helm + Operator)

Feast 🤝 Milvus 🤝 Docling
Talk with your Docs!

Feast Objects
Entities: these are primary keys
Data Sources: files and request objects (e.g., a CSV and an API call)
Feature Views: define a collection of features/fields, where we can easily enable vector search during retrieval
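The three objects can be illustrated with a stdlib-only model, loosely patterned on how Feast definitions read. This is a sketch under assumptions: the classes, the `vector_index` flag, and the sample rows are hypothetical and are not Feast's own classes (real definitions import `Entity`, `FeatureView`, `Field`, and a source class such as `FileSource` from the `feast` package, each with its own parameters).

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Entity:
    """A primary key that rows are looked up by (e.g., a document id)."""
    name: str
    join_key: str


@dataclass(frozen=True)
class Field:
    """One column in a feature view's schema."""
    name: str
    dtype: type
    vector_index: bool = False  # hypothetical flag: make this field searchable by vector similarity


@dataclass
class DataSource:
    """Where raw rows come from; `read` stands in for loading a CSV or handling an API request."""
    name: str
    read: Callable[[], Iterable[dict[str, Any]]]


@dataclass
class FeatureView:
    """A named collection of fields, keyed by an entity and backed by a data source."""
    name: str
    entity: Entity
    schema: list[Field]
    source: DataSource

    def vector_fields(self) -> list[str]:
        """Names of fields that have vector search enabled."""
        return [f.name for f in self.schema if f.vector_index]

    def rows(self) -> dict[Any, dict[str, Any]]:
        """Read the source and key each row by the entity's join key, keeping only schema fields."""
        names = [f.name for f in self.schema]
        return {row[self.entity.join_key]: {n: row[n] for n in names} for row in self.source.read()}


# Hypothetical definitions for a document-search use case.
document = Entity(name="document", join_key="document_id")
passages_source = DataSource(
    name="passages_csv",
    read=lambda: [  # in-memory rows standing in for a parsed CSV file
        {"document_id": 1, "passage_text": "Feast is a feature store.", "embedding": [0.1, 0.9]},
        {"document_id": 2, "passage_text": "Milvus is a vector database.", "embedding": [0.8, 0.2]},
    ],
)
passages = FeatureView(
    name="passages",
    entity=document,
    schema=[Field("passage_text", str), Field("embedding", list, vector_index=True)],
    source=passages_source,
)
```

Here `passages.vector_fields()` returns `["embedding"]`, and `passages.rows()` returns the passages keyed by `document_id`. The design point is the separation: the entity says how to look rows up, the source says where they come from, and the feature view says which fields matter and which one is searchable.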

Document/Data Transformation!
Feast allows for feature transformations defined with decorators! Transformations can run on:
Batch compute engines (e.g., Spark)
Streaming compute engines (e.g., Spark, Flink)
API servers (e.g., the Feast Feature Server)
The definition declares entities, schemas, data sources, and some other configuration
Allows MLEs to easily take data to production
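The decorator pattern can be sketched in plain Python: a decorator records a transformation function together with its entity key and output schema, and whatever engine processes the rows can then look it up and run it. The `transformation` decorator, `REGISTRY`, and `run` below are hypothetical illustrations of the pattern, not Feast's API (Feast's own decorators, such as `on_demand_feature_view`, have their own signatures). The example transformation chunks documents into fixed-size word windows, a common preprocessing step before embedding.

```python
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical registry: transformation name -> entity key, output schema, and function.
REGISTRY: dict[str, dict[str, Any]] = {}


def transformation(name: str, entity: str, schema: list[str]) -> Callable:
    """Hypothetical decorator that registers a row-level transformation and its metadata."""
    def register(fn: Callable[..., list[Row]]) -> Callable[..., list[Row]]:
        REGISTRY[name] = {"entity": entity, "schema": schema, "fn": fn}
        return fn  # the decorated function remains directly callable
    return register


@transformation(name="document_chunks", entity="chunk_id", schema=["chunk_id", "document_id", "text"])
def chunk_document(doc: Row, max_words: int = 5) -> list[Row]:
    """Split one document into chunks of at most `max_words` words, one output row per chunk."""
    words = doc["text"].split()
    return [
        {
            "chunk_id": f"{doc['document_id']}-{i // max_words}",
            "document_id": doc["document_id"],
            "text": " ".join(words[i:i + max_words]),
        }
        for i in range(0, len(words), max_words)
    ]


def run(name: str, rows: list[Row]) -> list[Row]:
    """Single-process stand-in for a compute engine: apply a registered transformation
    to every input row and check each output row against the declared schema."""
    spec = REGISTRY[name]
    out: list[Row] = []
    for row in rows:
        for result in spec["fn"](row):
            if list(result) != spec["schema"]:
                raise ValueError(f"{name} produced unexpected fields: {list(result)}")
            out.append(result)
    return out
```

Registering the function once is what lets a batch engine, a streaming engine, or an API server execute the same definition; `run` here is only a single-process stand-in for those engines.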

Document/Data Ingestion
Ingestion in Feast is simple
Feast supports more scalable ingestion as well
Several API endpoints are available
More details are in the docs
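Scalable ingestion usually comes down to writing rows in batches and upserting them by entity key. The sketch below shows that idea with the standard library only. `InMemoryOnlineStore`, `batched`, and `ingest` are hypothetical names, and the "latest timestamp wins" rule is an assumed semantic for illustration, not a description of Feast's endpoints (for those, see Feast's documentation on methods such as `FeatureStore.write_to_online_store` and `FeatureStore.push`).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Iterable, Iterator

Row = dict[str, Any]


def batched(rows: Iterable[Row], size: int) -> Iterator[list[Row]]:
    """Yield successive lists of at most `size` rows."""
    batch: list[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


@dataclass
class InMemoryOnlineStore:
    """Hypothetical online store that keeps one latest record per entity key."""
    records: dict[Any, Row] = field(default_factory=dict)

    def write_batch(self, batch: list[Row], key: str, ts: str = "event_timestamp") -> int:
        """Upsert a batch. An incoming row replaces the stored one unless it is older.
        Returns how many rows were actually written."""
        written = 0
        for row in batch:
            current = self.records.get(row[key])
            if current is None or row[ts] >= current[ts]:
                self.records[row[key]] = row
                written += 1
        return written


def ingest(store: InMemoryOnlineStore, rows: Iterable[Row], key: str, batch_size: int = 100) -> int:
    """Stream rows into the store in fixed-size batches; returns the total rows written."""
    return sum(store.write_batch(batch, key) for batch in batched(rows, batch_size))


# Example: three versions of document 1 arrive out of order; the stale one is skipped.
online_store = InMemoryOnlineStore()
rows = [
    {"document_id": 1, "text": "v1", "event_timestamp": datetime(2024, 1, 1)},
    {"document_id": 2, "text": "v1", "event_timestamp": datetime(2024, 1, 1)},
    {"document_id": 1, "text": "v2", "event_timestamp": datetime(2024, 2, 1)},
    {"document_id": 1, "text": "stale", "event_timestamp": datetime(2023, 12, 1)},
]
written = ingest(online_store, rows, key="document_id", batch_size=2)
```

Batching bounds memory and round trips per write, while the timestamp check keeps late-arriving, older rows from overwriting fresher data.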

Feast Roadmap 🚀
What's on the horizon for Feast?
More NLP!
We want Feast to be the go-to framework for AI users to customize their RAG solutions, and that means investing more in Milvus
Image Support
Images often benefit from metadata in recommender systems, and we intend to enhance Feast in this space, in part because the benefits for RAG are very clear
Scaling Batch with Spark and Ray
We plan to continue to invest in the Spark development experience
We plan to add Ray as a new compute engine
Latency Improvements
We want to make Feast blazing fast and have made significant progress here

Thank you!
Here are some useful links:
Feast RAG Blog Post
Feast Documentation
Feast Website
GitHub Repo with Demo
GitHub Repo with Docling Demo
