1 | © Copyright 2024 Zilliz
1
It's in the Air Tonight.
Sensor Data in RAG
Tim Spann @ Zilliz
Slides
X
Overview
I will do a quick overview of the basics of Vector Databases and Milvus and
then dive into a practical example of how to use one as part of an
application. I will demonstrate how to consume air quality data and ingest it
into Milvus as vectors and scalars. We will then use our vector database of
Air Quality readings to feed our LLM and get proper answers to Air Quality
questions. I will show you how to all the steps to build a RAG application
with Milvus, LangChain, Ollama, Python and Air Quality Reports. Finally
after demos I will answer questions.
4 | © Copyright Zilliz
4
A
G
E
N
D
A
Introduction
Overview of Vector Databases
A Quick Introduction to Milvus
Consume and Ingest Air Quality Data
Building a local RAG application
Q&A
5 | © Copyright Zilliz
5
01 Introduction
6 | © Copyright 2024 Zilliz
6
6 | © Copyright 10/22/23 Zilliz
6 | © Copyright 2024 Zilliz
Tim Spann
Principal Developer
Advocate, Zilliz
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/PaaSDev
7 | © Copyright Zilliz
7
Show Me A Demo
https://multimodal-demo.milvus.io/
https://milvus.io/milvus-demos/reverse-image-search
8 | © Copyright Zilliz
8 osschat.io
Unstructured Data is Everywhere
Unstructured data is any data that does not conform to a predefined
data model.
Currently, 90% of unstructured data is never analyzed.
Images Videos and more!
Text
10 | © Copyright Zilliz
10
…and cannot process increasingly growing unstructured
data
Data Source: The Digitization of the World by IDC
20%
Other
newly generated data in 2025
will be unstructured data
80%
The challenge of unstructured data
● Problem: Unstructured data comes in lots of forms, no easy
way to interact with it all
● Solution: Vector embeddings
● How: Neural networks e.g. embedding models
Vector
Databases
12 | © Copyright Zilliz
12
02 Overview of Vector Databases
Why a Vector Database?
•Vector database
• Advanced filtering (filtered vector search, chained
filters)
• Hybrid search (e.g. full text + dense vector)
• Durability (any write in a db is durable, a library
typically only supports snapshotting)
• Replication / High Availability
• Sharding
• Aggregations or faceted search
• Backups
• Lifecycle management (CRUD, Batch delete,
dropping whole indexes, reindexing)
• Multi-tenancy
• Vector search library
• High-performance vector search
•How do I support different applications?
• High query load
• High insertion/deletion
• Full precision/recall
• Accelerator support (GPU, FPGA)
• Billion-scale storage
Purpose-built to store, index and query vector embeddings from unstructured
data.
Vn, 1
…
…
…
1
2
3
4
5
Transform into
Vectors
Unstructured Data
Images
User Generated
Content
Video
Documents
Audio
Vector Embeddings
Perform
Approximate
Nearest Neighbor
Similarity Search
Perform Query
Get Results
Store in Vector Database
How Similarity Search Works
15
2024
A vector database stores
embedding vectors and
allows for semantic
retrieval of various types
of unstructured data.
Vector Database: Making Sense of Unstructured Data
16 | © Copyright Zilliz
16
03 A Quick Introduction to Milvus
17 | © Copyright Zilliz
17
Milvus Features
Multi-Tenancy
Hardware-
Accelerated
Compute Support
Python, Java,
Golang, NodeJS
Milvus Lite, K8,
Zilliz Cloud, Docker
Scalable and Elastic
Architecture
Diverse Index
Support
Versatile Search
Capabilities
Tunable
Consistency
18 | © Copyright Zilliz
18
Technologies for various types of Use
cases
Compute Types
Designed for various
compute powers, such as
AVX512, Neon for SIMD,
quantization cache-aware
optimization and GPU
Leverage strengths of each
hardware type, ensuring
high-speed processing and
cost-effective scalability for
different application needs
Search Types
Support multiple types such
as top-K ANN, Range ANN,
sparse & dense,
multi-vector, grouping,
and metadata filtering
Enable query flexibility and
accuracy, allowing
developers to tailor their
information retrieval needs
Multi-tenancy
Enable multi-tenancy
through collection and
partition management
Allow for efficient resource
utilization and customizable
data segregation, ensuring
secure and isolated data
handling for each tenant
Index Types
Offer a wide range of 15
indexes support, including
popular ones like
Hierarchical Navigable
Small Worlds HNSW, PQ,
Binary, Sparse, DiskANN
and GPU index
Empower developers with
tailored search
optimizations, catering to
performance, accuracy and
cost needs
https://docs.voxel51.com/integrations/milvus.html
FiveOne + Milvus
https://milvus.io/docs/integrate_with_voxel51.md
20 | © Copyright Zilliz
20
04 Consume and Ingest Air Quality
Data
21 | © Copyright Zilliz
21
DATA!!!!
22 | © Copyright Zilliz
22
DATA
23 | © Copyright Zilliz
23
REST JSON Ingest
24 | © Copyright Zilliz
24
Scalars and Vectors in Milvus
25 | © Copyright Zilliz
25
05 Building a local RAG application
26
Retrieval-Augmented Generation (RAG)
2024
A technique that combines the
strength of retrieval-based and
generative models:
● Improve accuracy and relevance
● Eliminate hallucination
● Provide domain-specific
knowledge
27
RAG : an economic perspective
2024
A business model that bridges public
data and private data
● Data sovereignty
● You can't and shouldn't give your
private data to others
28 | © Copyright Zilliz
28
Ollama + Llama 3.1 + Milvus + LangChain = RAG
29 | © Copyright Zilliz
29
Ollama + Llama 3.1 + Milvus + LangChain = RAG
30 | © Copyright Zilliz
30
Embeddings Models
31 | © Copyright Zilliz
31
06 Q & A
https://medium.com/@tspann/whats-in-the-air-tonight-mr-milvus-fbd42f06e482
33 | © Copyright Zilliz
33 | © Copyright Zilliz
33
RESOURCES
34 | © Copyright Zilliz
34
Vector Database Resources
Give Milvus a Star! Chat with me on Discord!
https://github.com/milvus-io/milvus
35
Unstructured Data Meetup
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics
such as vector databases, LLMs, and managing data at scale. The intended audience of this group
includes roles like machine learning engineers, data scientists, data engineers, software engineers, and
PMs.
This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
36 | © Copyright Zilliz
36
https://zilliz.com/learn/generative-ai
https://medium.com/@tspann/unstructured-street-data-in-new-york-8d3cde0a1e5b
https://medium.com/@tspann/not-every-field-is-just-text-numbers-or-vectors-976231e90e4d
https://medium.com/@tspann/shining-some-light-on-the-new-milvus-lite-5a0565eb5dd9
https://medium.com/@tspann/unstructured-data-processing-with-a-raspberry-pi-ai-kit-c959dd7fff47
Raspberry Pi AI Kit Hailo
Edge AI
42 | © Copyright 2024 Zilliz
42
42
This week in Milvus, Towhee, Attu, GPT
Cache, Gen AI, LLM, Apache NiFi, Apache
Flink, Apache Kafka, ML, AI, Apache Spark,
Apache Iceberg, Python, Java, Vector DB
and Open Source friends.
https://bit.ly/32dAJft
https://github.com/milvus-io/milvus
AIM Weekly by Tim Spann
43 | © Copyright 2024 Zilliz
43
milvus.io
github.com/milvus-io/
@milvusio
@paasDev
/in/timothyspann
Connect with me!
Thank you!
44 | © Copyright 2024 Zilliz
44
45 | © Copyright 2024 Zilliz
45
46 | © Copyright 2024 Zilliz
46
Join us at our next meetup!
meetup.com/unstructured-data-meetup-
new-york/
47 | © Copyright Zilliz
47
T H A N K Y O U

09-12-2024 - Milvus, Vector database used for Sensor Data RAG

  • 1.
    1 | ©Copyright 2024 Zilliz 1 It's in the Air Tonight. Sensor Data in RAG Tim Spann @ Zilliz
  • 2.
  • 3.
    Overview I will doa quick overview of the basics of Vector Databases and Milvus and then dive into a practical example of how to use one as part of an application. I will demonstrate how to consume air quality data and ingest it into Milvus as vectors and scalars. We will then use our vector database of Air Quality readings to feed our LLM and get proper answers to Air Quality questions. I will show you how to all the steps to build a RAG application with Milvus, LangChain, Ollama, Python and Air Quality Reports. Finally after demos I will answer questions.
  • 4.
    4 | ©Copyright Zilliz 4 A G E N D A Introduction Overview of Vector Databases A Quick Introduction to Milvus Consume and Ingest Air Quality Data Building a local RAG application Q&A
  • 5.
    5 | ©Copyright Zilliz 5 01 Introduction
  • 6.
    6 | ©Copyright 2024 Zilliz 6 6 | © Copyright 10/22/23 Zilliz 6 | © Copyright 2024 Zilliz Tim Spann Principal Developer Advocate, Zilliz tim.spann@zilliz.com https://www.linkedin.com/in/timothyspann/ https://x.com/PaaSDev
  • 7.
    7 | ©Copyright Zilliz 7 Show Me A Demo https://multimodal-demo.milvus.io/ https://milvus.io/milvus-demos/reverse-image-search
  • 8.
    8 | ©Copyright Zilliz 8 osschat.io
  • 9.
    Unstructured Data isEverywhere Unstructured data is any data that does not conform to a predefined data model. Currently, 90% of unstructured data is never analyzed. Images Videos and more! Text
  • 10.
    10 | ©Copyright Zilliz 10 …and cannot process increasingly growing unstructured data Data Source: The Digitization of the World by IDC 20% Other newly generated data in 2025 will be unstructured data 80%
  • 11.
    The challenge ofunstructured data ● Problem: Unstructured data comes in lots of forms, no easy way to interact with it all ● Solution: Vector embeddings ● How: Neural networks e.g. embedding models Vector Databases
  • 12.
    12 | ©Copyright Zilliz 12 02 Overview of Vector Databases
  • 13.
    Why a VectorDatabase? •Vector database • Advanced filtering (filtered vector search, chained filters) • Hybrid search (e.g. full text + dense vector) • Durability (any write in a db is durable, a library typically only supports snapshotting) • Replication / High Availability • Sharding • Aggregations or faceted search • Backups • Lifecycle management (CRUD, Batch delete, dropping whole indexes, reindexing) • Multi-tenancy • Vector search library • High-performance vector search •How do I support different applications? • High query load • High insertion/deletion • Full precision/recall • Accelerator support (GPU, FPGA) • Billion-scale storage Purpose-built to store, index and query vector embeddings from unstructured data.
  • 14.
    Vn, 1 … … … 1 2 3 4 5 Transform into Vectors UnstructuredData Images User Generated Content Video Documents Audio Vector Embeddings Perform Approximate Nearest Neighbor Similarity Search Perform Query Get Results Store in Vector Database How Similarity Search Works
  • 15.
    15 2024 A vector databasestores embedding vectors and allows for semantic retrieval of various types of unstructured data. Vector Database: Making Sense of Unstructured Data
  • 16.
    16 | ©Copyright Zilliz 16 03 A Quick Introduction to Milvus
  • 17.
    17 | ©Copyright Zilliz 17 Milvus Features Multi-Tenancy Hardware- Accelerated Compute Support Python, Java, Golang, NodeJS Milvus Lite, K8, Zilliz Cloud, Docker Scalable and Elastic Architecture Diverse Index Support Versatile Search Capabilities Tunable Consistency
  • 18.
    18 | ©Copyright Zilliz 18 Technologies for various types of Use cases Compute Types Designed for various compute powers, such as AVX512, Neon for SIMD, quantization cache-aware optimization and GPU Leverage strengths of each hardware type, ensuring high-speed processing and cost-effective scalability for different application needs Search Types Support multiple types such as top-K ANN, Range ANN, sparse & dense, multi-vector, grouping, and metadata filtering Enable query flexibility and accuracy, allowing developers to tailor their information retrieval needs Multi-tenancy Enable multi-tenancy through collection and partition management Allow for efficient resource utilization and customizable data segregation, ensuring secure and isolated data handling for each tenant Index Types Offer a wide range of 15 indexes support, including popular ones like Hierarchical Navigable Small Worlds HNSW, PQ, Binary, Sparse, DiskANN and GPU index Empower developers with tailored search optimizations, catering to performance, accuracy and cost needs
  • 19.
  • 20.
    20 | ©Copyright Zilliz 20 04 Consume and Ingest Air Quality Data
  • 21.
    21 | ©Copyright Zilliz 21 DATA!!!!
  • 22.
    22 | ©Copyright Zilliz 22 DATA
  • 23.
    23 | ©Copyright Zilliz 23 REST JSON Ingest
  • 24.
    24 | ©Copyright Zilliz 24 Scalars and Vectors in Milvus
  • 25.
    25 | ©Copyright Zilliz 25 05 Building a local RAG application
  • 26.
    26 Retrieval-Augmented Generation (RAG) 2024 Atechnique that combines the strength of retrieval-based and generative models: ● Improve accuracy and relevance ● Eliminate hallucination ● Provide domain-specific knowledge
  • 27.
    27 RAG : aneconomic perspective 2024 A business model that bridges public data and private data ● Data sovereignty ● You can't and shouldn't give your private data to others
  • 28.
    28 | ©Copyright Zilliz 28 Ollama + Llama 3.1 + Milvus + LangChain = RAG
  • 29.
    29 | ©Copyright Zilliz 29 Ollama + Llama 3.1 + Milvus + LangChain = RAG
  • 30.
    30 | ©Copyright Zilliz 30 Embeddings Models
  • 31.
    31 | ©Copyright Zilliz 31 06 Q & A
  • 32.
  • 33.
    33 | ©Copyright Zilliz 33 | © Copyright Zilliz 33 RESOURCES
  • 34.
    34 | ©Copyright Zilliz 34 Vector Database Resources Give Milvus a Star! Chat with me on Discord! https://github.com/milvus-io/milvus
  • 35.
    35 Unstructured Data Meetup https://www.meetup.com/unstructured-data-meetup-new-york/ Thismeetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
  • 36.
    36 | ©Copyright Zilliz 36 https://zilliz.com/learn/generative-ai
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
    42 | ©Copyright 2024 Zilliz 42 42 This week in Milvus, Towhee, Attu, GPT Cache, Gen AI, LLM, Apache NiFi, Apache Flink, Apache Kafka, ML, AI, Apache Spark, Apache Iceberg, Python, Java, Vector DB and Open Source friends. https://bit.ly/32dAJft https://github.com/milvus-io/milvus AIM Weekly by Tim Spann
  • 43.
    43 | ©Copyright 2024 Zilliz 43 milvus.io github.com/milvus-io/ @milvusio @paasDev /in/timothyspann Connect with me! Thank you!
  • 44.
    44 | ©Copyright 2024 Zilliz 44
  • 45.
    45 | ©Copyright 2024 Zilliz 45
  • 46.
    46 | ©Copyright 2024 Zilliz 46 Join us at our next meetup! meetup.com/unstructured-data-meetup- new-york/
  • 47.
    47 | ©Copyright Zilliz 47 T H A N K Y O U