1 | © Copyright 2024 Zilliz
Unstructured Data Processing From
Cloud to Edge
Tim Spann @ Zilliz
Tim Spann
Principal Developer Advocate, Zilliz
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/PaaSDev
Speaker
CONTENTS
01 Introduction
02 Edge AI Use Cases
03 Edge Devices
01 Introduction
● Introduction to Unstructured Data Processing
● Introduction to Milvus
● Adding Milvus to Your Infrastructure
● AI Use Case Improvements
● Edge devices working with AI and vector databases
Unstructured Data?
- Unstructured data is commonly estimated to make up about 80% of all data
- Vector databases are purpose-built to search unstructured data through its vector embeddings
- Examples of unstructured data include text, images, videos, and audio
Why Even Use a Vector DB?
Beyond High-Performance Search
• CRUD Operations: Just like traditional databases, vector
databases allow you to Create, Read, Update, and Delete data.
• Data Freshness: Vector databases ensure your data remains
up-to-date, reflecting the latest information for accurate searches.
• Persistence: Your data is securely stored and persists even if the
system restarts.
• Availability: Your data is readily accessible for search and retrieval
operations.
• Scalability: Vector databases can handle growing data volumes
efficiently.
Complete Data Management
• Data Management: Vector databases provide tools to manage
your data effectively, including data ingestion, indexing, and
querying.
• Backup and Migration: Create backups of your data for disaster
recovery and easily migrate your data between different systems.
Operational ease
• Cloud or On-Premise Deployment: Vector databases can be
deployed easily on various platforms, including cloud and
on-premise environments.
• Observability: Monitor the health and performance of your vector
database to ensure optimal operation.
• Multi-tenancy: Support multiple users or applications accessing
the same database instance securely.
Milvus Features
• Multi-Tenancy
• Hardware-Accelerated Compute Support
• Python, Java, Golang, NodeJS
• Milvus Lite, K8s, Zilliz Cloud, Docker
• Scalable and Elastic Architecture
• Diverse Index Support
• Versatile Search Capabilities
• Tunable Consistency
Technologies for Various Types of Use Cases
• Compute Types: Designed for various compute capabilities, such as AVX512 and Neon for SIMD, quantization, cache-aware optimization, and GPU. Leverages the strengths of each hardware type, ensuring high-speed processing and cost-effective scalability for different application needs.
• Search Types: Supports multiple search types, such as top-K ANN, range ANN, sparse and dense, multi-vector, grouping, and metadata filtering. Enables query flexibility and accuracy, allowing developers to tailor search to their information retrieval needs.
• Multi-tenancy: Enables multi-tenancy through collection and partition management. Allows efficient resource utilization and customizable data segregation, ensuring secure and isolated data handling for each tenant.
• Index Types: Supports 15 index types, including popular ones like Hierarchical Navigable Small World (HNSW), PQ, Binary, Sparse, DiskANN, and GPU indexes. Empowers developers with tailored search optimizations that cater to performance, accuracy, and cost needs.
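To make "top-K search with metadata filtering" concrete, here is a minimal sketch in plain Python. It is an exact brute-force scan, not the approximate (ANN) indexes Milvus uses, and the row layout (`id`, `vector`, `meta`) and function names are assumptions made for illustration only.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_search(query, rows, k=2, predicate=None):
    """Exact top-K search with an optional metadata filter.

    rows: list of dicts with hypothetical keys "id", "vector", "meta".
    Returns (score, id) pairs, best match first.
    """
    # Apply the metadata filter first, then score only the survivors.
    candidates = (r for r in rows if predicate is None or predicate(r["meta"]))
    scored = ((cosine_similarity(query, r["vector"]), r["id"]) for r in candidates)
    return heapq.nlargest(k, scored)


rows = [
    {"id": 1, "vector": [1.0, 0.0], "meta": {"device": "jetson"}},
    {"id": 2, "vector": [0.0, 1.0], "meta": {"device": "rpi"}},
    {"id": 3, "vector": [1.0, 1.0], "meta": {"device": "rpi"}},
]
hits = top_k_search([1.0, 0.0], rows, k=2, predicate=lambda m: m["device"] == "rpi")
```

A real index avoids scanning every row; the scan here only shows what result an index is approximating.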
02 Edge AI Use Cases
• Robots
• Smart Cities
• Smart Factories
• Autonomous Cars
• Automated Retail
• Smart Home
Edge AI Use Cases
Local Search on Edge Devices
• Proprietary Document Search
• On-Device Object Detection
• Milvus Lite on Device
Edge Hyper Enablers
• 5G and Everywhere Networks
• IoT with Cheap Plentiful Sensors
• Edge Computing Power: CPU, GPU, RAM
• Edge Neural Networks and Gen AI
• Unstructured Data Processing and Vector DB
Edge Unstructured Data
• Vision to Images and Videos
• Audio from Cameras and Microphones
• Raw Text
• Edge Neural Networks and Gen AI
• Unstructured Data Processing and Vector DB
03 Edge Devices
Demos
Edge Hardware Examples
• NVIDIA Jetson Xavier NX
• NVIDIA Jetson AGX Orin
• Smart AI Cameras
• Raspberry Pi 5 with Hailo AI Acceleration module
Why Even Use a Vector DB on the Edge?
• Cloud, Docker, Standalone or On-Premise Deployment: Send vectors and other fields to a local, remote or cloud Milvus.
• Instant Local Search: Access local unstructured data for fast search and local applications.
• Secure Local Data
• No Network Necessary: Especially for autonomous robots and vehicles. Make instant local decisions.
• Local RAG and Supercharged Edge AI: Enhance local image, audio, video and text data with local LLMs and generative AI, for example Ollama on a Raspberry Pi.
• Local Live Video
Milvus Lite Locally
pip install pymilvus
Milvus Docker Locally or Edge Server
pip install pymilvus
Milvus Zilliz Cloud
pip install pymilvus
Python SDK Connect…
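The three deployment targets above (Milvus Lite, Docker or edge server, Zilliz Cloud) share the same `pymilvus` package; what mainly changes is the connection URI. The sketch below assumes the `MilvusClient` class from pymilvus. The file name, the endpoint placeholder, and the helper names `connection_kwargs` and `connect` are hypothetical and not taken from the slides.

```python
def connection_kwargs(target, token=None):
    """Return keyword arguments for pymilvus.MilvusClient for one deployment target.

    All URIs below are placeholders chosen for illustration.
    """
    if target == "lite":
        # A local file path selects Milvus Lite (embedded, no server process).
        return {"uri": "./edge_demo.db"}
    if target == "docker":
        # 19530 is the default Milvus server port.
        return {"uri": "http://localhost:19530"}
    if target == "cloud":
        if not token:
            raise ValueError("a Zilliz Cloud API key or token is required")
        return {"uri": "https://YOUR-CLUSTER-ENDPOINT", "token": token}
    raise ValueError(f"unknown target: {target!r}")


def connect(target, token=None):
    """Create a client. pymilvus is imported lazily so the helper above works without it."""
    from pymilvus import MilvusClient

    return MilvusClient(**connection_kwargs(target, token))
```

Calling `connect("lite")` on an edge device and `connect("cloud", token=...)` on a server would then leave the rest of the application code unchanged.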
Some Other SDKs and Interfaces
Node.JS
Java
Golang
RESTful API
.NET C#
Apache Spark
Apache Kafka
Ruby
Edge AI  Edge Vector Database
• Retrieval Augmented Generation (RAG): Run local LLMs with a tool like Ollama.
• Image Similarity Search: Capture and search images at the edge for no-network, local robotics, remote and secure scenarios.
• Video Similarity Search: Search for similar videos, scenes, or objects from local videos.
• Audio Similarity Search: Find similar audio clips in local audio for tasks like genre classification or speech recognition for robotics and sensing.
• Anomaly Detection: Detect data points, events, audio, images and observations that deviate significantly from the usual pattern at the edge.
• Facial Recognition: For security applications.
• Customization: Robots
Benefits
• Lower latency
• Offline
• Security
• Localized storage
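One common way to do anomaly detection with vectors is to flag an embedding whose nearest known neighbor is unusually far away. The sketch below illustrates that idea in plain Python; the threshold value and function names are assumptions, and a vector database would replace the linear scan with an index lookup.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_anomaly(vector, known_vectors, threshold):
    """Return True when the nearest known vector is farther than threshold.

    With no known vectors there is nothing to compare against, so the
    input is treated as anomalous.
    """
    if not known_vectors:
        return True
    nearest = min(euclidean(vector, v) for v in known_vectors)
    return nearest > threshold


known = [[0.0, 0.0], [1.0, 0.0]]
normal_reading = is_anomaly([0.1, 0.0], known, threshold=0.5)
odd_reading = is_anomaly([5.0, 5.0], known, threshold=0.5)
```

The threshold is application-specific and would normally be tuned on data collected from the device.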
Edge Vectors to the Cloud
Diagram layers: Framework, Hardware Infrastructure, Embedding Models, LLMs, Software Infrastructure, Vector Database
Milvus-Lite to the Cloud
● Milvus-Lite Dump/Export to Cloud Import
● Dual Ingest
● Switch to Cloud Only
● Kafka / Pulsar / MQTT
● Unstructured Data to MinIO, S3 or Cloud Object Storage
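The "Dual Ingest" option can be sketched as a small wrapper that writes every record to the local store first and then to the cloud, buffering records while the network is down. This is an illustrative pattern only: the class name, the sink callables, and the buffer size are hypothetical, and real sinks would be calls to a local and a remote Milvus client or a Kafka, Pulsar, or MQTT producer.

```python
from collections import deque


class DualIngest:
    """Write each record to a local sink first, then to a remote sink.

    Records that fail to reach the remote sink are buffered and retried later.
    """

    def __init__(self, local_sink, remote_sink, max_pending=1000):
        self.local_sink = local_sink
        self.remote_sink = remote_sink
        # Bounded buffer: the oldest entries are dropped when it is full.
        self.pending = deque(maxlen=max_pending)

    def write(self, record):
        self.local_sink(record)  # the edge copy is written first
        try:
            self.remote_sink(record)
        except ConnectionError:
            self.pending.append(record)

    def flush(self):
        """Retry buffered records in order; stop at the first failure."""
        sent = 0
        while self.pending:
            record = self.pending[0]
            try:
                self.remote_sink(record)
            except ConnectionError:
                break
            self.pending.popleft()
            sent += 1
        return sent


# Simulated sinks: plain lists stand in for the local and cloud databases.
local, cloud = [], []
online = {"up": False}


def cloud_sink(record):
    if not online["up"]:
        raise ConnectionError("no network")
    cloud.append(record)


ingest = DualIngest(local.append, cloud_sink)
ingest.write({"id": 1})
ingest.write({"id": 2})
online["up"] = True  # the network comes back
flushed = ingest.flush()
```

Writing locally first means the device keeps working offline, which matches the "No Network Necessary" point made earlier in the deck.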
Milvus-Lite Export to JSON
milvus-lite dump -d XavierEdgeAI.db -p /home/nvidia/nvme/AIMXavierEdgeAI/backup/ -c XavierEdgeAI
Dump collection XavierEdgeAI's data: 100%|████████████████| 33/33 [00:00<00:00, 188.54it/s]
Dump collection XavierEdgeAI success
Dump collection XavierEdgeAI's data: 100%|████████████████| 33/33 [00:00<00:00, 127.16it/s]
https://github.com/milvus-io/milvus-lite
https://medium.com/@tspann/unstructured-data-processing-with-a-raspberry-pi-ai-kit-c959dd7fff47
Raspberry Pi AI Kit Hailo
Edge AI
https://medium.com/@tspann/edgeai-edge-vector-database-6a9b5238bffb
https://github.com/tspannhw/AIM-XavierEdgeAI
RESOURCES
Source Code
https://github.com/tspannhw/AIM-RPIAIKit-PoseEstimation
More Source Code
https://github.com/tspannhw/AIM-RPIAIKit
Even More Source Code
https://github.com/tspannhw/AIM-XavierEdgeAI
Vector Database Resources
Give Milvus a Star! Chat with me on Discord!
https://github.com/milvus-io/milvus
https://zilliz.com/learn/generative-ai
Extracting Value from Unstructured Data
Example
• A company has hundreds of thousands of pages of proprietary documentation that its staff rely on to service customers.
Problem
• Searching can be slow or inefficient, and results can lack context.
Solution
• Create an internal chatbot with ChatGPT and a vector database enriched with company documentation to provide direction and support to employees and customers.
https://osschat.io/chat
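The chatbot solution follows the retrieval augmented generation pattern: search the vector database for relevant documentation chunks, then pass them to the LLM as context. The sketch below covers only the prompt-assembly step and assumes the chunks were already retrieved; the function name, prompt wording, and character budget are illustrative choices, not part of the original slides.

```python
def build_prompt(question, retrieved_chunks, max_chars=2000):
    """Assemble an LLM prompt from a question and retrieved document chunks.

    Chunks are added in ranked order until the character budget is reached.
    """
    context_parts = []
    used = 0
    for chunk in retrieved_chunks:
        if used + len(chunk) > max_chars:
            break  # stop before exceeding the context budget
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n---\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "How do I reset the device?",
    ["Hold the power button for 10 seconds.", "x" * 3000],
)
```

Grounding the answer in retrieved text is what lets the chatbot use proprietary documentation that the LLM was never trained on.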
We provide deployment flexibility for different operational, security and compliance requirements
• Milvus (self-managed software): Most widely adopted open source vector database. Self-hosted on any machine with community support. Runs locally, on Docker, or on K8s.
• Zilliz Cloud (fully managed service): Milvus re-engineered for the cloud. Available on the leading public clouds.
• Zilliz BYOC (bring your own cloud): Enterprise-ready Milvus for private VPCs. Deploy in your virtual private cloud.
Well-connected in LLM infrastructure to enable RAG use cases
Diagram layers: Framework, Hardware Infrastructure, Embedding Models, LLMs, Software Infrastructure, Vector Database
milvus.io
github.com/milvus-io/
@milvusio
@paasDev
/in/timothyspann
Connect with me!
Thank you!
Join the Milvus Discord!
Milvus
Open Source Self-Managed
Zilliz Cloud
SaaS Fully-Managed
github.com/milvus-io/milvus
Getting Started with Vector Databases
zilliz.com/cloud
Unstructured Data Meetup
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes machine learning engineers, data scientists, data engineers, software engineers, and PMs.
This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, the maintainers of Milvus.
This week in Milvus, Towhee, Attu, GPTCache, Gen AI, LLM, Apache NiFi, Apache Flink, Apache Kafka, ML, AI, Apache Spark, Apache Iceberg, Python, Java, Vector DB and Open Source friends.
https://bit.ly/32dAJft
https://github.com/milvus-io/milvus
AIM Weekly by Tim Spann
https://medium.com/@tspann/unstructured-street-data-in-new-york-8d3cde0a1e5b
https://medium.com/@tspann/not-every-field-is-just-text-numbers-or-vectors-976231e90e4d
https://medium.com/@tspann/shining-some-light-on-the-new-milvus-lite-5a0565eb5dd9
THANK YOU
