Luigi Fugaro
Senior Solution Architect @ Redis
Unlocking the Future of Data:
Powering Next-Gen AI
with Vector Databases
Agenda
1. Data Review
2. Vector Embeddings
3. Vector Database
4. Demo - Let’s see some code
Data Review
1 of 4
Data Review
Let’s start with a metric
Around 80%
of the data generated
by organizations is
Unstructured
Growth
IDC Report 2023 - https://www.box.com/resources/unstructured-data-paper
Data Review
Data Types
Growth
Unstructured: no inherent structure ~ PDFs, images, audio, video
Quasi-Structured: erratic patterns/formats ~ clickstreams
Semi-Structured: there's a discernible pattern ~ spreadsheets / XML / JSON
Structured: schema/defined data model ~ database
IDC Report 2023 - https://www.box.com/resources/unstructured-data-paper
How to deal with Unstructured Data?
Common approaches were:
● Labeling
● Tagging
Data Review
Labeling and Tagging
Feature           Value
Frame Color       Green
Tire Color        Brown
Has Rear Rack     Yes
Has Fenders       Yes
Has Safety Bell   No
Has Fat Tires     Yes

Feature           Value
Frame Color       Matte Olive
Tire Color        Orange
Has Rear Rack     Yes
Has Fenders       Yes
Has Safety Bell   Yes
Has Fat Tires     Yes
Data Review
Labeling and Tagging
Feature           Value
Easy Assembly     ⭐⭐⭐⭐⭐
Chain Quality     ⭐⭐⭐
Seat Comfort      ⭐
Gear Smoothness   ⭐⭐⭐⭐
Data Review
How to deal with Unstructured Data?
Labeling and Tagging are
labor intensive,
subjective and error-prone
What’s the new approach?
Data Review
Vector Embeddings
2 of 4
Vector Embeddings
What is a Vector?
Numeric representation of something
in N-dimensional space using floating-point numbers
Can represent anything
entire documents, images, video, audio…
Vector Embeddings
How to turn Data into Vectors?
It’s quite a complex process,
based primarily on Neural Networks
Vector Embeddings
How to turn Data into Vectors?
Don’t be scared: Machine Learning and Deep Learning
have leaped forward in the last decade, and we can all
benefit from a huge ecosystem of ready-to-use Models!
Each Model has its own specific task!
Vector Embeddings
Music -> Audio Model
Video -> Video Model
Images -> Vision Model
Faces -> Face Detection/Recognition Models
Poses -> Vision Model Trained on Poses
Emotions -> Sentiment Model
All models -> Embeddings
Models quantify features of the item
Vector Embeddings
Why vector embeddings?
They are comparable!
Visual representation
Vector Embeddings
Semantic Relationship Syntactic Relationship
Visual representation
Vector Embeddings
https://jalammar.github.io/illustrated-word2vec
“King”
[ 0.50451 , 0.68607 , -0.59517 , -0.022801, 0.60046 , -0.13498 , -0.08813 , 0.47377 , -0.61798 , -0.31012
, -0.076666, 1.493 , -0.034189, -0.98173 , 0.68229 , 0.81722 , -0.51874 , -0.31503 , -0.55809 , 0.66421 ,
0.1961 , -0.13495 , -0.11476 , -0.30344 , 0.41177 , -2.223 , -1.0756 , -1.0783 , -0.34354 , 0.33505 , 1.9927 ,
-0.04234 , -0.64319 , 0.71125 , 0.49159 , 0.16754 , 0.34344 , -0.25663 , -0.8523 , 0.1661 , 0.40102 , 1.1685
, -1.0137 , -0.21585 , -0.15155 , 0.78321 , -0.91241 , -1.6106 , -0.64426 , -0.51042 ]
So, is it all about arithmetic operations?
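The arithmetic idea from the word2vec illustrations can be sketched in a few lines of Java. This is only a toy: the 3-dimensional vectors below are invented for the example (real embeddings, like the 50-value "King" vector above, come from a trained model), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class VectorArithmeticDemo {
    // Toy 3-dimensional vectors, invented for this sketch (not real embeddings).
    static final Map<String, double[]> WORDS = new LinkedHashMap<>();
    static {
        WORDS.put("king",   new double[]{0.9, 0.8, 0.1});
        WORDS.put("queen",  new double[]{0.9, 0.1, 0.8});
        WORDS.put("man",    new double[]{0.1, 0.9, 0.1});
        WORDS.put("woman",  new double[]{0.1, 0.2, 0.8});
        WORDS.put("prince", new double[]{0.8, 0.7, 0.2});
    }

    // Element-wise a - b + c.
    static double[] analogy(double[] a, double[] b, double[] c) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] - b[i] + c[i];
        }
        return out;
    }

    // Squared Euclidean distance; enough for ranking, no square root needed.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Closest stored word to the query vector, skipping the words used to build it.
    static String nearest(double[] query, Set<String> exclude) {
        String best = null;
        double bestDistance = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> e : WORDS.entrySet()) {
            if (exclude.contains(e.getKey())) {
                continue;
            }
            double d = squaredDistance(query, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // king - man + woman
        double[] q = analogy(WORDS.get("king"), WORDS.get("man"), WORDS.get("woman"));
        System.out.println(nearest(q, Set.of("king", "man", "woman")));
    }
}
```

With these made-up values the nearest remaining word is "queen"; with real embeddings the match is approximate rather than exact.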
Vector Embeddings
What else?
There is one main operation that you can do,
and it’s called Similarity Search!
Vector Similarity Search Algorithms
Cosine Similarity
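The slide names the metric without spelling it out. For reference, the standard definition of cosine similarity between two N-dimensional vectors a and b (the symbols are chosen here, not taken from the slides) is:

```latex
\[
\cos(\theta)
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{N} a_i b_i}
         {\sqrt{\sum_{i=1}^{N} a_i^{2}} \; \sqrt{\sum_{i=1}^{N} b_i^{2}}}
\]
```

The value ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction), and it depends only on the angle between the vectors, not on their lengths.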
Now that we have Vector Embeddings, what's next?
Vector Embeddings
We need a database to store them!
Nope, we need a Vector Database!
Vector Database
3 of 4
Vector Database
Music -> Audio Model
Video -> Video Model
Images -> Vision Model
Faces -> Face Detection/Recognition Models
Poses -> Vision Model Trained on Poses
Emotions -> Sentiment Model
All models -> Embeddings -> REDIS
What does a Vector DB need to have?
❏ Store data
❏ Index data
❏ Query data
Does Redis have all of them?
You bet, and much more!
Vector Database
Vector indexing algorithms
Redis manages vectors in an index data structure to enable intelligent similarity search that
balances search speed and search quality. Choose from two popular techniques: FLAT (a
brute-force approach) and HNSW (Hierarchical Navigable Small World, a faster, approximate
approach).
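To make "brute force" concrete, here is a minimal in-memory sketch of what a FLAT-style search does conceptually: compare the query against every stored vector and keep the K closest. It is not Redis's implementation, and the class and method names are hypothetical. HNSW avoids this full scan by walking a layered graph, which is why it is faster but approximate.

```java
import java.util.Arrays;
import java.util.Comparator;

class FlatSearchDemo {
    // Euclidean (L2) distance between two vectors of equal length.
    static double l2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the indexes of the k stored vectors closest to the query, nearest first.
    // Every stored vector is scored, so the cost grows linearly with the dataset.
    static int[] topK(float[][] stored, float[] query, int k) {
        Integer[] ids = new Integer[stored.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, Comparator.comparingDouble(i -> l2(stored[i], query)));
        int[] out = new int[Math.min(k, ids.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = ids[i];
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] stored = {{0f, 0f}, {1f, 1f}, {5f, 5f}, {0.9f, 1.2f}};
        System.out.println(Arrays.toString(topK(stored, new float[]{1f, 1f}, 2)));
    }
}
```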
Vector search distance metrics
Redis uses a distance metric to measure the similarity between two vectors. Choose from
three popular metrics – Euclidean, Inner Product, and Cosine Similarity – used to calculate
how “close” or “far apart” two vectors are.
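The three metrics can be written out in plain Java as a rough illustration. These are the textbook formulas, not Redis code; how Redis turns them into the score it returns is not covered here, and the class name is hypothetical.

```java
class DistanceMetricsDemo {
    // Inner (dot) product: sum of element-wise products.
    static double innerProduct(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Euclidean (L2) distance: straight-line distance between the two points.
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: inner product divided by the product of the vector lengths.
    static double cosineSimilarity(double[] a, double[] b) {
        return innerProduct(a, b)
                / (Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b)));
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {4, 5, 6};
        System.out.println("inner product = " + innerProduct(a, b));
        System.out.println("euclidean     = " + euclidean(a, b));
        System.out.println("cosine        = " + cosineSimilarity(a, b));
    }
}
```

Euclidean distance is sensitive to vector length, while cosine similarity only looks at direction; for vectors normalized to unit length, inner product and cosine similarity coincide.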
Powerful hybrid filtering
Take advantage of the full suite of search features available in Redis query and search.
Enhance your workflows by combining the power of vector similarity with more traditional
geo, numeric, text, and tag filters. Incorporate more business logic into queries and simplify
client application code.
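The idea of hybrid filtering can be illustrated without Redis: apply the traditional filters first, then rank what is left by vector distance. The sketch below does this in memory with Java streams. The `Bike` type, its price field, and the sample data are all hypothetical, and Redis performs the equivalent work inside its own index rather than like this.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class HybridSearchDemo {
    // Hypothetical catalog item: two filterable attributes plus an embedding.
    static final class Bike {
        final String name;
        final String frameColor;
        final double price;
        final double[] embedding;

        Bike(String name, String frameColor, double price, double[] embedding) {
            this.name = name;
            this.frameColor = frameColor;
            this.price = price;
            this.embedding = embedding;
        }
    }

    static double l2(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Tag filter + numeric filter first, then the k nearest by vector distance.
    static List<String> search(List<Bike> bikes, String color, double maxPrice,
                               double[] query, int k) {
        return bikes.stream()
                .filter(b -> b.frameColor.equals(color) && b.price <= maxPrice)
                .sorted(Comparator.comparingDouble((Bike b) -> l2(b.embedding, query)))
                .limit(k)
                .map(b -> b.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Bike> bikes = List.of(
                new Bike("Trail", "green", 450, new double[]{0.9, 0.1}),
                new Bike("City", "green", 900, new double[]{1.0, 0.0}),
                new Bike("Kids", "orange", 200, new double[]{1.0, 0.0}),
                new Bike("Cargo", "green", 480, new double[]{0.2, 0.8}));
        System.out.println(search(bikes, "green", 500, new double[]{1.0, 0.0}, 2));
    }
}
```

In the sample data, "City" and "Kids" match the query vector exactly but are excluded by the price and color filters, so the business rules take precedence over pure similarity.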
Redis as Vector DB
Vector Database
Redis as Vector DB
Real-time updates
Real-time search and recommendation systems generate large volumes of
changing data. New images, text, products, or metadata? Perform updates,
insertions, and deletes to the search index seamlessly as your dataset changes
over time. Redis Enterprise reduces costly impacts of stagnant data.
Vector range queries
Traditional vector search is performed by finding the “top K” most similar
vectors. Redis Enterprise also enables the discovery of relevant content within a
predefined similarity range or threshold, an alternative that offers a more
flexible search experience.
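The difference from "top K" is easiest to see in code: a range query returns however many vectors fall within a distance threshold, which may be none or many. Below is a minimal in-memory sketch with hypothetical names, using Euclidean distance as the threshold metric; it is not how Redis evaluates range queries internally.

```java
import java.util.ArrayList;
import java.util.List;

class RangeQueryDemo {
    static double l2(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Indexes of all stored vectors whose distance to the query is at most radius.
    static List<Integer> withinRadius(double[][] stored, double[] query, double radius) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < stored.length; i++) {
            if (l2(stored[i], query) <= radius) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[][] stored = {{0, 0}, {1, 1}, {5, 5}, {0.9, 1.2}};
        double[] query = {1, 1};
        System.out.println(withinRadius(stored, query, 0.5)); // tight threshold
        System.out.println(withinRadius(stored, query, 2.0)); // looser threshold
    }
}
```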
Vector Database
Let’s see some code
4 of 4
Demo - Plan B!
spring.data.redis.host=35.187.74.111
spring.data.redis.port=12000
spring.data.redis.username=default
spring.data.redis.password=redis
server.port=8080
spring.mvc.hiddenmethod.filter.enabled=true
com.redis.om.vss.useLocalImages=false
com.redis.om.vss.maxLines=300
redis.om.spring.djl.enabled=true
redis.om.spring.djl.image-embedding-model-engine=PyTorch
redis.om.spring.djl.image-embedding-model-model-urls=djl://ai.djl.pytorch/resnet18_embedding
redis.om.spring.djl.sentence-tokenizer-max-length=768
redis.om.spring.djl.sentence-tokenizer-model=sentence-transformers/all-mpnet-base-v2
redis.om.spring.djl.sentence-tokenizer-model-max-length=768
redis.om.spring.djl.face-detection-model-engine=PyTorch
redis.om.spring.djl.face-detection-model-name=retinaface
redis.om.spring.djl.face-detection-model-model-urls=https://resources.djl.ai/test-models/pytorch/retinaface.zip
redis.om.spring.djl.face-embedding-model-engine=PyTorch
redis.om.spring.djl.face-embedding-model-name=face_feature
redis.om.spring.djl.face-embedding-model-model-urls=https://resources.djl.ai/test-models/pytorch/face_feature.zip
Demo - Plan B!
@Document
public class ImageData {
    @Id
    private String id;

    @Indexed
    private String name;

    @Indexed
    private int height;

    @Indexed
    private int width;

    // Vector field: 512-dimensional FLOAT32 embedding, indexed with HNSW
    // and compared using L2 (Euclidean) distance
    @Indexed(schemaFieldType = SchemaFieldType.VECTOR,
             algorithm = VectorField.VectorAlgorithm.HNSW,
             type = VectorType.FLOAT32,
             dimension = 512,
             distanceMetric = DistanceMetric.L2,
             initialCapacity = 10)
    private float[] imageEmbedding;

    // The face embedding computed from the image at imagePath is stored in imageEmbedding
    @Vectorize(destination = "imageEmbedding", embeddingType = EmbeddingType.FACE)
    private String imagePath;

    @Indexed
    private double score = 0;
    ...
}
Demo - Plan B!
@Service
public class BestOfMatchService {
    @Autowired
    private EntityStream entityStream;

    @Autowired
    public ZooModel<Image, float[]> faceEmbeddingModel;

    // K and floatArrayToByteArray are not shown on the slide
    private List<ImageData> matchAll(byte[] image, int limit) {
        List<ImageData> imageDataList = new ArrayList<>();
        try (Predictor<Image, float[]> predictor = faceEmbeddingModel.newPredictor()) {
            // Turn the uploaded image into an embedding with the face model
            ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(image);
            Image img = ImageFactory.getInstance().fromInputStream(byteArrayInputStream);
            float[] embedding = predictor.predict(img);
            byte[] embeddingAsByteArray = floatArrayToByteArray(embedding);

            // KNN search on the vector field; ascending score = closest matches first
            SearchStream<ImageData> stream = entityStream.of(ImageData.class);
            List<Pair<ImageData, Double>> matchWithScore = stream
                .filter(ImageData$.IMAGE_EMBEDDING.knn(K, embeddingAsByteArray))
                .sorted(ImageData$._IMAGE_EMBEDDING_SCORE, SortedField.SortOrder.ASC)
                .limit(limit)
                .map(Fields.of(ImageData$._THIS, ImageData$._IMAGE_EMBEDDING_SCORE))
                .collect(Collectors.toList());

            // Copy each match's score into the entity before returning it
            for (Pair<ImageData, Double> pair : matchWithScore) {
                ImageData imageData = pair.getFirst();
                Double score = pair.getSecond();
                imageData.setScore(score);
                imageDataList.add(imageData);
            }
            return imageDataList;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
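The service above calls `floatArrayToByteArray`, which the slide does not show. A helper of that kind might look like the sketch below. The little-endian FLOAT32 layout is an assumption about what the vector field expects, the reverse method is added only to check the round trip, and the class name is hypothetical; the real demo may rely on a library utility instead.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class EmbeddingBytesDemo {
    // Packs each float as 4 bytes, little-endian (assumed layout for a FLOAT32 vector blob).
    static byte[] floatArrayToByteArray(float[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            buffer.putFloat(v);
        }
        return buffer.array();
    }

    // Reverse conversion, useful for verifying the encoding.
    static float[] byteArrayToFloatArray(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buffer.getFloat();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] embedding = {1.0f, -2.5f};
        byte[] blob = floatArrayToByteArray(embedding);
        System.out.println(blob.length + " bytes");
        System.out.println(Arrays.toString(byteArrayToFloatArray(blob)));
    }
}
```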
Wrap-up
Unlocking the Future of Data:
Powering Next-Gen AI with Vector Databases
#WMF2024
1. Data
2. Vector Embeddings
3. Vector Database
4. Redis
RATE THE TALK ON IBRIDA
Luigi Fugaro
Senior Solution Architect @ Redis
For more information, you can write to us at
speaker@wemakefuture.it
www.wemakefuture.it

More Related Content

Similar to WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Databases

Unleashing the Power of Vector Search in .NET - SharpCoding2024.pdf
Unleashing the Power of Vector Search in .NET - SharpCoding2024.pdfUnleashing the Power of Vector Search in .NET - SharpCoding2024.pdf
Unleashing the Power of Vector Search in .NET - SharpCoding2024.pdf
Luigi Fugaro
 
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Karen Thompson
 
AI at Scale in Enterprises
AI at Scale in Enterprises AI at Scale in Enterprises
AI at Scale in Enterprises
Ganesan Narayanasamy
 
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALSecrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Mark Tabladillo
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Mark Tabladillo
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Denodo
 
Record matching over query results
Record matching over query resultsRecord matching over query results
Record matching over query results
ambitlick
 
Accelerating Path to Production for Generative AI-powered Applications
Accelerating Path to Production for Generative AI-powered ApplicationsAccelerating Path to Production for Generative AI-powered Applications
Accelerating Path to Production for Generative AI-powered Applications
HostedbyConfluent
 
Intro to big data and applications -day 3
Intro to big data and applications -day 3Intro to big data and applications -day 3
Intro to big data and applications -day 3
Parviz Vakili
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeFishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
ArangoDB Database
 
What do you need to know before going in to Sri Lankan IT industry
What do you need to know before going in to Sri Lankan IT industryWhat do you need to know before going in to Sri Lankan IT industry
What do you need to know before going in to Sri Lankan IT industry
Andun Sameera
 
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesAgile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Raphael Branger
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software Integration
DataWorks Summit
 
ER/Studio Data Architect Datasheet
ER/Studio Data Architect DatasheetER/Studio Data Architect Datasheet
ER/Studio Data Architect Datasheet
Embarcadero Technologies
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
Valdas Maksimavičius
 
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Naoki (Neo) SATO
 
Guidelines DataCite Denmark 2014
Guidelines DataCite Denmark 2014Guidelines DataCite Denmark 2014
Guidelines DataCite Denmark 2014
DTU Library
 
Denodo Datafest 2017 London Tekin Mentes Logitech
Denodo Datafest 2017 London Tekin Mentes LogitechDenodo Datafest 2017 London Tekin Mentes Logitech
Denodo Datafest 2017 London Tekin Mentes Logitech
Tekin Mentes
 
Data vault what's Next: Part 2
Data vault what's Next: Part 2Data vault what's Next: Part 2
Data vault what's Next: Part 2
Empowered Holdings, LLC
 
Big Data in Azure
Big Data in AzureBig Data in Azure

Similar to WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Databases (20)

Unleashing the Power of Vector Search in .NET - SharpCoding2024.pdf
Unleashing the Power of Vector Search in .NET - SharpCoding2024.pdfUnleashing the Power of Vector Search in .NET - SharpCoding2024.pdf
Unleashing the Power of Vector Search in .NET - SharpCoding2024.pdf
 
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
 
AI at Scale in Enterprises
AI at Scale in Enterprises AI at Scale in Enterprises
AI at Scale in Enterprises
 
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALSecrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Record matching over query results
Record matching over query resultsRecord matching over query results
Record matching over query results
 
Accelerating Path to Production for Generative AI-powered Applications
Accelerating Path to Production for Generative AI-powered ApplicationsAccelerating Path to Production for Generative AI-powered Applications
Accelerating Path to Production for Generative AI-powered Applications
 
Intro to big data and applications -day 3
Intro to big data and applications -day 3Intro to big data and applications -day 3
Intro to big data and applications -day 3
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeFishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
 
What do you need to know before going in to Sri Lankan IT industry
What do you need to know before going in to Sri Lankan IT industryWhat do you need to know before going in to Sri Lankan IT industry
What do you need to know before going in to Sri Lankan IT industry
 
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesAgile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software Integration
 
ER/Studio Data Architect Datasheet
ER/Studio Data Architect DatasheetER/Studio Data Architect Datasheet
ER/Studio Data Architect Datasheet
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
 
Guidelines DataCite Denmark 2014
Guidelines DataCite Denmark 2014Guidelines DataCite Denmark 2014
Guidelines DataCite Denmark 2014
 
Denodo Datafest 2017 London Tekin Mentes Logitech
Denodo Datafest 2017 London Tekin Mentes LogitechDenodo Datafest 2017 London Tekin Mentes Logitech
Denodo Datafest 2017 London Tekin Mentes Logitech
 
Data vault what's Next: Part 2
Data vault what's Next: Part 2Data vault what's Next: Part 2
Data vault what's Next: Part 2
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 

More from Luigi Fugaro

Ottimizzare le performance dell'API Server K8s come utilizzare cache e eventi...
Ottimizzare le performance dell'API Server K8s come utilizzare cache e eventi...Ottimizzare le performance dell'API Server K8s come utilizzare cache e eventi...
Ottimizzare le performance dell'API Server K8s come utilizzare cache e eventi...
Luigi Fugaro
 
Sharp Coding 2023 - Luigi Fugaro - ACRE.pdf
Sharp Coding 2023 - Luigi Fugaro - ACRE.pdfSharp Coding 2023 - Luigi Fugaro - ACRE.pdf
Sharp Coding 2023 - Luigi Fugaro - ACRE.pdf
Luigi Fugaro
 
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AIRed Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
Luigi Fugaro
 
Caching Patterns for lazy devs for lazy loading - Luigi Fugaro VDTJAN23
Caching Patterns for lazy devs for lazy loading - Luigi Fugaro VDTJAN23Caching Patterns for lazy devs for lazy loading - Luigi Fugaro VDTJAN23
Caching Patterns for lazy devs for lazy loading - Luigi Fugaro VDTJAN23
Luigi Fugaro
 
Codemotion Milan '22 - Real Time Data - No CRDTs, no party!
Codemotion Milan '22 - Real Time Data - No CRDTs, no party!Codemotion Milan '22 - Real Time Data - No CRDTs, no party!
Codemotion Milan '22 - Real Time Data - No CRDTs, no party!
Luigi Fugaro
 
OpenSlava 2018 - Cloud Native Applications with OpenShift
OpenSlava 2018 - Cloud Native Applications with OpenShiftOpenSlava 2018 - Cloud Native Applications with OpenShift
OpenSlava 2018 - Cloud Native Applications with OpenShift
Luigi Fugaro
 
Redis - Non solo cache
Redis - Non solo cacheRedis - Non solo cache
Redis - Non solo cache
Luigi Fugaro
 
JDV for Codemotion Rome 2017
JDV for Codemotion Rome 2017JDV for Codemotion Rome 2017
JDV for Codemotion Rome 2017
Luigi Fugaro
 
2.5tier Javaday (italian)
2.5tier Javaday (italian)2.5tier Javaday (italian)
2.5tier Javaday (italian)
Luigi Fugaro
 

More from Luigi Fugaro (9)

Ottimizzare le performance dell'API Server K8s come utilizzare cache e eventi...
Ottimizzare le performance dell'API Server K8s come utilizzare cache e eventi...Ottimizzare le performance dell'API Server K8s come utilizzare cache e eventi...
Ottimizzare le performance dell'API Server K8s come utilizzare cache e eventi...
 
Sharp Coding 2023 - Luigi Fugaro - ACRE.pdf
Sharp Coding 2023 - Luigi Fugaro - ACRE.pdfSharp Coding 2023 - Luigi Fugaro - ACRE.pdf
Sharp Coding 2023 - Luigi Fugaro - ACRE.pdf
 
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AIRed Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
 
Caching Patterns for lazy devs for lazy loading - Luigi Fugaro VDTJAN23
Caching Patterns for lazy devs for lazy loading - Luigi Fugaro VDTJAN23Caching Patterns for lazy devs for lazy loading - Luigi Fugaro VDTJAN23
Caching Patterns for lazy devs for lazy loading - Luigi Fugaro VDTJAN23
 
Codemotion Milan '22 - Real Time Data - No CRDTs, no party!
Codemotion Milan '22 - Real Time Data - No CRDTs, no party!Codemotion Milan '22 - Real Time Data - No CRDTs, no party!
Codemotion Milan '22 - Real Time Data - No CRDTs, no party!
 
OpenSlava 2018 - Cloud Native Applications with OpenShift
OpenSlava 2018 - Cloud Native Applications with OpenShiftOpenSlava 2018 - Cloud Native Applications with OpenShift
OpenSlava 2018 - Cloud Native Applications with OpenShift
 
Redis - Non solo cache
Redis - Non solo cacheRedis - Non solo cache
Redis - Non solo cache
 
JDV for Codemotion Rome 2017
JDV for Codemotion Rome 2017JDV for Codemotion Rome 2017
JDV for Codemotion Rome 2017
 
2.5tier Javaday (italian)
2.5tier Javaday (italian)2.5tier Javaday (italian)
2.5tier Javaday (italian)
 

Recently uploaded

03. Ruby Variables & Regex - Ruby Core Teaching
03. Ruby Variables & Regex - Ruby Core Teaching03. Ruby Variables & Regex - Ruby Core Teaching
03. Ruby Variables & Regex - Ruby Core Teaching
quanhoangd129
 
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
rachitkumar09887
 
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
norina2645
 
當測試開始左移
當測試開始左移當測試開始左移
當測試開始左移
Jersey (CHE-PING) Su
 
Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...
Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...
Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...
902basic
 
GT degree offer diploma Transcript
GT degree offer diploma TranscriptGT degree offer diploma Transcript
GT degree offer diploma Transcript
attueb
 
Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...
Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...
Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...
dream girl
 
Russian Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service ...
Russian Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service ...Russian Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service ...
Russian Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service ...
shanihomely
 
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
OnePlan Solutions
 
Authentication Review-June -2024 AP & TS.pptx
Authentication Review-June -2024 AP & TS.pptxAuthentication Review-June -2024 AP & TS.pptx
Authentication Review-June -2024 AP & TS.pptx
DEMONDUOS
 
B.Sc. Computer Science Department PPT 2024
B.Sc. Computer Science Department PPT 2024B.Sc. Computer Science Department PPT 2024
B.Sc. Computer Science Department PPT 2024
vmsdeptcom
 
Applitools Autonomous 2.0 Sneak Peek.pdf
Applitools Autonomous 2.0 Sneak Peek.pdfApplitools Autonomous 2.0 Sneak Peek.pdf
Applitools Autonomous 2.0 Sneak Peek.pdf
Applitools
 
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
andrehoraa
 
08. Ruby Enumerable - Ruby Core Teaching
08. Ruby Enumerable - Ruby Core Teaching08. Ruby Enumerable - Ruby Core Teaching
08. Ruby Enumerable - Ruby Core Teaching
quanhoangd129
 
Amadeus Travel API, Amadeus Booking API, Amadeus GDS
Amadeus Travel API, Amadeus Booking API, Amadeus GDSAmadeus Travel API, Amadeus Booking API, Amadeus GDS
Amadeus Travel API, Amadeus Booking API, Amadeus GDS
aadhiyaeliza
 
A Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdf
A Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdfA Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdf
A Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdf
kalichargn70th171
 
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
neshakor5152
 
InflectraCON 360: Risk-Based Testing for Mission Critical Systems
InflectraCON 360: Risk-Based Testing for Mission Critical SystemsInflectraCON 360: Risk-Based Testing for Mission Critical Systems
InflectraCON 360: Risk-Based Testing for Mission Critical Systems
Inflectra
 
02. Ruby Basic slides - Ruby Core Teaching
02. Ruby Basic slides - Ruby Core Teaching02. Ruby Basic slides - Ruby Core Teaching
02. Ruby Basic slides - Ruby Core Teaching
quanhoangd129
 
Empowering Businesses with Intelligent Software Solutions - Grawlix
Empowering Businesses with Intelligent Software Solutions - GrawlixEmpowering Businesses with Intelligent Software Solutions - Grawlix
Empowering Businesses with Intelligent Software Solutions - Grawlix
Aarisha Shaikh
 

Recently uploaded (20)

03. Ruby Variables & Regex - Ruby Core Teaching
03. Ruby Variables & Regex - Ruby Core Teaching03. Ruby Variables & Regex - Ruby Core Teaching
03. Ruby Variables & Regex - Ruby Core Teaching
 
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
 
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
 
當測試開始左移
當測試開始左移當測試開始左移
當測試開始左移
 
Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...
Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...
Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...
 
GT degree offer diploma Transcript
GT degree offer diploma TranscriptGT degree offer diploma Transcript
GT degree offer diploma Transcript
 
Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...
Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...
Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...
 
Russian Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service ...
Russian Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service ...Russian Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service ...
Russian Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service ...
 
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
 
Authentication Review-June -2024 AP & TS.pptx
Authentication Review-June -2024 AP & TS.pptxAuthentication Review-June -2024 AP & TS.pptx
Authentication Review-June -2024 AP & TS.pptx
 
B.Sc. Computer Science Department PPT 2024
B.Sc. Computer Science Department PPT 2024B.Sc. Computer Science Department PPT 2024
B.Sc. Computer Science Department PPT 2024
 
Applitools Autonomous 2.0 Sneak Peek.pdf
Applitools Autonomous 2.0 Sneak Peek.pdfApplitools Autonomous 2.0 Sneak Peek.pdf
Applitools Autonomous 2.0 Sneak Peek.pdf
 
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
 
08. Ruby Enumerable - Ruby Core Teaching
08. Ruby Enumerable - Ruby Core Teaching08. Ruby Enumerable - Ruby Core Teaching
08. Ruby Enumerable - Ruby Core Teaching
 
Amadeus Travel API, Amadeus Booking API, Amadeus GDS
Amadeus Travel API, Amadeus Booking API, Amadeus GDSAmadeus Travel API, Amadeus Booking API, Amadeus GDS
Amadeus Travel API, Amadeus Booking API, Amadeus GDS
 
A Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdf
A Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdfA Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdf
A Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdf
 
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
 
InflectraCON 360: Risk-Based Testing for Mission Critical Systems
InflectraCON 360: Risk-Based Testing for Mission Critical SystemsInflectraCON 360: Risk-Based Testing for Mission Critical Systems
InflectraCON 360: Risk-Based Testing for Mission Critical Systems
 
02. Ruby Basic slides - Ruby Core Teaching
02. Ruby Basic slides - Ruby Core Teaching02. Ruby Basic slides - Ruby Core Teaching
02. Ruby Basic slides - Ruby Core Teaching
 
Empowering Businesses with Intelligent Software Solutions - Grawlix
Empowering Businesses with Intelligent Software Solutions - GrawlixEmpowering Businesses with Intelligent Software Solutions - Grawlix
Empowering Businesses with Intelligent Software Solutions - Grawlix
 

WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Databases

  • 1. Luigi Fugaro Senior Solution Architect @ Redis Unlocking the Future of Data: Powering Next-Gen AI with Vector Databases
  • 2. Agenda 1. Data Review 2. Vector Embeddings 3. Vector Database 4. Demo - Let’s see some code
  • 4. Data Review Let’s start with a metric Around 80% of the data generated by organizations is Unstructured Growth IDC Report 2023 - https://www.box.com/resources/unstructured-data-paper
  • 5. Data Review Data Types Growth Unstructured Quasi-Structured Semi-Structured Structured No inherent structure ~ PDFs, images, audio, video Erratic patterns/formats ~ Clickstreams There's a discernible pattern ~ Spreadsheets / XML / JSON Schema/defined data model ~ Database IDC Report 2023 - https://www.box.com/resources/unstructured-data-paper
  • 6. How to deal with Unstructured Data? Common approaches were: ● Labeling ● Tagging Data Review
  • 7. Labeling and Tagging Feature Value Frame Color Green Tire Color Brown Has Rear Rack Yes Has Fenders Yes Has Safety Bell No Has Fat Tires Yes Feature Value Frame Color Matte Olive Tire Color Orange Has Rear Rack Yes Has Fenders Yes Has Safety Bell Yes Has Fat Tires Yes Data Review
  • 8. Labeling and Tagging Feature Value Easy Assembly ⭐⭐⭐⭐⭐ Chain Quality ⭐⭐⭐ Seat Comfort ⭐ Gear Smoothness ⭐⭐⭐⭐ Data Review
  • 9. How to deal with Unstructured Data? Labeling and Tagging are labor intensive, subjective and error-prone What’s the new approach? Data Review
  • 11. Vector Embeddings What is a Vector? Numeric representation of something in N-dimensional space using floating-point numbers Can represent anything: entire documents, images, video, audio…
  • 12. Vector Embeddings How to turn Data into Vectors? It’s quite a complex process, based primarily on Neural Networks
  • 13. Vector Embeddings How to turn Data into Vectors? Don’t be scared, Machine Learning and Deep Learning have leaped forward in the last decade and we can all benefit from a huge ecosystem of Models, ready to use! Each Model has its own specific task!
  • 14. Vector Embeddings Music → Audio Model; Video → Video Model; Images → Vision Model; Faces → Face Detection/Recognition Models; Poses → Vision Model Trained on Poses; Emotions → Sentiment Model; → Embeddings
  • 15. Models quantify features of the item Vector Embeddings Why vector embeddings? They are comparable!
  • 16. Visual representation Vector Embeddings Semantic Relationship Syntactic Relationship
  • 17. Visual representation Vector Embeddings https://jalammar.github.io/illustrated-word2vec “King” [ 0.50451 , 0.68607 , -0.59517 , -0.022801, 0.60046 , -0.13498 , -0.08813 , 0.47377 , -0.61798 , -0.31012 , -0.076666, 1.493 , -0.034189, -0.98173 , 0.68229 , 0.81722 , -0.51874 , -0.31503 , -0.55809 , 0.66421 , 0.1961 , -0.13495 , -0.11476 , -0.30344 , 0.41177 , -2.223 , -1.0756 , -1.0783 , -0.34354 , 0.33505 , 1.9927 , -0.04234 , -0.64319 , 0.71125 , 0.49159 , 0.16754 , 0.34344 , -0.25663 , -0.8523 , 0.1661 , 0.40102 , 1.1685 , -1.0137 , -0.21585 , -0.15155 , 0.78321 , -0.91241 , -1.6106 , -0.64426 , -0.51042 ]
  • 23. So, is it all about arithmetic operations? Vector Embeddings What else? There is one main operation that you can do, and it’s called Similarity Search!
  • 24. Vector Similarity Search Algorithms Vector Embeddings
  • 26. Now that we have Vector Embeddings? Vector Embeddings We need a database to store them! Nope, we need a Vector Database!
  • 28. Vector Database Music → Audio Model; Video → Video Model; Images → Vision Model; Faces → Face Detection/Recognition Models; Poses → Vision Model Trained on Poses; Emotions → Sentiment Model; → Embeddings → REDIS
  • 29. What does a Vector DB need to have? ❏ Store data ❏ Index data ❏ Query data Does Redis have all of them? You bet, and much more! Vector Database
  • 30. Redis as Vector DB Vector indexing algorithms: Redis manages vectors in an index data structure to enable intelligent similarity search that balances search speed and search quality. Choose from two popular techniques: FLAT (a brute-force approach) and HNSW (Hierarchical Navigable Small World, a faster, approximate approach). Vector search distance metrics: Redis uses a distance metric to measure the similarity between two vectors. Choose from three popular metrics (Euclidean, Inner Product, and Cosine Similarity) used to calculate how “close” or “far apart” two vectors are. Powerful hybrid filtering: Take advantage of the full suite of search features available in Redis query and search. Enhance your workflows by combining the power of vector similarity with more traditional geo, numeric, text, and tag filters. Incorporate more business logic into queries and simplify client application code. Vector Database
  • 31. Redis as Vector DB Real-time updates: Real-time search and recommendation systems generate large volumes of changing data. New images, text, products, or metadata? Perform updates, insertions, and deletes on the search index seamlessly as your dataset changes over time. Redis Enterprise reduces the costly impact of stale data. Vector range queries: Traditional vector search finds the “top K” most similar vectors. Redis Enterprise also enables the discovery of relevant content within a predefined similarity range or threshold, offering a more flexible search experience. Vector Database
  • 32. Let’s see some code 4 of 4
  • 33. Demo - Plan B!
      spring.data.redis.host=35.187.74.111
      spring.data.redis.port=12000
      spring.data.redis.username=default
      spring.data.redis.password=redis
      server.port=8080
      spring.mvc.hiddenmethod.filter.enabled=true
      com.redis.om.vss.useLocalImages=false
      com.redis.om.vss.maxLines=300
      redis.om.spring.djl.enabled=true
      redis.om.spring.djl.image-embedding-model-engine=PyTorch
      redis.om.spring.djl.image-embedding-model-model-urls=djl://ai.djl.pytorch/resnet18_embedding
      redis.om.spring.djl.sentence-tokenizer-max-length=768
      redis.om.spring.djl.sentence-tokenizer-model=sentence-transformers/all-mpnet-base-v2
      redis.om.spring.djl.sentence-tokenizer-model-max-length=768
      redis.om.spring.djl.face-detection-model-engine=PyTorch
      redis.om.spring.djl.face-detection-model-name=retinaface
      redis.om.spring.djl.face-detection-model-model-urls=https://resources.djl.ai/test-models/pytorch/retinaface.zip
      redis.om.spring.djl.face-embedding-model-engine=PyTorch
      redis.om.spring.djl.face-embedding-model-name=face_feature
      redis.om.spring.djl.face-embedding-model-model-urls=https://resources.djl.ai/test-models/pytorch/face_feature.zip
  • 34. Demo - Plan B!
      @Document
      public class ImageData {
        @Id
        private String id;
        @Indexed
        private String name;
        @Indexed
        private int height;
        @Indexed
        private int width;
        @Indexed(schemaFieldType = SchemaFieldType.VECTOR,
                 algorithm = VectorField.VectorAlgorithm.HNSW,
                 type = VectorType.FLOAT32,
                 dimension = 512,
                 distanceMetric = DistanceMetric.L2,
                 initialCapacity = 10)
        private float[] imageEmbedding;
        @Vectorize(destination = "imageEmbedding", embeddingType = EmbeddingType.FACE)
        private String imagePath;
        @Indexed
        private double score = 0;
        ...
      }
  • 35. Demo - Plan B!
      @Service
      public class BestOfMatchService {
        @Autowired
        private EntityStream entityStream;
        @Autowired
        public ZooModel<Image, float[]> faceEmbeddingModel;
        // K and floatArrayToByteArray are used below but not defined on this slide.
        private List<ImageData> matchAll(byte[] image, int limit) {
          List<ImageData> imageDataList = new ArrayList<>();
          try (Predictor<Image, float[]> predictor = faceEmbeddingModel.newPredictor()) {
            // Decode the uploaded image and compute its face embedding.
            ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(image);
            Image img = ImageFactory.getInstance().fromInputStream(byteArrayInputStream);
            float[] embedding = predictor.predict(img);
            byte[] embeddingAsByteArray = floatArrayToByteArray(embedding);
            // KNN search on the vector field, closest matches first.
            SearchStream<ImageData> stream = entityStream.of(ImageData.class);
            List<Pair<ImageData, Double>> matchWithScore = stream
                .filter(ImageData$.IMAGE_EMBEDDING.knn(K, embeddingAsByteArray))
                .sorted(ImageData$._IMAGE_EMBEDDING_SCORE, SortedField.SortOrder.ASC)
                .limit(limit)
                .map(Fields.of(ImageData$._THIS, ImageData$._IMAGE_EMBEDDING_SCORE))
                .collect(Collectors.toList());
            // Copy each distance score onto its entity before returning it.
            for (Pair<ImageData, Double> pair : matchWithScore) {
              ImageData imageData = pair.getFirst();
              Double score = pair.getSecond();
              imageData.setScore(score);
              imageDataList.add(imageData);
            }
            return imageDataList;
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      }
  • 41. Wrap up Unlocking the Future of Data: Powering Next-Gen AI with Vector Databases #WMF2024 1. Data 2. Vector Embeddings 3. Vector Database 4. Redis
  • 42. VOTE FOR THIS TALK ON IBRIDA Luigi Fugaro Senior Solution Architect @ Redis
  • 43. For more information, you can write to us at speaker@wemakefuture.it www.wemakefuture.it
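
Slides 30 and 31 name three distance metrics, the FLAT (brute-force) index, and the difference between "top K" and range queries. The sketch below illustrates those ideas in plain Java over an in-memory list. It is not how Redis implements them: the class name `FlatSearchSketch`, its method names, and the choice of Euclidean distance for both searches are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: brute-force (FLAT-style) search over in-memory vectors. */
final class FlatSearchSketch {

    /** Euclidean (L2) distance: smaller means closer. */
    static double euclidean(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Inner product: larger means more similar. */
    static double innerProduct(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine distance = 1 - cosine similarity: smaller means closer. */
    static double cosineDistance(float[] a, float[] b) {
        double norms = Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b));
        return 1.0 - innerProduct(a, b) / norms;
    }

    /** "Top K" query: indices of the k stored vectors closest to the query. */
    static List<Integer> topK(List<float[]> stored, float[] query, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            ids.add(i);
        }
        // Brute force: compare the query against every stored vector.
        ids.sort(Comparator.comparingDouble(i -> euclidean(stored.get(i), query)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    /** Range query: indices of every stored vector within the given radius. */
    static List<Integer> withinRadius(List<float[]> stored, float[] query, double radius) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            if (euclidean(stored.get(i), query) <= radius) {
                ids.add(i);
            }
        }
        return ids;
    }
}
```

A top-K query always returns K results (if that many exist), however far away they are, while a range query returns only what falls inside the threshold, which may be nothing. An approximate index such as HNSW avoids comparing the query against every stored vector, trading some accuracy for speed.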
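
The service on slide 35 calls `floatArrayToByteArray(embedding)` without showing its body. One plausible implementation is sketched below, assuming the vector field expects each value packed as a 4-byte FLOAT32 in little-endian order. The byte order, the class name `EmbeddingBytes`, and the reverse helper `byteArrayToFloatArray` are assumptions and do not come from the slides.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative helper; the demo's real implementation is not shown on the slides. */
final class EmbeddingBytes {

    /** Packs each float as 4 bytes (IEEE 754 FLOAT32), little-endian (assumed). */
    static byte[] floatArrayToByteArray(float[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES)
                                      .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            buffer.putFloat(v);
        }
        return buffer.array();
    }

    /** Reverse conversion, handy for checking the round trip. */
    static float[] byteArrayToFloatArray(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] values = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getFloat();
        }
        return values;
    }
}
```

With `dimension = 512` and `type = VectorType.FLOAT32`, as declared on slide 34, the resulting blob would be 512 × 4 = 2048 bytes.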