The Rise of Vector Data

Modern machine learning (ML) represents everything as vectors, from documents to videos to user behavior. This representation makes it possible to accurately search, retrieve, rank, and classify items by similarity and relevance.

Running real-time applications that rely on large numbers of such high-dimensional vectors requires a new kind of data infrastructure. In this talk we will discuss the need for such infrastructure, the algorithmic and engineering challenges of working with vector data at scale, and open problems that still lack adequate solutions.

Time permitting, I will introduce Pinecone as a solution to some of these challenges.

The Rise of Vector Data

  1. The Rise of Vector Data. Edo Liberty, Founder & CEO, Pinecone
  2. What is vector data?
  3. What is vector data? Diagram labels: Object, Vector, Task. Tasks listed:
     ● Translation, understanding, sentiment, question answering, semantic search, ...
     ● Anomaly detection, speech-to-text, music transcription, machinery malfunction, ...
     ● Object recognition, deduplication, scene detection, product search, ...
  4. Text: BERT, DistilBERT, word2vec, GloVe, ... Audio: wav2vec, mxnet-audio, ... Vision: resnet, alexnet, vgg, squeezenet, densenet, inception, googlenet, mobilenet, ...
     >> import torchvision.models as models
     >> model = models.squeezenet1_0(pretrained=True)
  5. What if we save the vectors?
  6. Then we can search by similarity
  7. You’ve seen the results
  8. Vectors need a new kind of database. Database types shown: Key-Value, Graph, Vector, Document
  9. A vector index needs complex algorithms
  10. A vector index needs complex algorithms. http://ann-benchmarks.com/
  11. A vector index also needs complex infrastructure
     Functionality + Scale
     ● Sharding
     ● Replication
     ● Live Updates
     ● Namespacing
     ● Filtering
     ● Pre/Post processing
     Production readiness
     ● High Availability
     ● Persistence
     ● Consistency
     ● Monitoring
     ● Alerting
     ● Support
  12. You can leverage vectors through a managed service. Diagram labels: Vectors; Similarity search as a service (Pinecone.io); Application or notebook
  13. Image Search Demo
  14. Thank you! Pinecone.io: Similarity search as a service
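
Slides 5 and 6 propose saving the vectors and then searching them by similarity. The slides do not show how that search is done, so the following is only a minimal sketch under stated assumptions: cosine similarity as the measure, an exhaustive scan over an in-memory dictionary, and made-up three-dimensional vectors. The names `cosine_similarity`, `top_k`, and `stored_vectors` are hypothetical and are not part of Pinecone or of any library mentioned in the talk.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors for illustration only; real embedding models
# typically produce hundreds or thousands of dimensions.
stored_vectors = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.8, 0.3, 0.1],
    "invoice_pdf": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.0, 0.0], stored_vectors, k=2)
for item_id, score in results:
    print(item_id, round(score, 3))
```

This scan compares the query against every stored vector, so its cost grows linearly with the number of vectors. That cost is one reason slides 9 and 10 point to more complex index algorithms, such as the approximate nearest neighbor methods compared at ann-benchmarks.com.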
