Embeddings are a great way to represent information, but in production they are still not a commodity. You can still find ways to use them, though, e.g. with a lambda architecture.
Walking Around Your Nearest Neighbors with Lambda Architecture
1. Haystack / MICES / Berlin Buzzwords
June 10, 2020
Walking Around Your Nearest Neighbors
with Lambda Architecture
Elias Nema
2. Dense vectors (embeddings) are the new oil
But sometimes refinement costs are too high
Because dense vectors are dense
Storing them and searching through them might put an excessive load on your search engine
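A quick back-of-envelope calculation shows why dense vectors are heavy. All numbers here are illustrative assumptions, not figures from the talk:

```python
# Rough index size for dense float32 vectors. The catalog size and
# dimensionality below are assumed for illustration only.
items = 10_000_000        # hypothetical catalog size
dims = 256                # hypothetical embedding dimensionality
bytes_per_float = 4       # float32

size_gb = items * dims * bytes_per_float / 1e9
print(f"{size_gb:.1f} GB")  # → 10.2 GB
```

Gigabytes that the search engine must hold and scan on top of its regular inverted indexes — before any replication.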
3. Content is generated by users
Millions of items are added and removed daily
As soon as an item is posted, we want to produce recommendations for it,
as well as surface it as a recommendation on other items in the catalog.
Use case: Recommendations @ OLX Group
Online Marketplace
25+ countries
350M+ MAU
4. Constraints
• We have embeddings for our items.
• It’s not feasible to store these item embeddings in our search engine (for many reasons).
• We would still like to use them for recommendations.
So, which options do we have?
11. Flow
1st: Neighbors for the new item
2nd: Update the neighborhoods in the all-items cache
The new item gets recommendations and is itself recommended on the other items.
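The two steps above can be sketched as follows. This is a hedged illustration: the function and variable names are mine, and a brute-force scan stands in for the real nearest-neighbor index:

```python
import math

def cosine(a, b):
    # Cosine similarity between two plain-list vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def on_new_item(item_id, vec, vectors, cache, k=2):
    # 1st: nearest neighbors for the new item (brute force here,
    # standing in for a proper index).
    scores = sorted(((cosine(vec, v), other) for other, v in vectors.items()),
                    reverse=True)
    cache[item_id] = [other for _, other in scores[:k]]
    vectors[item_id] = vec
    # 2nd: update the cached neighborhood of each affected item
    # (a real system would re-rank; we simply append).
    for other in cache[item_id]:
        if item_id not in cache.get(other, []):
            cache.setdefault(other, []).append(item_id)

vectors = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
cache = {"a": ["b"], "b": ["a"], "c": ["b"]}
on_new_item("d", [0.9, 0.1], vectors, cache)
```

After the call, "d" both has its own recommendations and appears in the neighborhoods of its neighbors.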
13. Lambda architecture
• An application calculates the nearest neighbors (faiss + sqlite)
• Speed layer: a stream of item updates
• Batch layer: a daily batch job to avoid drifting
• Serving layer: a fast cache
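As a hedged sketch of how these pieces fit together: the talk names faiss + sqlite, but to keep this example dependency-free, SQLite (via the standard sqlite3 module) stores the vectors and a brute-force scan stands in for the faiss index. Table, column, and item names are made up:

```python
import sqlite3, json, math

# SQLite holds the item vectors (serialized as JSON); in the talk's
# setup, a faiss index would do the actual similarity search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, vec TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("bike", json.dumps([1.0, 0.1])),
     ("car",  json.dumps([0.9, 0.3])),
     ("sofa", json.dumps([0.0, 1.0]))],
)

def nearest(conn, query, k=2):
    # Brute-force cosine scan over all stored vectors, standing in
    # for a faiss index lookup.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    rows = conn.execute("SELECT id, vec FROM items").fetchall()
    ranked = sorted(((cos(query, json.loads(v)), i) for i, v in rows),
                    reverse=True)
    return [i for _, i in ranked[:k]]

print(nearest(conn, [1.0, 0.0]))  # → ['bike', 'car']
```

The speed layer would call something like `nearest` on each item update and push the result into the fast cache; the daily batch job recomputes everything to correct drift.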
14. Conclusion
• Embeddings are a great way to represent information, but in production they are still not a commodity.
• You can still find ways to use them in production, e.g. with a lambda architecture.
• Of course, this approach has limited reranking capabilities.
• … or you can use Sphinx (avito.ru does brute-force vector ranking for up to 5M documents in <10 ms; talk is, sorry, in Russian).
Elias Nema
eliasnema
eliasnema.com