Embedding-based Retrieval
in Search Ranking System
Marsan Ma
2020-10-21
Outlines
1. Retrieval / Ranking overview
2. Embedding-based Retrieval
○ Training
■ Triplet loss architecture
■ Hard-negative mining
○ Serving
■ ANN (Approximate Nearest Neighbor)
■ Product quantization
3. (Maybe) next generation search ranking system
1. Retrieval / Ranking
Now & Then
● Full pipeline composed of multiple stages
○ Matching / Pre-ranking: focuses on reducing the search space for later stages by dropping irrelevant
samples while guaranteeing high recall (ensuring all positive samples are included).
○ Ranking: focuses on high precision, ensuring the top-K results align with user interest.
○ Reranking: overrides model results for custom business purposes, such as promoting
new/high-quality content or compensating for known model weaknesses, etc.
Main concept
● Matching: Recall@K, QPS (queries per second), RT (response time)
● Ranking: NDCG@K, MAP@K
● Reranking: your business objectives.
(AUC is only for quick evaluation; it does not directly align with business value.)
Evaluation according to purpose
● Over-simplified here; missing the CF (collaborative filtering) family, factorization family, GCN family, etc.
● Two-tower is an architecture that prioritizes engineering concerns (speed, cost, complexity).
○ Avoids cross-attention.
○ ANN (approximate nearest neighbor) search at inference time, with no model in the serving path.
● COLD: wants to include cross-attention; the question is how to balance performance and computing cost.
Retrieval stage evolution
Current mainstream
(figure: retrieval approaches arranged along a cheap-to-precise axis)
● Key idea of two-tower: train embeddings, but avoid cross-attention (a minimal sketch follows this slide).
○ Query and candidate are encoded independently all the way to the last layer (which computes similarity).
○ Running the whole model for online inference is very expensive; we want to use only the final embeddings
in online inference (and forget about the model after training).
○ Cost: dropping cross-attention means sacrificing some performance.
● By the way, the JEM team deployed BERT + cross-attention since their data volume is small.
● Alibaba COLD tries to cut cost while keeping cross-attention.
○ They benchmark two-tower, COLD, and their “deep interest network”.
Why avoid cross-attention?
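(Below is a minimal PyTorch sketch of the two-tower idea; the layer sizes and inputs are made up for illustration, not the architecture of any cited paper. The two sides only interact through a dot product at the top, so item embeddings can be precomputed offline and only the similarity is computed per request.)

import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """A small MLP encoder; one instance per side (query tower, item tower)."""
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x):
        # L2-normalize so the dot product below is cosine similarity.
        return F.normalize(self.net(x), dim=-1)

class TwoTower(nn.Module):
    def __init__(self, query_dim, item_dim, emb_dim=128):
        super().__init__()
        self.query_tower = Tower(query_dim, emb_dim)
        self.item_tower = Tower(item_dim, emb_dim)

    def forward(self, query_feats, item_feats):
        q = self.query_tower(query_feats)   # (B, emb_dim)
        d = self.item_tower(item_feats)     # (B, emb_dim)
        # No cross-attention: the two sides only meet at this similarity score.
        return (q * d).sum(dim=-1)          # (B,)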
Why Embedding?
From “syntactic matching” to “semantic matching”.
1. Factorized features
○ Features shouldn’t be naively one-hot encoded as black-or-white; they have implicit
relationships in a high-dimensional space (the Factorization Machine concept).
○ For a linear model, even with quadratic features like <xi*yi>, if <xi*yi> rarely or
never occurs in the training set, the linear model cannot learn the relationship <Wx,Wy>.
An FM or embedding model can still learn it through indirect relationships like x-z-y (toy sketch at the end of this slide).
2. Fuzzy text match
○ Matches the query "kacis creations" to “Kasie’s creations”, which a term-based
match cannot.
3. Personalization
○ A user embedding naturally enables personalized matching results.
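(A toy NumPy sketch of the factorization idea above; the vocabulary size and latent dimension are arbitrary, and the latent vectors are random rather than learned. The point is that the interaction weight of a pair (x, y) is <v_x, v_y>, which stays defined even if x and y never co-occur in training.)

import numpy as np

rng = np.random.default_rng(0)
n_features, k = 1000, 16                          # vocabulary size, latent dim (assumed)
V = rng.normal(scale=0.1, size=(n_features, k))   # one latent vector per feature

def fm_interaction(active_idx):
    """Second-order FM term for a sample whose active one-hot features are active_idx."""
    v = V[active_idx]                              # (n_active, k)
    total = v.sum(axis=0)
    # FM identity: sum over pairs <v_i, v_j> = 0.5 * (||sum v||^2 - sum ||v_i||^2)
    return 0.5 * (total @ total - (v * v).sum())

# Even if features 3 and 7 never co-occur in training, <V[3], V[7]> is still
# defined, and is learned indirectly through features they each co-occur with.
print(fm_interaction([3, 7]))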
Why deep learning?
(This slide is quoted from a LinkedIn KDD 2020 presentation.)
Retrieval/Ranking/Recommendation full picture
(figure: full-picture diagram, with “Embedding” blocks highlighted)
in short:
1. Compile knowledge into
embeddings
2. Find nearest neighbors
3. Fine-grained sort
in final ranking
2. Embedding-based
Retrieval (Training)
● A good review of the main concept (a nice GIF made by Google):
○ Two-tower: queries and database items are mapped into the same embedding space.
○ The model responds to natural-language queries.
Review: main concept of using embedding
1. “Unified” Embedding (by Facebook EBR)
○ Two-sided model: one side is the search request (character n-grams), the other side is the document.
○ Other (social/location) features are included in the encoder input! (thus called “unified”)
2. Triplet loss function: keep widening the gap between the
distances of positive and negative query-doc pairs (minimal sketch after this slide).
○ Has a margin term m to tune, which is both good and bad.
○ (?) Why not just use “clicked” as positive,
“seen but not clicked” as negative?
○ (?) Slower to converge
Best practice by today (2020)
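(A minimal triplet-loss sketch in PyTorch; the margin value and embedding shapes are illustrative, not Facebook’s settings. The loss only vanishes once the positive doc is closer to the query than the negative doc by at least the margin m.)

import torch
import torch.nn.functional as F

def triplet_loss(q, pos, neg, margin=0.3):
    """q, pos, neg: (B, d) embeddings from the two towers.
    Uses cosine distance; margin is the 'm' term that needs tuning."""
    d_pos = 1.0 - F.cosine_similarity(q, pos, dim=-1)   # distance to positive doc
    d_neg = 1.0 - F.cosine_similarity(q, neg, dim=-1)   # distance to negative doc
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random embeddings:
q, pos, neg = (F.normalize(torch.randn(4, 128), dim=-1) for _ in range(3))
print(triplet_loss(q, pos, neg))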
Data sampling is crucial
1. Using “clicked” vs. “seen” as the positive sample performed equally well in online tests.
○ Since “seen” items were already selected by the ranking stage, it is fine for the retrieval stage to target the same distribution.
2. Hard negative mining
○ Online: choose K docs from other positive query-doc pairs (in the same batch) as hard negatives (K=2 works best).
○ Offline: choose documents ranked 101~500 on historical SERPs as hard negatives.
3. Random negatives are better than hard negatives alone (“seen but not clicked”)!!!
○ Hypothesis: a model focused too much on hard negatives loses the ability to handle obvious ones.
(E.g., if all hard negatives share the anchor job’s location, the model concludes location is unimportant,
which is obviously wrong.)
○ Also, the random-sample distribution aligns with the serving distribution.
○ Best practice (two styles; sampling sketch after this slide):
■ random negatives : hard negatives = 100:1
■ Transfer learning: train on hard negatives first, then on random negatives.
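(A sketch of how the random/hard negative mix could be assembled per training query; the pool names and helper function are hypothetical, not from the paper. The sources would be your own uniformly sampled corpus pool and your mined hard negatives.)

import random

def sample_negatives(random_pool, hard_pool, n_random=100, n_hard=1):
    """Mix random negatives with mined hard negatives at roughly the
    100:1 ratio from the slide above."""
    negs = random.sample(random_pool, k=min(n_random, len(random_pool)))
    negs += random.sample(hard_pool, k=min(n_hard, len(hard_pool)))
    random.shuffle(negs)
    return negs

# random_pool: docs sampled uniformly from the index;
# hard_pool: e.g. docs ranked 101-500 on this query's historical SERP.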
Hard Negative Mining
1. Facebook in “Embedding Based Retrieval in Facebook Search”
○ Online hard negative
■ choose K docs from other positive query-doc pairs (in the same batch) as hard negatives (K=2 works best).
○ Offline hard negatives
■ choose documents ranked 101~500 on historical SERPs as hard negatives.
2. Airbnb in “Real-time Personalization using Embeddings for Search Ranking at Airbnb”
○ Randomly sample items in the same location as the positive samples as hard negatives.
○ Add listings “rejected by the host” as hard negative samples.
Embedding everywhere
1. Queries converted to embeddings
2. Documents indexed together with their embeddings
3. Retrieval stage uses embeddings,
and also passes them to the ranking model
to keep ranking aligned with retrieval
(avoiding the Matthew effect)
Engineering questions
1. How often are embeddings re-trained/updated?
2. What are the details of embedding-based indexing?
Facebook search ranking system
Other topics
● Matthew Effect
○ Current ranking stages are designed for existing retrieval scenarios,
=> so the ranker won’t agree with a new retrieval algorithm: it rejects its candidates (no impressions)
or gives them poor positions (hard to be seen).
○ Solution: the ranking model uses retrieval-stage embeddings as features, so the ranking
model can learn from the new signal. (Facebook: empirically, just adding the query-item cosine
similarity as a ranking-model feature works; a tiny sketch follows this slide.)
● Embedding ensemble: weighted concatenation
○ Cascade multiple embeddings, each trained for a different purpose
(each embedding focuses on one specific purpose, just like multi-channel retrieval).
○ Alibaba COLD spends effort on choosing the best embeddings.
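(A tiny sketch of the Facebook trick above, in plain NumPy with hypothetical feature plumbing: the retrieval-stage query-item cosine similarity simply becomes one more feature for the ranker.)

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def ranking_features(query_emb, item_emb, other_feats):
    """Append the retrieval-stage similarity to the ranker's feature vector,
    so the ranking model can 'see' what embedding retrieval saw."""
    return np.concatenate([other_feats, [cosine(query_emb, item_emb)]])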
2. Embedding-based
Retrieval (Serving)
● The two towers encode embeddings independently, so the inference (serving) stage
no longer needs the model.
● The only task in serving: find the “top-K nearest neighbors”.
● Brute force scans the whole corpus for every query (O(N) distance computations); how do we reduce this?
Serving: main challenge
Serving: ANN (Approximate Nearest Neighbor)
1. Tree based
○ KD-tree: good for low-dimensional embeddings, but in high dimensions it degrades to roughly brute force.
○ For high dimensions, use hash-based or vector-quantization methods (the next two categories).
2. Hash based
○ LSH (Locality-Sensitive Hashing)
○ Good for data volumes under ~10 million.
○ Open source: FALCONN, Annoy (by Spotify), NMSLIB (used by AWS Elasticsearch; best as of 2019).
3. Vector quantization
○ Mainstream for data at the hundred-million scale. Product quantization is the best practice (deep dive
in the following slides).
○ Open source like FAISS (by Facebook), ScaNN (by Google)
4. Others
○ Milvus: open-source vector similarity search engine; you query it like a database.
○ NGT (by Yahoo): best in some benchmarks
○ NSG (by Alibaba-Taobao)
All the giants develop their own embedding + ANN stacks; serving faster without losing precision is the key! (A minimal FAISS example follows.)
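(A minimal FAISS sketch; the index type and parameters are illustrative defaults, not a benchmark-tuned setup. Build an IVF-PQ index over the item embeddings offline, then answer top-K queries online. As far as I know, IndexIVFPQ also encodes residuals relative to the coarse centroids by default, which is the “residual encoding” trick on a later slide.)

import numpy as np
import faiss                      # pip install faiss-cpu

d = 128                           # embedding dimension (assumed)
xb = np.random.rand(100_000, d).astype('float32')   # item embeddings (placeholder)
xq = np.random.rand(5, d).astype('float32')         # query embeddings (placeholder)

# IVF-PQ: coarse quantization into nlist clusters, then product quantization
# with m sub-vectors of nbits bits each inside every cluster.
nlist, m, nbits = 256, 8, 8
quantizer = faiss.IndexFlatL2(d)                     # coarse quantizer
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                   # learn coarse centroids + PQ codebooks
index.add(xb)                     # encode and store all items
index.nprobe = 16                 # coarse clusters visited per query

D, I = index.search(xq, 10)       # top-10 approximate nearest neighbors
print(I)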
● Benchmark reference : (1 / 2 / 3)
○ NMSLIB is the best among the hash-based algorithms.
○ FAISS speeds up with GPU, and ScaNN further improves performance (recall@10).
(All the newer algorithms claim to beat NMSLIB, so as of 2020 NMSLIB may still be the most stable choice
if you don’t trust the new ones.)
Serving: ANN (Approximate Nearest Neighbor)
FAISS (vector quantization) is
much faster than NMSLIB with GPU.
ScaNN (vector quantization) is now
best in both performance & speed.
Serving: Product Quantization
(Note that before product quantization there is a “coarse quantization” step, which uses K-means and selects clusters.)
1. Say you have 50K jobs, each represented by a 1024-dim embedding.
2. Break the 1024-dim embedding vector down into 8 chunks of 128 dims each.
3. Quantize each chunk to one of 256 (8-bit) centroids; each chunk is then represented by its centroid.
1. To compute all 50K distance(query, item) values (NumPy sketch after this slide):
○ Prepare, per chunk, a look-up table of the 256 distance(query_chunk, centroid) values (8 x 256 entries in total).
○ Then each of the 50K distance(query, item) values ≈ SUM of the 8 table entries selected by the item’s centroid ids.
2. Compute reduced by roughly a thousand times:
○ from: a full 1024-dim Euclidean distance per item
○ to: a SUM of 8 look-up-table values
3. Memory reduced ~500 times:
○ 4096 bytes of floats (1024 x 4) => 8 bytes of centroid ids
Serving: Product Quantization
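(A compact NumPy sketch of product quantization with the look-up-table trick above; sizes mirror the 50K x 1024-dim / 8-chunk / 256-centroid example, but the “codebooks” are just sampled vectors standing in for real k-means centroids.)

import numpy as np

rng = np.random.default_rng(0)
N, D, M, K = 50_000, 1024, 8, 256       # items, dims, chunks, centroids per chunk
d_sub = D // M                          # 128 dims per chunk

items = rng.normal(size=(N, D)).astype(np.float32)

# "Codebooks": K vectors per chunk, sampled from the data as a crude stand-in
# for the centroids that real PQ would learn with k-means.
codebooks = np.stack([items[rng.choice(N, K, replace=False), m*d_sub:(m+1)*d_sub]
                      for m in range(M)])                        # (M, K, d_sub)

def encode(x):
    """Each vector becomes M one-byte centroid ids (8 bytes instead of 4096)."""
    codes = np.empty((len(x), M), dtype=np.uint8)
    for m in range(M):
        chunk = x[:, m*d_sub:(m+1)*d_sub]                        # (n, d_sub)
        cb = codebooks[m]                                        # (K, d_sub)
        d2 = (chunk**2).sum(1, keepdims=True) - 2*chunk@cb.T + (cb**2).sum(1)
        codes[:, m] = d2.argmin(1)
    return codes

codes = encode(items)

def search(query, topk=10):
    """Asymmetric distance: build one (M, K) look-up table per query, then
    each item's distance is just a sum of M table look-ups."""
    table = np.stack([((query[m*d_sub:(m+1)*d_sub] - codebooks[m])**2).sum(-1)
                      for m in range(M)])                        # (M, K)
    approx = table[np.arange(M), codes].sum(1)                   # (N,)
    return np.argsort(approx)[:topk]

print(search(rng.normal(size=D).astype(np.float32)))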
Serving: Other techniques
● Coarse quantization (inverted file index)
○ Cluster all items into groups; at query time, only visit the top-K groups whose centers are closest to the query.
○ After coarse quantization, apply product quantization to choose the final candidates.
● Residual encoding
○ After vectors are grouped, encode the residual (vector minus its cluster center) instead of the original
embedding vector to improve resolution after quantization (as if removing the offset and centering vectors
at the origin, like the left figure).
○ Note that the query vector has a different residual for each group, since each group has a different center.
TL;DR on 3rd generation
(best practice @ 2020)
TL;DR
(If you can only remember two things today, here they are):
1. In the training stage,
find the best way to compile all knowledge into
user/item embeddings.
2. In the serving stage,
find the fastest/cheapest way to
find nearest neighbors.
3. (Maybe) next generation
Search Ranking System
Alibaba’s latest best practices (both expensive and high engineering effort; just FYI):
● Retrieval: COLD (Cost-aware, Online, Lightweight Deep pre-ranking)
○ Eager for the performance improvement from cross-attention.
○ Feature selection to reduce computing cost (avoid assembling too many
embeddings).
○ In short, choose the ensembled embeddings with the best AUC while maintaining
acceptable QPS (queries per second) and RT (response time).
○ Also takes a lot of engineering effort to speed up and cut cost.
● Ranking: DIEN (Deep Interest Evolution Network)
○ Rather than synthesizing the user embedding from the “latest K clicked items”, use attention to
weight the “latest K relevant clicked items”.
○ Drawback: the user embedding has to be synthesized online.
“Maybe” next generation : Ali COLD+DIEN
Deep interest network (DIN)
● Basic model (left): trained on user_vec and the item_vecs of the latest K clicked items.
● DIN (right): the currently viewed item decides the attention weights over the latest K clicked items (rough sketch below).
● Note that everything (Goods/Shop/Category/Other) is an embedding, rather than one-hot encoded.
○ Everything is factorized, not just black-or-white (0/1).
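(A rough PyTorch sketch of DIN-style attention pooling; toy dimensions and a simple dot-product scorer with softmax are used here, whereas the actual DIN paper uses a small MLP activation unit and does not normalize the weights.)

import torch
import torch.nn.functional as F

def din_user_embedding(candidate, clicked_items):
    """candidate: (d,) embedding of the item currently being scored.
    clicked_items: (K, d) embeddings of the user's latest K clicked items.
    The attention weights depend on the candidate, so the user vector must be
    synthesized per candidate online (the drawback noted on the previous slide)."""
    scores = clicked_items @ candidate                 # (K,) relevance of each click
    weights = F.softmax(scores, dim=0)                 # attention over the history
    return (weights.unsqueeze(-1) * clicked_items).sum(dim=0)   # (d,) user vector

user_vec = din_user_embedding(torch.randn(64), torch.randn(10, 64))
print(user_vec.shape)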
An alternative future: user-interest-capsule
● The latest (since 2019) user-vector technique, MIND (Multi-Interest Network with Dynamic routing) at Alibaba:
○ DIN attends to items; why not also attend to multiple user interests? It is naive to assume a user
is a single user-interest vector.
○ We could indeed skip this, since a job seeker rarely has multiple career interests.
Appendix
● DeepFM vs. unified embedding
● Two tower or Siamese
● Character level n-gram
● Triplet loss
● Random negative + Hard negative mining (100:1)
● Residual encoding
● Embeddings weighted concatenation
● Multitask
Review tricks worth trying
Impact size : Facebook, Tencent
1. Facebook EBR
a. Location features and social embeddings help a lot! (Don’t forget domain-specific data!)
2. Tencent ranking model (CTR)
a. Naive DNN: AUC=0.7618
b. Multi-task (CTR+Favorite+Like…) DNN: AUC=0.7678 (+0.6%)
c. DeepFM: AUC +0.2%
d. Last View + DIN: AUC +0.2%
e. Last Display + GRU (?): AUC + 0.4%
Trade-off between performance (recall) and computing cost: strike a balance between
vector-product-based models and a fully DIN-style model.
Impact size : Alibaba
Impact size : Linkedin
1. Embedding-based Retrieval in Facebook Search
○ "Understanding the Product Quantization Algorithm" (Chinese blog post)
○ "Negative Samples Are King: On Facebook's Embedding-based Retrieval" (Chinese blog post)
2. Pre-training Tasks for Embedding-based Large-scale Retrieval
○ "Embedding-based Retrieval Also Needs 'Pre-training'" (Chinese blog post)
3. Product Quantizers for k-NN Tutorial
4. COLD: Towards the Next Generation of Pre-Ranking System
○ "Alibaba Targeted Advertising's Latest Breakthrough: COLD, the Next-Generation Pre-Ranking System" (Chinese blog post)
5. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall
○ "Interpreting Alibaba's Deep Learning Practice: CTR Prediction, MLR Models, Interest Networks, etc." (Chinese blog post)
6. "Trends in Recommender System Technology: From Retrieval to Ranking to Re-ranking" (Chinese blog post)
7. "Latest Progress in Search/Recommendation Retrieval and Pre-ranking Relevance Optimization, 2020" (Chinese blog post)
References
