Embedding Based Frequently Bought
Together Recommendations:
A Production Use Case
Agenda
Mehmet Selman Sezgin
Senior Data Engineer at Hepsiburada
Ulukbek Attokurov
Data Scientist at Hepsiburada
Content
● Embedding Based Recommendations
● Modeling (Frequently Bought Together)
● Arithmetic operations on Embeddings
● Architecture Overview
● Serving Layer
● Experimental UI
● Online Metrics
● Conclusion
▪ 40+ categories
▪ 200M+ visitors per month
▪ 30M+ products
https://developers.google.com/machine-learning/crash-course/embeddings
An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors.
Co-occurrence vs Embeddings

Co-occurrence statistics
▪ Uses raw co-occurrence statistics (e.g. Salton89, a TF-IDF based metric)
▪ Uses behavior data (product views, orders, add-to-cart)
▪ Generates item-based recommendations
▪ Cannot project users and items into the same space

Embeddings
▪ Uses advanced methods (ResNet, Inception, VGG, Word2Vec, Doc2Vec, AutoEncoders, BERT etc.)
▪ Generates item-based and user-based recommendations
▪ Uses content information such as product image, product description, product name, product attributes
▪ Image, text, and behavior embeddings can be projected into the same space
Co-occurrence vs Embeddings

Co-occurrence statistics
▪ Products are not recommended if they do not appear in the same context
▪ Context information, such as products appearing in the same session or transaction, is not employed
▪ Content information (image, text etc.) is not used

Embeddings
▪ Similarity metrics can be calculated
▪ Can be used as features in unsupervised and supervised methods to optimize a business metric such as a propensity score
▪ Can be used as features in neural networks such as LSTMs to model customer behavior over time
▪ Can be used as features in KNN to recommend the most similar items
Frequently Bought Together
▪ Goal: building recommendations that offer complementary products to our customers
▪ Challenges:
▪ Orders might contain products from diverse categories
▪ Generating recommendations over 30M+ products distributed across 40+ categories
▪ Tip: "bought together" does not mean that the items which co-occur in the sequence are similar
▪ Our model choice: Word2Vec
Word2Vec
▪ Easy to use
▪ Easy to train
▪ Simple format of training samples
▪ User-friendly libraries like Gensim
▪ A few parameters to optimize
▪ A lot of practical use cases
Data Preparation

NLP
▪ Sentence → Bag-of-Words
▪ “I am attending a conference” → [“I”, “attending”, “conference”]

Frequently Bought Together
▪ User behavior (views, purchases etc.) → set of purchased items
▪ Order: Keyboard, Computer, Mouse → [“Keyboard”, “Computer”, “Mouse”]
Data Preparation - Context Separation
▪ Sequences may contain products from diverse categories
▪ Sequence: [“Keyboard”, “Mouse”, “Shoes”, “Socks”]
▪ Sub-sequences may be created depending on labels such as category, brand etc.
▪ Sub-sequences: [“Keyboard”, “Mouse”] and [“Shoes”, “Socks”]
Code Sample for Data Preparation
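A minimal sketch of the preparation step described above, assuming each order arrives as (category, product_id) pairs; the function and variable names are illustrative, not the production code:

```python
from collections import defaultdict

def to_subsequences(order_items):
    """Split one order into per-category sub-sequences.

    order_items: list of (category, product_id) tuples.
    """
    groups = defaultdict(list)
    for category, product_id in order_items:
        groups[category].append(product_id)
    # keep only sub-sequences with at least two items; a single item
    # carries no co-occurrence signal for Word2Vec
    return [items for items in groups.values() if len(items) > 1]

order = [("electronics", "keyboard"), ("electronics", "mouse"),
         ("apparel", "shoes"), ("apparel", "socks")]
print(to_subsequences(order))
# [['keyboard', 'mouse'], ['shoes', 'socks']]
```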
Word2Vec Parameters
▪ Random Search is applied to restrict the parameter search space
▪ Grid Search is then applied to select the optimal parameters
▪ The following Word2Vec parameters are optimized (a training sketch follows below):
▪ min_count: preferably set low, otherwise coverage decreases
▪ sample: the most frequent items dominate sequences, which can yield noisy embeddings and is computationally inefficient
▪ window: the context length is set to the maximum sequence length, since the order of items within a sequence is random
▪ size: a trade-off between network size, storage and computational cost; set as small as possible without losing recommendation quality
▪ iter: the default value is very low, so it is set between 50 and 80; the model is not trained well when iter is set to low values
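As an illustration, a hedged Gensim training call wiring these parameters together (gensim 3.x names as on the slide; gensim 4+ renames size to vector_size and iter to epochs). The concrete values are placeholders, not the tuned production settings:

```python
from gensim.models import Word2Vec

# sequences: per-category product-id sub-sequences from data preparation
sequences = [["keyboard", "mouse", "usb_hub"], ["shoes", "socks"]]

model = Word2Vec(
    sentences=sequences,
    min_count=1,   # keep low so rare products stay in the vocabulary (coverage)
    sample=1e-4,   # downsample very frequent items to reduce their dominance
    window=10,     # >= max sequence length, since item order is random
    size=64,       # embedding dimension: as small as quality allows
    iter=60,       # well above the default of 5; the slide suggests 50-80
    sg=1,          # skip-gram; an assumption, the talk does not state the architecture
)

print(model.wv.most_similar("keyboard", topn=2))
```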
Similarity Functions
▪ The KNN algorithm is employed to find the most similar items
▪ Different similarity metrics are used: Euclidean distance and cosine similarity
▪ Euclidean distance measures the distance between two points and is affected by the length of the vectors. Vectors therefore need to be normalized in order to obtain more accurate results.
▪ In cosine similarity, the angle between two vectors determines their similarity.
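A small NumPy sketch of the two metrics; note that after L2 normalization, Euclidean distance and cosine similarity rank neighbours identically:

```python
import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])    # same direction, twice the length

print(euclidean(a, b))           # > 0: the length difference matters
print(cosine_similarity(a, b))   # 1.0: the angle is zero

# normalizing removes the length effect from Euclidean distance
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
print(euclidean(a_n, b_n))       # ~0.0
```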
Offline Metrics
▪ We need simple statistical metrics to check the performance of the model and to tune parameters
▪ Precision@k
▪ (# of recommended items @k that are relevant) / (# of recommended items @k)
▪ Recall@k
▪ (# of recommended items @k that are relevant) / (total # of relevant items)
▪ HitRate@k
▪ (# of hits @k recommendations) / (total # of test users)
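A minimal sketch of these three formulas; the names and toy data are illustrative:

```python
def precision_at_k(recommended, relevant, k):
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / len(top_k)

def recall_at_k(recommended, relevant, k):
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / len(relevant)

def hit_rate_at_k(users, k):
    """users: list of (recommended, relevant) pairs, one per test user."""
    hits = sum(1 for rec, rel in users if set(rec[:k]) & set(rel))
    return hits / len(users)

print(precision_at_k(["a", "b", "c"], ["b", "d"], k=3))  # 1/3
print(recall_at_k(["a", "b", "c"], ["b", "d"], k=3))     # 1/2
```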
MLflow Tracks the Model
▪ It is easy to visually inspect the parameters
▪ Evaluation metrics can be investigated graphically
▪ It is easy to integrate into the source code
▪ It is effective for team collaboration through the central server
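A hedged sketch of what this tracking can look like with the MLflow Python API; the tracking URI, experiment name and metric values are placeholders:

```python
import mlflow

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # placeholder central server
mlflow.set_experiment("fbt-word2vec")                   # hypothetical experiment name

params = {"min_count": 1, "sample": 1e-4, "window": 10, "size": 64, "iter": 60}

with mlflow.start_run():
    mlflow.log_params(params)
    # ... train the Word2Vec model and evaluate it on a held-out set ...
    mlflow.log_metric("precision_at_10", 0.18)  # illustrative values
    mlflow.log_metric("recall_at_10", 0.32)
    mlflow.log_metric("hit_rate_at_10", 0.41)
```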
Word2Vec Hyperparameter Tuning
Arithmetic Operations on Embeddings
▪ Is it possible to create new business dimensions using simple arithmetic on existing product embeddings?
▪ Similarity( AVG(Adidas_Shoes) , AVG(Nike_Shoes)) ≃ 1 ?
▪ Similarity( AVG(Camping tents) , AVG(Outdoor chairs)) ≃ 1 ?
▪ 1_Adidas_Shoe - Adidas_Brand + Nike_Brand ≃ 1_Similar_Nike_Shoe ?
▪ Relevancy decreases when entities at higher levels of the hierarchy, such as categories (Sport, Baby, Women's Clothes etc.), are represented using low-level entities such as products.
Arithmetic Operations on Embeddings
▪ Brand similarity is relevant if a brand contains homogeneous products in terms of categories (upper body clothes, lower body clothes etc.).
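A sketch of the checks above, assuming model is a trained Gensim Word2Vec model (as in the earlier sketch); the product-id lists are hypothetical, and brand vectors are simple averages of their products' embeddings:

```python
import numpy as np

def avg_embedding(vectors):
    return np.mean(vectors, axis=0)

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# hypothetical product-id lists grouped by brand
adidas_shoes = [model.wv[p] for p in adidas_shoe_ids]
nike_shoes = [model.wv[p] for p in nike_shoe_ids]

adidas_brand = avg_embedding(adidas_shoes)
nike_brand = avg_embedding(nike_shoes)
print(cosine(adidas_brand, nike_brand))  # close to 1 if the brands are homogeneous

# analogy-style query: a specific Adidas shoe, "moved" to the Nike brand
query = model.wv[one_adidas_shoe_id] - adidas_brand + nike_brand
print(model.wv.similar_by_vector(query, topn=5))
```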
Architecture Overview
Implementation Tips
▪ PySpark
▪ Enables working with any Python modelling library through Spark-to-pandas DataFrame conversion (a sketch follows below)
▪ Pandas UDFs are very useful for parallelization
▪ Conversion from a Spark DataFrame to a pandas DataFrame is still costly in terms of memory, in spite of using Arrow
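For example, a grouped applyInPandas sketch (PySpark 3.x) that builds the per-order, per-category sequences in parallel; the column names and toy rows are illustrative:

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# illustrative order lines: one row per purchased product
df = spark.createDataFrame(
    [("o1", "electronics", "keyboard"), ("o1", "electronics", "mouse"),
     ("o1", "apparel", "shoes"), ("o1", "apparel", "socks")],
    ["order_id", "category", "product_id"],
)

def to_sequence(pdf: pd.DataFrame) -> pd.DataFrame:
    # runs on a pandas DataFrame per (order_id, category) group
    return pd.DataFrame({"sequence": [list(pdf["product_id"])]})

sequences = (
    df.groupBy("order_id", "category")
      .applyInPandas(to_sequence, schema="sequence array<string>")
)
sequences.show(truncate=False)
```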
Implementation Tips
▪ Model Quality
▪ Offline metrics, the experimental UI and online metrics should all be used for quality analysis
▪ Process
▪ Notebooks are useful in the experimental stage, but it is preferable not to use them in production
▪ The transition from the experimental stage to production should have minimum cost
▪ Metric validation should be part of the flow, not a background analysis in the production phase
Model Serving Layer
▪ Approximate Nearest Neighbour
Search Algorithms
▪ Annoy, Faiss, Hnswlib, ScaNN and many others
▪ Choose the library considering
▪ Open source benchmarks
▪ Programming language
▪ Similarity functions
▪ Distributed Index
▪ Incremental item insertion / deletion
▪ Ability to customize
▪ Our choice
▪ Hnswlib + Custom Post-Processing Layer
http://ann-benchmarks.com/
Model Serving Layer - HNSWLIB
▪ Trade-off between hierarchical navigable small world graph construction and search parameters (see the sketch after this list)
▪ Simple graph, weak search: less indexing time, less memory, less CPU usage, low recall
▪ Simple graph, strong search: less indexing time, less memory, more CPU usage, acceptable recall
▪ Complex graph, weak search: more indexing time, more memory, less CPU usage, high recall
▪ Complex graph, strong search: more indexing time, more memory, high CPU usage (waste), high recall
▪ Consider the following metrics to select optimal parameters
▪ Index size / memory consumption
▪ Build time
▪ CPU usage
▪ Queries per second
▪ Recall
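A minimal hnswlib sketch showing the knobs behind this trade-off: M and ef_construction shape the graph (build time, memory, recall ceiling), ef controls search effort (CPU per query vs recall). The values are placeholders to tune against the metrics above:

```python
import hnswlib
import numpy as np

dim, num_items = 64, 100_000
vectors = np.random.rand(num_items, dim).astype(np.float32)  # stand-in embeddings
ids = np.arange(num_items)

index = hnswlib.Index(space="cosine", dim=dim)
# M and ef_construction: graph complexity
index.init_index(max_elements=num_items, M=16, ef_construction=200)
index.add_items(vectors, ids)

# ef: search effort; must be >= k
index.set_ef(50)
labels, distances = index.knn_query(vectors[:1], k=10)
print(labels, distances)
```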
Model Serving Layer - Post Processing
▪ Similarity search alone will not be enough
▪ You will need to do some post-processing after retrieving results
▪ Implement your custom solution
▪ Do post-processing in the consuming service
▪ Use metadata and a solution that supports post-processing
▪ e.g. opendistro-for-elasticsearch, which supports an hnswlib index and brings post-processing functions
▪ Every solution has its own pros and cons. We implemented our own custom solution, which enriches the index with metadata and lets you inject any filtering or ranking methods that you need.
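A sketch of the metadata-enhanced post-processing idea; the metadata layout and filter rules here are illustrative, not our production code:

```python
def post_process(candidate_ids, metadata, source_item, k=10):
    """Filter raw ANN results using item metadata.

    candidate_ids: ids returned by the ANN index, nearest first.
    metadata: dict id -> {"category": ..., "brand": ..., "price": ...}
    """
    src = metadata[source_item]
    results = []
    for item_id in candidate_ids:
        meta = metadata[item_id]
        if meta["category"] == src["category"]:
            continue  # FBT should complement, not substitute (illustrative rule)
        if meta["price"] > 3 * src["price"]:
            continue  # illustrative price-aware filter
        results.append(item_id)
        if len(results) == k:
            break
    return results
```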
Post Filtering Validation Methods
Experimental UI
▪ Reveal what you need
▪ Variant level exclusions
▪ Category level restrictions and exclusions
▪ Brand level restrictions and exclusions
▪ Price aware filters
▪ Gender filters
▪ Top-N category diverse ranking
▪ Etc.
▪ Implement in serving layer
▪ Experiment again
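As one concrete example of the filters above, a sketch of top-N category diverse ranking, interpreted here as a round-robin over categories; names are illustrative:

```python
from collections import defaultdict

def diverse_top_n(ranked_ids, category_of, n=10):
    """Re-rank so the top-N covers as many categories as possible.

    ranked_ids: candidate ids ordered by similarity.
    category_of: dict id -> category.
    """
    by_category = defaultdict(list)
    for item_id in ranked_ids:
        by_category[category_of[item_id]].append(item_id)

    results = []
    # round-robin: take the best remaining item from each category in turn
    while len(results) < n and any(by_category.values()):
        for items in list(by_category.values()):
            if items and len(results) < n:
                results.append(items.pop(0))
    return results
```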
Model Serving Layer - Performance
▪ Single instance
▪ 8K requests per second
▪ Under 1 ms (~400 µs)
▪ Using assembly code instead of the default distance function implementations may improve indexing and query performance considerably (vectorization)
Model Serving Layer - Results on Production
Two FBT examples in production (shown after the add-to-cart action)
Online Metrics

Key Metrics
▪ CTR
▪ CR
▪ Coverage
▪ Diversity
▪ Revenue
▪ Usage Ratio
▪ Order Ratio

Dimensions
▪ Placement Title
▪ Placement Location
▪ Position in Placement
▪ Category Levels
▪ Channel
▪ Time of Week/Day
▪ Gender
Online Metrics
▪ Calculate your overall impact
▪ Do detailed analyses to increase domain knowledge, which leads to improvements in your recommendations
▪ If you rely only on CTR and CR, you may lose the big picture
▪ Popular products and their relatively higher CTRs may trap you in a vicious circle within a narrow space
▪ You should interpret the CR metric differently for different categories
Takeaways
▪ Use embedding representations in the recommendation domain as much as possible
▪ Word2Vec is easy to use and train (without GPUs), but tune its parameters wisely and assess offline metrics taking your business requirements into account
▪ Be careful when applying arithmetic operations on embeddings
▪ Follow small cycles during the experimental and production stages
▪ Design the serving layer considering your scale
▪ Use an experimental UI and apply post-filtering for more relevant results
▪ Track online metrics to understand the real impact of your solution