As a leading fashion and lifestyle e-commerce company in the Netherlands, Wehkamp is dedicated to providing a better shopping experience for its customers. Using Spark, the data science team is able to develop various machine-learning projects that improve the shopping experience.
One of these applications is a service for retrieving visually similar products, which can then be used to show substitute products, to build visual recommenders, and to improve the overall recommendation system. In this project, Spark is used throughout the entire pipeline: retrieving and processing the image data, training models in a distributed fashion with Tensorflow, extracting image features, and computing similarity. In this talk, we demonstrate how Spark and Databricks enable a small team to unify data and AI workflows, develop a pipeline for visual similarity, and train dedicated neural network models.
2. Retrieving visually similar products for Shopping Recommendations using Spark and Tensorflow
Zhichao Zhong, Wehkamp
#UnifiedDataAnalytics #SparkAISummit
5. About wehkamp
wehkamp: the online department store for families in the Netherlands.
• > 400,000 products
• > 500,000 daily visitors
• € 661 million sales 18/19
• 11 million packages 18/19
• 67 years' history
6. About wehkamp
1952 - first advertisement
1955 - first catalog
1995 - first steps online
2010 - completely online
2018 - mobile first
2019 - a great shop experience
7. Data science at wehkamp
Use data science to improve the online shopping experience for customers:
• Search ranking
• Recommendation system
• Personalization
• Visual similarity
• And many others ...
8. Visual similarity
Visuals are important for shopping, especially for fashion (our largest category). People look at look-alike items when shopping.
Visual similarity: to retrieve similar items based on images.
9. Use cases
Use case: to show substitutes for out-of-stock items in the look.
[figure: an out-of-stock item next to a visually similar substitute]
10. Use cases
Use case: to show similar items together on the products overview page.
11. Use cases
Use case: to recommend similar items for newly onboarded items (the cold-start problem).
12. Steps for visual similarity
How to retrieve visually similar items?
Step 1: Extract image embeddings.
Step 2: Search for similar embeddings.
[figure: images → CNN → embedding vectors → similarity search]
13. Image embedding
Image embedding: a low-dimensional vector representation of an image that contains abstract information.
[figure: a 512⨉512⨉3 image mapped to a 256-dimensional embedding vector]
14. Image embedding
Use a convolutional neural network (CNN) to extract the embeddings.
[figure: CNN with convolutional/pooling/activation layers, a fully-connected layer producing the embedding, and a prediction head]
15. Transfer learning
Use a pre-trained model, or train a model from scratch?
• Adopt the VGG16 model pre-trained on the ImageNet dataset (natural images).
• Replace the fully-connected layers.
• Train the fully-connected layers on our own dataset.
[figure: 224⨉224⨉3 image → convolutional layers from VGG16 → FC layer (4096) → FC layer (512) → 512⨉1 embedding]
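The transfer-learning setup above can be sketched in tf.keras roughly as follows. This is an illustrative sketch, not the talk's actual code: the function name and the frozen-base choice are assumptions, and in practice `weights="imagenet"` would be passed to load the pre-trained convolutional layers.

```python
# Hedged sketch: VGG16 convolutional base with its fully-connected layers
# replaced by a new FC head that produces the 512-d embedding.
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def build_embedding_model(weights=None):
    # Convolutional base from VGG16, without its original FC layers.
    # Use weights="imagenet" to load the pre-trained weights in practice.
    base = VGG16(include_top=False, weights=weights, input_shape=(224, 224, 3))
    base.trainable = False                        # train only the new FC head
    x = layers.Flatten()(base.output)
    x = layers.Dense(4096, activation="relu")(x)  # replaced FC layer (4096)
    embedding = layers.Dense(512)(x)              # replaced FC layer (512) -> embedding
    return Model(base.input, embedding)

model = build_embedding_model()
```

Only the new dense layers are trained on the fashion dataset; the ImageNet-trained convolutional filters are reused as-is.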
16. Triplet loss
Data triplet: anchor image, positive image, and negative image.
Triplet loss:
L = max( ‖f(a) − f(p)‖² − ‖f(a) − f(n)‖² + ɑ, 0 )
Similarity is defined by the Euclidean distance. f(a), f(p), f(n) are the embeddings for the anchor, positive, and negative images respectively; ɑ is the margin.
[figure: anchor, positive and negative images]
FaceNet: A Unified Embedding for Face Recognition and Clustering, F. Schroff et al. (2015)
17. Triplet loss
Minimize the triplet loss: learning pulls the positive image closer to the anchor and pushes the negative image further away, by at least the margin ɑ.
[figure: anchor, positive and negative images before and after learning]
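The triplet loss above can be written out in a few lines of NumPy. This is an illustrative sketch, not the talk's Tensorflow implementation; the margin value is an assumption.

```python
# Batched triplet loss: max(||f(a)-f(p)||^2 - ||f(a)-f(n)||^2 + alpha, 0).
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    d_pos = np.sum((f_a - f_p) ** 2, axis=1)  # squared anchor-positive distance
    d_neg = np.sum((f_a - f_n) ** 2, axis=1)  # squared anchor-negative distance
    return np.mean(np.maximum(d_pos - d_neg + alpha, 0.0))

# A triplet that already satisfies the margin contributes zero loss:
a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])   # close to the anchor
n = np.array([[3.0, 0.0]])   # far from the anchor
print(triplet_loss(a, p, n))  # → 0.0
```

Swapping the positive and negative images makes the loss positive, which is exactly the signal that drives the embeddings apart during training.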
18. Siamese network
Siamese network: identical CNNs take two or more inputs.
[figure: three identical CNNs feeding the triplet loss]
19. Data preparation
Similar product images are put in the same group.
Sample triplets:
• sample 2 images from the same group as the anchor and positive images.
• sample 1 image from other groups as the negative image.
3,500 images => 56,000 triplets
[figure: anchor, positive and negative images]
FaceNet: A Unified Embedding for Face Recognition and Clustering, F. Schroff et al. (2015)
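The sampling scheme above can be sketched in plain Python. The group names and image ids below are illustrative placeholders, not from the talk.

```python
# Hedged sketch: sample one triplet from images grouped by visual similarity.
import random

def sample_triplet(groups):
    """groups: dict mapping group id -> list of image ids."""
    usable = [g for g, imgs in groups.items() if len(imgs) >= 2]
    g_pos = random.choice(usable)
    anchor, positive = random.sample(groups[g_pos], 2)       # same group
    g_neg = random.choice([g for g in groups if g != g_pos]) # any other group
    negative = random.choice(groups[g_neg])
    return anchor, positive, negative

groups = {
    "dress_a": ["img1", "img2", "img3"],
    "dress_b": ["img4", "img5"],
    "shoe_a": ["img6"],
}
a, p, n = sample_triplet(groups)
```

Repeating this over all group/negative combinations is how a few thousand grouped images expand into tens of thousands of training triplets.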
21. Model training
• 50 epochs, 29 hours on an NVIDIA K80 GPU.
• How can we scale up the model training to
– fit more data,
– fine-tune the hyperparameters quickly?
Use distributed training to speed up the training!
22. Distributed training
• Distributed training framework: Horovod by Uber.
– Good scaling efficiency.
– Minimal code modification.
• Training API: HorovodRunner on Databricks, integrated with Spark's barrier mode.
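The HorovodRunner pattern looks roughly as follows. This is a hedged sketch, not the talk's code: the training function is self-contained (imports inside) because HorovodRunner pickles it and ships it to the workers, and `build_model` and the dataset are illustrative placeholders.

```python
# Hedged sketch of distributed Keras training with Horovod + HorovodRunner.

def train_hvd(learning_rate=0.001):
    import horovod.tensorflow.keras as hvd
    import tensorflow as tf

    hvd.init()  # one process per GPU
    # Scale the learning rate by the number of workers, as Horovod recommends.
    opt = tf.keras.optimizers.Adam(learning_rate * hvd.size())
    opt = hvd.DistributedOptimizer(opt)  # averages gradients across workers
    # model = build_model()  # placeholder: the embedding CNN from earlier
    # model.compile(optimizer=opt, loss=triplet_loss)
    # model.fit(dataset, epochs=50,
    #           callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)])

# On Databricks (not runnable locally):
# from sparkdl import HorovodRunner
# hr = HorovodRunner(np=4)   # np = number of worker processes/GPUs
# hr.run(train_hvd, learning_rate=0.001)
```

The "minimal code modification" claim is visible here: the single-machine training loop survives almost unchanged, with only the optimizer wrapping and the broadcast callback added.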
26. Steps for visual similarity
How to retrieve visually similar items?
Step 1: Extract image embeddings.
• Train a model on our own dataset.
• From single-machine to distributed training.
Step 2: Search for similar embeddings.
[figure: images → CNN → embedding vectors → similarity search]
27. Similar items retrieval
• Brute-force search can be expensive and slow for a large amount of high-dimensional data.
• We use the approximate similarity search implemented in Spark:
• Hash step: hash similar embeddings into the same buckets using locality sensitive hashing (LSH).
• Search step: only search embeddings in the same buckets, using the Euclidean distance.
Hashing for Similarity Search: A Survey, J. Wang et al. (2014)
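The two-step search can be sketched in NumPy. This is illustrative only; Spark's actual implementation is `BucketedRandomProjectionLSH` in `pyspark.ml.feature`, and the database size, dimensionality, and bucket length below are assumptions.

```python
# Hedged sketch: hash step (project onto a random unit vector, quantize into
# buckets), then search step (brute-force only within the query's bucket).
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))   # toy embedding database
v = rng.normal(size=64)
v /= np.linalg.norm(v)                     # random unit vector
r = 2.0                                    # bucket length

def bucket(x):
    return int(np.floor(x @ v / r))

buckets = {}
for i, e in enumerate(embeddings):
    buckets.setdefault(bucket(e), []).append(i)

def approx_nearest(query):
    # Compare only against embeddings that hashed into the same bucket.
    candidates = buckets.get(bucket(query), [])
    if not candidates:
        return None
    dists = np.linalg.norm(embeddings[candidates] - query, axis=1)
    return candidates[int(np.argmin(dists))]

idx = approx_nearest(embeddings[42])  # an item is its own nearest neighbor
```

The search step touches only a bucket's worth of vectors instead of the full database, which is where the speed-up over brute force comes from.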
28. Locality sensitive hashing
LSH hashes high-dimensional vectors with a small distance into the same buckets with a high probability.
The hash function for the Euclidean distance is:
h(x) = ⌊(x · v) / r⌋, where v is a random unit vector and r is the bucket length.
Example:
v = [0.44, 0.90], r = 2
x1 = [2.0, 2.0], h(x1) = 1
x2 = [2.0, 3.0], h(x2) = 1
x3 = [0.0, 5.0], h(x3) = 2
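The hash function can be checked directly against the slide's example values:

```python
# h(x) = floor((x · v) / r): projection onto v, quantized into buckets of
# length r. The vectors and bucket length are the slide's example values.
import numpy as np

def h(x, v, r):
    return int(np.floor(np.dot(x, v) / r))

v = np.array([0.44, 0.90])
r = 2.0
print(h([2.0, 2.0], v, r))  # → 1
print(h([2.0, 3.0], v, r))  # → 1  (close to x1, lands in the same bucket)
print(h([0.0, 5.0], v, r))  # → 2  (further away, lands in a different bucket)
```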
29. Parameters in LSH
Two parameters:
• bucketLength: the length of each hash bucket.
• numHashTables: the number of hash tables.
Both trade accuracy against query performance: a larger bucketLength or more hash tables increase accuracy at the cost of query performance.
[figure: hash tables h1 h2 ... hn-1 hn]
34. Summary
1. Visual similarity applications at Wehkamp.
2. Embedding CNN model trained on our own dataset:
– Improved accuracy.
– Reduced embedding size.
– Distributed training enabled by Horovod and Databricks.
3. Approximate similarity search with Spark LSH.
4. Future work:
– Higher accuracy enabled by a larger dataset.
– Binary embeddings to speed up the search process.
– Image embeddings as part of product features.
35. DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT