Infrastructure for Image Models at OLX
Alexey Grigorev
21.05.2019
About me
● Originally from Russia, live in Berlin
● Background: Databases
● Started as Software Engineer
● Master's at TU Berlin
● Data Science since 2013
● Book: Mastering Java for Data Science
● Kaggler in the past
● Now: Data Scientist at OLX
“ML Engineer”
OLX Group
OLX
● Tech hubs:
○ Poznan, Lisbon, Berlin
○ Buenos Aires, Delhi
● 30+ countries
● 200M MAU
● 10M sellers & 30M buyers
● 30M new listings per month
● Live listings:
○ OLX.PL (18M listings)
○ OLX.UA (12.5M listings)
○ OLX.IN (3M listings)
Apollo at OLX
Image uploads per day
● eu-west-1: up to 6.5M
○ PL: 3M
○ UA: 1M
● ap-southeast-1: 2.2M
○ IN: 0.7M
● eu-central-1: 1.6M
● us-west-1: 55k
● Total: ~10M per day
Apollo X - the metadata service
● Data about images:
○ Category: what is on the image
○ Image quality: is the image good
● How to extract metadata?
● With machine learning!
Training
[Diagram components: s3, SageMaker, Model]
Serving
[Diagram components: user, Apollo bucket (s3), ObjectCreated:Put event, event listener, Apollo X, Category and Quality models, tf-serving, mms, redis, mysql, Batch]
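The serving flow above starts with an S3 `ObjectCreated:Put` notification and ends at a model server such as tf-serving. A minimal Python sketch of that hand-off, under stated assumptions: the bucket name, object key, model name `category`, host `tf-serving:8501`, and the idea that the model accepts encoded image bytes are all hypothetical and not taken from the slides. It only parses the event and builds a TensorFlow Serving REST predict request; it does not send it.

```python
import base64
import json
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Yield (bucket, key) for every ObjectCreated record in an S3 event."""
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletes and other event types
        s3 = record["s3"]
        # S3 URL-encodes object keys in notifications (spaces arrive as '+').
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def predict_request(image_bytes, model="category", host="tf-serving:8501"):
    """Build the URL and JSON body for a TF Serving REST predict call.

    Assumption: the served model takes encoded image bytes as input.
    """
    url = f"http://{host}/v1/models/{model}:predict"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = {"instances": [{"b64": encoded}]}
    return url, json.dumps(body)


# Hypothetical event, trimmed to the fields used here.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "apollo-bucket"},
                "object": {"key": "ads/photo+1.jpg"},
            },
        }
    ]
}

objects = list(uploaded_objects(sample_event))
url, body = predict_request(b"fake-jpeg-bytes")
```

In a real listener the image bytes would be fetched from the bucket before the request is built; that step is left out to keep the sketch free of AWS dependencies.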
AWS + Kubernetes
SageMaker vs Self-hosted
SageMaker
● SageMaker is great for model training and for testing in prod
● But it’s expensive for serving millions of requests
● Hard to monitor
● Hard to make Ops happy
Self-hosted (Kubernetes)
● Cheaper
● Fits existing infra
● Metrics, Logs, Alerts
● SREs know how to deal with issues
Image quality
Cover image
Forbidden items
Duplicates
● Duplicates are a problem
● 30% of images are duplicates
● 10% of ads are duplicates
Image hashes
● MD5 - cryptographic hash
● Perceptual hashes:
○ DHash, PHash, WHash (imagehash library)
● Take an image
● Change a pixel, resize, or re-compress it
● Result: different MD5, same perceptual hash
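To make the MD5 versus perceptual hash contrast concrete, here is a dependency-free Python sketch of the difference-hash (dHash) idea. It is not the imagehash library's code: images are synthetic grayscale grids held as nested lists, and `shrink` uses plain box averaging as a stand-in for a real image resize. Only the pixel change and the resize are shown; re-compression would need an actual image codec.

```python
import hashlib


def md5_hex(pixels):
    """MD5 over the raw grayscale bytes (values must be 0..255)."""
    data = bytes(v for row in pixels for v in row)
    return hashlib.md5(data).hexdigest()


def shrink(pixels, out_w, out_h):
    """Downscale by box averaging; the input must be at least out_w x out_h."""
    in_h, in_w = len(pixels), len(pixels[0])
    out = []
    for j in range(out_h):
        y0, y1 = j * in_h // out_h, (j + 1) * in_h // out_h
        row = []
        for i in range(out_w):
            x0, x1 = i * in_w // out_w, (i + 1) * in_w // out_w
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels, size=8):
    """Difference hash: one bit per horizontally adjacent pair of cells."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left < right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 18x16 image: vertical stripes with a slight vertical gradient.
original = [[100 + 50 * ((x // 2) % 2) + y for x in range(18)] for y in range(16)]

# "Change pixel": bump a single value by one.
tweaked = [row[:] for row in original]
tweaked[3][5] += 1

# "Resize": double the image in both directions by repeating pixels.
upscaled = [[v for v in row for _ in range(2)] for row in original for _ in range(2)]

md5_changed = md5_hex(original) != md5_hex(tweaked)
same_after_tweak = hamming(dhash(original), dhash(tweaked)) == 0
same_after_resize = hamming(dhash(original), dhash(upscaled)) == 0
```

On real photos the perceptual hashes of two near-duplicates may differ in a few bits, so comparing by Hamming distance with a small threshold is a common choice rather than strict equality.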
Image Index
[Diagram components: s3, ObjectCreate and ObjectDelete events, Ingestor, Hash calculation, ES (hashes), Image index]
● Lambdas are super easy to scale
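The ingestion idea (object-created events add a hash to the index, object-deleted events remove it) can be sketched in Python as follows. This is an illustration under assumptions, not the production design: an in-memory dictionary stands in for the ES index, `ImageIndex` and `ingest` are hypothetical names, hashes are assumed to be precomputed, and only exact hash matches are treated as duplicates.

```python
from collections import defaultdict


class ImageIndex:
    """In-memory stand-in for the hash index (hypothetical, not ES)."""

    def __init__(self):
        self._keys_by_hash = defaultdict(set)
        self._hash_by_key = {}

    def add(self, key, image_hash):
        # A re-upload under the same key replaces the previous entry.
        self.remove(key)
        self._hash_by_key[key] = image_hash
        self._keys_by_hash[image_hash].add(key)

    def remove(self, key):
        image_hash = self._hash_by_key.pop(key, None)
        if image_hash is None:
            return
        keys = self._keys_by_hash[image_hash]
        keys.discard(key)
        if not keys:
            del self._keys_by_hash[image_hash]

    def duplicates_of(self, key):
        """Other keys that share this key's hash (exact match only)."""
        image_hash = self._hash_by_key.get(key)
        if image_hash is None:
            return set()
        return self._keys_by_hash[image_hash] - {key}


def ingest(index, event_name, key, image_hash=None):
    """Route one S3 event to the index, as an ingestor function might."""
    if event_name.startswith("ObjectCreated:"):
        index.add(key, image_hash)
    elif event_name.startswith("ObjectRemoved:"):
        index.remove(key)


index = ImageIndex()
ingest(index, "ObjectCreated:Put", "a.jpg", 0xAA)
ingest(index, "ObjectCreated:Put", "b.jpg", 0xAA)
ingest(index, "ObjectCreated:Put", "c.jpg", 0x0F)
dupes_before = index.duplicates_of("a.jpg")

ingest(index, "ObjectRemoved:Delete", "b.jpg")
dupes_after = index.duplicates_of("a.jpg")
```

Because each event is handled independently and the handler keeps no state of its own, this shape fits a function-per-event deployment, which is one reason such ingestors scale easily.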
Images per hour
114 prs
Contact information
● http://alexeygrigorev.com & contact@alexeygrigorev.com
● https://github.com/alexeygrigorev
● https://www.linkedin.com/in/agrigorev
OLX is hiring
● https://www.olxgroup.com/search/engineering/all-locations/all-brands
