This document summarizes the development of an image processing API and ML system built to process images in under 2 seconds while providing visibility, persistence, scalability, and extensibility. It describes splitting the API and the ML into separate services, reducing processing time from 4 seconds to 1 second through changes such as adopting Kafka, scaling the system, and troubleshooting performance spikes, message broker quirks, and database issues. Lessons learned include distinguishing CPU-bound from IO-bound tasks and the trade-offs of Go versus Python. The goal of under 2 seconds per request was eventually achieved.
10. Any issues there?
1. Request monitoring (?)
2. Hard to reason about GPU usage (varies from 3 to 7 GB)
3. ~5 GB RAM out of the box (how will it scale?)
4. Redis is in-memory storage (temporary)
5. Threading != Parallelism
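The "Threading != Parallelism" point can be shown with a small, hypothetical benchmark: under CPython's GIL, threads do not speed up CPU-bound work, they only interleave it (the task and numbers below are illustrative, not from the talk).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def cpu_task(n: int) -> int:
    # Pure-Python CPU-bound loop; holds the GIL while computing.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 1_000_000

start = time.perf_counter()
serial = [cpu_task(N) for _ in range(4)]
serial_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(cpu_task, [N] * 4))
threaded_time = time.perf_counter() - start

# Threads interleave but do not run Python bytecode in parallel,
# so the threaded version is not meaningfully faster.
print(f"serial:   {serial_time:.2f}s")
print(f"threaded: {threaded_time:.2f}s")
```

For CPU-bound model inference, this is why more threads in one process did not help.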
12. What if we split the API and the ML into separate services?
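A minimal sketch of that split, with an in-process queue standing in for the real broker: the API service only validates and enqueues, while a separate ML worker consumes and predicts (the names `api_submit` and `ml_worker` are illustrative, not from the talk).

```python
import queue
import threading
import uuid

jobs: "queue.Queue" = queue.Queue()   # stand-in for a message broker
results: dict = {}                    # stand-in for persistent storage

def api_submit(image_url: str) -> str:
    """API service: accept the request, enqueue it, return immediately."""
    request_id = str(uuid.uuid4())
    jobs.put({"id": request_id, "image_url": image_url})
    return request_id

def ml_worker() -> None:
    """ML service: consume jobs and run the (fake) model."""
    while True:
        job = jobs.get()
        if job is None:  # shutdown sentinel
            break
        results[job["id"]] = f"label-for:{job['image_url']}"
        jobs.task_done()

worker = threading.Thread(target=ml_worker)
worker.start()
rid = api_submit("https://example.com/cat.jpg")
jobs.join()           # wait until the worker has drained the queue
jobs.put(None)        # stop the worker
worker.join()
print(results[rid])
```

Either side can now be scaled and deployed independently, which is the point of the split.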
Part 2. A new hope
13. Requirements
Lalafo case
1. Single image/request processing time: < 2 seconds
2. Visibility
3. Persistence
4. Scalability
5. Make it extendable for new features
a. Price prediction
b. Similarity search
c. Segmentation
6. SDK friendly (well documented, tested, etc.)
16. Requirements
Lalafo case
1. Single image/request processing time: < 2 seconds
2. Visibility: decouple request and prediction
3. Persistence
4. Scalability
5. Make it extendable for new features
a. Price prediction
b. Similarity search
c. Segmentation
6. SDK friendly (well documented, tested, etc.)
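"Decouple request and prediction" maps to an asynchronous API contract: the submit endpoint returns a request id right away, and a separate status endpoint reports progress. A minimal sketch of that contract (the endpoint and field names are assumptions, and a dict stands in for the persistent store):

```python
import uuid

# In-memory stand-in for a persistent store (requirement 3).
store: dict = {}

def submit(image_url: str) -> dict:
    """POST /predictions: register the request, respond immediately."""
    request_id = str(uuid.uuid4())
    store[request_id] = {"status": "pending", "result": None}
    return {"request_id": request_id}

def complete(request_id: str, prediction: str) -> None:
    """Called by the ML worker once the prediction is ready."""
    store[request_id] = {"status": "done", "result": prediction}

def status(request_id: str) -> dict:
    """GET /predictions/<id>: the client polls for the outcome."""
    return store.get(request_id, {"status": "unknown", "result": None})

rid = submit("https://example.com/cat.jpg")["request_id"]
print(status(rid))   # {'status': 'pending', 'result': None}
complete(rid, "cat")
print(status(rid))   # {'status': 'done', 'result': 'cat'}
```

New features (price prediction, similarity search, segmentation) can reuse the same contract with different workers.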
17. 2 seconds per request
Part 3. The Empire Strikes Back
18. 4 seconds per request
[Architecture diagram: API and listeners, with stage latencies of 1 second and 0.01 sec]
53. Issues to solve
1. Occasional spikes in performance (GC, network latency)
2. Message broker quirks (Kafka rebalancing, offset management, etc.)
3. How to handle DB migrations
4. Something we are not aware of yet
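The Kafka offset point mostly comes down to when the consumer commits: committing only after successful processing gives at-least-once delivery across crashes and rebalances. A broker-free sketch of that rule (the `Consumer` class below is an in-memory stand-in, not the real Kafka client API):

```python
class Consumer:
    """In-memory stand-in for a partition consumer with manual commits."""
    def __init__(self, log):
        self.log = log
        self.committed = 0   # next offset to read after a restart

    def poll(self, offset):
        return self.log[offset] if offset < len(self.log) else None

    def commit(self, offset):
        self.committed = offset

def run(consumer, handle, start):
    offset = start
    while (msg := consumer.poll(offset)) is not None:
        handle(msg)               # may raise; offset is NOT yet committed
        offset += 1
        consumer.commit(offset)   # commit only after successful processing
    return offset

log = ["img-1", "img-2", "img-3"]
consumer = Consumer(log)
processed = []
crashed = [False]

def flaky(msg):
    # Fail once on img-2 to simulate a worker dying mid-batch.
    if msg == "img-2" and not crashed[0]:
        crashed[0] = True
        raise RuntimeError("worker died")
    processed.append(msg)

try:
    run(consumer, flaky, consumer.committed)
except RuntimeError:
    pass
# The restart resumes from the last committed offset: img-2 is
# redelivered, img-1 is not (at-least-once, nothing lost).
run(consumer, flaky, consumer.committed)
print(processed)  # ['img-1', 'img-2', 'img-3']
```

Handlers therefore need to be idempotent, since a crash between processing and commit causes redelivery.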
54. Lessons learnt
1. CPU-bound tasks != IO-bound tasks ¯\_(ツ)_/¯
2. High coupling - low cohesion
3. You need to know how to cook MongoDB
4. Go is not as obvious, nor as library-rich, as Python
5. Simple != Easy
6. Concurrency != Parallelism (obviously)
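Lessons 1 and 6 fit in one sketch: threads do help IO-bound work, because waiting releases the GIL, and that is concurrency (overlapped waiting) rather than CPU parallelism. A hypothetical timing example, with `time.sleep` standing in for a network call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    # Simulated network call: sleeping releases the GIL.
    time.sleep(0.2)

start = time.perf_counter()
for i in range(4):
    io_task(i)
serial_time = time.perf_counter() - start    # ~0.8s: waits add up

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(io_task, range(4)))
threaded_time = time.perf_counter() - start  # ~0.2s: waits overlap

print(f"serial: {serial_time:.2f}s, threaded: {threaded_time:.2f}s")
```

The same threads that are useless for CPU-bound inference (see lesson 1) make IO-bound API calls roughly N times faster, which is why classifying each task correctly mattered.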