Clarifai uses NATS as its messaging system between microservices. Some key reasons for choosing NATS include its lightweight footprint, ease of use on Kubernetes, message persistence, queueing capabilities, and active community. NATS has been running in production at Clarifai for over 5 months, handling over 100k messages per day across multiple services. While manual acknowledgements caused some issues initially, NATS has provided 100% uptime and met Clarifai's requirements for microservice communication.
4. Clarifai is a market leader in visual recognition technology
Proven, award-winning technology
Leading computer vision expert
Top VC investors
Matt Zeiler, CEO & Founder
Machine Learning PhD
12. Oreo cookie probability over successive examples: 0.500, 0.588, 0.645, 0.766, 0.897, 0.912, 0.941, 0.956, 0.995
With a few more examples, the AI gets more accurate.
13. Microservices at Clarifai
• Transition from v1 to v2 microservice architecture
• Benefits:
Decouple functionality
Scale individual services
Parallel development and testing
• Cost: IPC/intranode communication is replaced by network calls, which adds network latency and serialization overhead
14. Kubernetes at Clarifai
• v1 architecture used a collection of Ansible scripts with AWS AMIs for deployment
• Transition from virtual machine AMIs to Docker images
• Facilitated by our move to microservices
Many advantages to Kubernetes:
1. Speed up operations – CI/CD
2. Improved automation
3. Simplify management tools (Helm)
4. Improve security
5. Increase productivity
15. Microservices and Messaging
• Communication among microservices required a messaging middleware
• Already using gRPC to define services
• Problems: not easily extensible and requires client-side compilation of protobufs
Requirements:
1. Lightweight
2. Easy to configure and deploy on Kubernetes
3. Message persistence
4. Message queueing
5. Active community
16. Deciding on the Messaging System
• Considered many options: Amazon SQS, Kafka, RabbitMQ, NATS
• NATS:
1. Lightweight – Docker image size only a few MB, memory footprint in production less than 50 MB
2. Great documentation and easy to configure and deploy to Kubernetes cluster
3. Message persistence with file store (Kubernetes volumes)
4. Message queueing with exact source ordering (and at-least-once delivery with NATS Streaming)
5. Active Slack community
6. High messaging throughput and minimal latency
7. Written in Go
8. Message replay
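The persistence, ordering, at-least-once delivery, and replay properties listed above can be illustrated with a small in-memory model. This is only a sketch: `ToyChannel` and its methods are hypothetical names invented for illustration, not the NATS or NATS Streaming client API, and the job names are made up.

```python
class ToyChannel:
    """In-memory stand-in for a persistent channel; not the NATS client API."""

    def __init__(self):
        self._log = []        # every published message, in publish order
        self._acked = set()   # sequence numbers the subscriber has acknowledged

    def publish(self, data):
        """Append a message and return its 1-based sequence number."""
        self._log.append(data)
        return len(self._log)

    def pending(self):
        """Return unacknowledged (seq, data) pairs in source order."""
        return [(seq, data) for seq, data in enumerate(self._log, start=1)
                if seq not in self._acked]

    def ack(self, seq):
        """Mark one message as processed so it is not delivered again."""
        self._acked.add(seq)

    def replay(self, start_seq=1):
        """Return all stored messages from start_seq on, acked or not."""
        return [(seq, data) for seq, data in enumerate(self._log, start=1)
                if seq >= start_seq]


channel = ToyChannel()
for job in ["resize", "classify", "index"]:   # hypothetical job names
    channel.publish(job)

channel.ack(1)
channel.ack(3)
redelivered = channel.pending()   # only the unacknowledged message remains
```

A message that is never acknowledged stays in `pending()` and would be delivered again, which is the essence of at-least-once delivery; `replay()` ignores acknowledgements because the log itself is retained.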
18. Use Case of NATS @ Clarifai
• Job queue for microservice workers
• Both fast and slow subscribers
• Trigger certain actions in services
• Message persistence during rolling continuous deployments
• Controlling the flow of messages to certain services (“pausing” subscriptions)
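One way to picture "pausing" a subscription is a consumer that keeps accepting messages into a backlog but stops handing them to the worker until it is resumed. The slides do not say how pausing was implemented, so the sketch below is an assumption: `PausableSubscription` is a hypothetical class, and the backlog is held locally here, whereas with a persistent broker it would sit on the server.

```python
from collections import deque


class PausableSubscription:
    """Buffers incoming messages and only dispatches them while not paused."""

    def __init__(self, handler):
        self._handler = handler    # callable invoked once per message
        self._backlog = deque()    # messages waiting to be handled, in order
        self._paused = False

    def on_message(self, msg):
        """Accept a message; handle it immediately unless paused."""
        self._backlog.append(msg)
        self._drain()

    def pause(self):
        """Stop dispatching; new messages accumulate in the backlog."""
        self._paused = True

    def resume(self):
        """Restart dispatching and work through the backlog in order."""
        self._paused = False
        self._drain()

    def _drain(self):
        while self._backlog and not self._paused:
            self._handler(self._backlog.popleft())


handled = []
sub = PausableSubscription(handled.append)
sub.on_message("job-1")       # handled right away
sub.pause()
sub.on_message("job-2")       # held back while paused
sub.on_message("job-3")
held_while_paused = list(handled)
sub.resume()                  # backlog is processed in arrival order
```

Because nothing is dropped while paused, a slow or temporarily unavailable worker can catch up later, which is also what makes rolling deployments safe for in-flight work.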
20. Results
• Implemented NATS into our backend in three weeks with one service
• In production for 5 months, currently used by five different services and growing
• 100% uptime with NATS
• 100k+ messages sent through NATS per day
21. Lessons Learned
• Problems with NATS mainly stemmed from how we were using it
• Manual acking caused many issues and was harder to get right; we reverted to automatic acking for many subscriptions
• Hard to monitor when there are many subscriptions (we inject metrics and monitor the queue using a separate service)
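The monitoring idea above (injecting metrics per subscription) might look roughly like the following. This is a minimal sketch under stated assumptions: `PendingTracker`, the subscription names, and the threshold are all hypothetical, and a real setup would export these counts to a metrics system rather than keep them in a dictionary.

```python
from collections import Counter


class PendingTracker:
    """Counts delivered-but-unacknowledged messages per subscription."""

    def __init__(self):
        self._pending = Counter()

    def delivered(self, subscription):
        """Record that a message was handed to a subscriber."""
        self._pending[subscription] += 1

    def acked(self, subscription):
        """Record an acknowledgement; never drop below zero."""
        if self._pending[subscription] > 0:
            self._pending[subscription] -= 1

    def snapshot(self):
        """Return the current pending count for each subscription."""
        return dict(self._pending)

    def over_threshold(self, limit):
        """Return names of subscriptions with more than `limit` pending."""
        return sorted(name for name, n in self._pending.items() if n > limit)


tracker = PendingTracker()
for _ in range(3):
    tracker.delivered("thumbnailer")   # hypothetical subscription name
tracker.acked("thumbnailer")
tracker.delivered("indexer")           # hypothetical subscription name

backlogged = tracker.over_threshold(1)
```

A subscription whose pending count keeps growing is a sign that acknowledgements are being missed, which is the kind of manual-acking mistake that is otherwise hard to spot.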
Feature Requests
• Dead-letter queue
• Flushing/invalidating messages from specific subscriptions
• Clustering
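For readers unfamiliar with the first request, a dead-letter queue collects messages that keep failing so they stop being redelivered forever. The sketch below shows the idea at the application level; `process_with_dead_letter`, `flaky_handler`, and the retry limit are hypothetical and are not a feature of NATS described in these slides.

```python
def process_with_dead_letter(messages, handler, max_attempts=3):
    """Try each message up to max_attempts times; return those that never succeed."""
    dead_letters = []
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(msg)
                break                        # success: move on to the next message
            except Exception:
                if attempt == max_attempts:  # retries exhausted: park the message
                    dead_letters.append(msg)
    return dead_letters


attempts = []


def flaky_handler(msg):
    """Hypothetical handler that always fails on the message 'bad'."""
    attempts.append(msg)
    if msg == "bad":
        raise ValueError("cannot process message")


dead = process_with_dead_letter(["ok-1", "bad", "ok-2"], flaky_handler)
```

Parking the failing message lets the remaining jobs proceed, and the dead-letter list can be inspected or reprocessed later.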