ZaloPay Merchant Platform deployed Kubernetes on-premise for its microservices. It faced issues with node scaling breaking production, multi-interface configuration, and node joining. Two nodes failed causing downtime and pressure to shutdown. Lessons learned include practicing more, doing chaos engineering, and getting business support for new technologies. Next steps are upgrading components, improving monitoring, and applying ingress and persistence solutions.
13. CI/CD v1
• Gitlab hook when commit with comment “BUILD <ENV>”
• Why don’t we use GitLab Webhook?
• Call Jenkins Pipeline
• Build the code
• Test the code
• Deploy to K8s
• Manual config HAProxy
21. Monitor & Alert
• Monitor dashboard from K8S
• Monitor HAProxy log
• Parse ERR 5XX
• Alert by SMS to tech leader
22. Issues 1
• Scale node break the production farm
• Multi interface because of sticky node
• Kubespray choose default route when no IP in inventory file
23. Issues 2
• Cannot join node which joined before
• Kubeadm installed
• Kubelet cannot start
• How to fix ?
24. Issues 3
• 2 node die
• Product has problem
• Biz pressure: request to shutdown product because of tech
incidents
• Try to fix => more problem
• How to fix ?
25. Lesson learned
• Try to make DEV ~ PROD
• Try to understand the root causes
• Practice & Practice & Practice
• Chaos engineering is VERY IMPORTANT for production
• Need supporting from Biz Owner to apply new technology
26. Next steps
• Upgrade to latest version k8s, tidb
• Consolidate monitor tools
• Make Alert system smarter
• Apply Ingress controller : Nginx Ingress Controller
• Try Persistence Volume: OpenEBS
• Redis cluster solution
• Fully automation CI/CD