Amazon EKS: the good, the bad, and the ugly


Geoff Flarity, Software Engineer at CashApp (Square), gave a talk covering everything you need to know about EKS, AWS's managed Kubernetes offering, at the Kubernetes + Cloud Native meetups in Toronto and Kitchener-Waterloo.

  1. 1. Amazon EKS The good, the bad, and the ugly.
  2. 2. I am... Geoff Flarity (gflarity) Software Engineer Cash App (Square)
  3. 3. Uphill BOTH WAYS ● Cash started using Kubernetes back around 1.2-1.3 ● On GKE ● YOLO!?
  4. 4. About that Cash App... ● Over 15m MAU, December 2018 ○ We define active as making a money movement ● Cash Card GPV: ○ 90m Dec 2017 ○ 250m June 2018 => ● People love us so much they write songs about us!
  5. 5. About that Cash App... Songs written about Cash… ~90
  6. 6. About that Cash App...
  7. 7. Cash App on EKS +
  8. 8. Cash App on EKS ● Check out the comparison chart ● Some of the info is out of date ● This talk will focus on the issues that matter to the Cash App platform
  9. 9. The Good
  10. 10. The Good ● Managed control plane ● Automatic patch updates (security) ● Click to upgrade for major releases ● Yadda...
  11. 11. The Good Google doesn’t run Search/Adsense on GCP. AWS > GCP Also: If your laptop gets owned, your clusters have been owned too.
  12. 12. The Good ● AWS, AWS, AWS ● AWS IAM (Identity and Access Management) ● Temporary credentials for roles ● Multi-factor authentication If your laptop gets owned, has your cluster been owned too?
  13. 13. The Good - Kubernetes On AWS => 63% ● This is *pre* EKS ● Via KOPS and other tooling ● EKS leverages this work, and the cloud vendor support that is baked into Kubernetes (more on this)
  14. 14. The Good ● Everything is free as in speech... and beer* ● No magic, just AWS primitives ● Active community on github ● Fork and customize! * does not include control plane management system
  15. 15. The Bad
  16. 16. The Bad Service Limit/ELB Issues ● Hard cap on number of services is 300 due to firewall limits (in reality MUCH lower) ● Cloud-provider-specific logic is currently built directly into Kubernetes ● Won’t be separated for a while ● Work-arounds are rather hacky
  17. 17. The Bad ● AWS has great support for private/isolated virtual networking (VPC) ● Well designed, super configurable! ● The Kubernetes API doesn’t use it ● It’s public! ● Well encrypted, but all communication with master still goes over “internet” (private to AWS but still)
  18. 18. The Ugly
  19. 19. The Ugly ● GA (Generally Available) ○ ...BNPR (But Not Production Ready) ● AMI shipped with no docker log rotation ○ But… wasn’t this the image that much of that 63% were using? ○ What were those 63% doing? Anything serious?
  20. 20. The Ugly ● Single kube-dns pod by default ○ Single point of failure for all your communication (internal/external) ● Certain availability zones within regions don’t have much capacity. But it’s random! ○ Scaling can fail after you’ve set everything up ○ Trial and error unless you have pro support
  21. 21. The Ugly ● Resources are reserved for the system/kubelet ○ If you run out of disk space, kubelet might die silently. ○ Have fun debugging! ● Control plane logging doesn’t automatically ship anywhere. ○ Have fun debugging!
  22. 22. The Ugly ● AWS-CNI (networking architecture for EKS) didn’t support multiple subnets properly. ○ Wait… how many of that 63% were using it? Many/most of these issues have been resolved or will be soon. But much confidence has eroded :(
  23. 23. Questions And More Info capacity-per-subnet--vpc
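
The AWS-CNI and subnet capacity concerns in slide 22 come down to simple arithmetic, which the slides do not spell out. The following is a minimal sketch, assuming the publicly documented default behaviour of the VPC CNI (each pod takes a secondary IP from a node ENI) and the rule that AWS reserves 5 addresses per subnet. The function names are hypothetical, and the m5.large figures (3 ENIs, 10 IPv4 addresses each) are illustrative values that should be checked against current EC2 documentation.

```python
import ipaddress

# AWS reserves 5 addresses in every VPC subnet (assumption stated in the lead-in).
AWS_RESERVED_PER_SUBNET = 5


def max_pods_per_node(enis: int, ipv4_per_eni: int) -> int:
    """Pod limit for one node in the VPC CNI's default secondary-IP mode.

    Each ENI's primary address is not handed to pods (hence the -1);
    the +2 covers pods that use host networking.
    """
    return enis * (ipv4_per_eni - 1) + 2


def usable_subnet_ips(cidr: str) -> int:
    """Addresses in a subnet that are actually available to ENIs."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def nodes_that_fit(cidr: str, enis: int, ipv4_per_eni: int) -> int:
    """Worst case: every node has all ENIs attached and every IP allocated."""
    ips_per_node = enis * ipv4_per_eni
    return usable_subnet_ips(cidr) // ips_per_node


if __name__ == "__main__":
    # Illustrative m5.large figures: 3 ENIs, 10 IPv4 addresses per ENI.
    print("max pods per node:", max_pods_per_node(3, 10))
    print("usable IPs in a /24:", usable_subnet_ips("10.0.0.0/24"))
    print("fully loaded nodes per /24:", nodes_that_fit("10.0.0.0/24", 3, 10))
```

Under these assumptions a single /24 subnet is exhausted by a handful of fully loaded nodes, which is one way scaling can fail after a cluster is already set up.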