2 Epic Migrations at Flo:

From hardware to AWS, from EC2 to EKS. What did we learn from it?

  1. 1. 2 Epic migrations at FLO Dmitry Yackevich, Director of Engineering at Flo Health
  2. 2. ✔ Director of Engineering at Flo Health ✔ DevOps enabler (and sometimes disabler) at Flo, Pandadoc, Targetprocess and Workfusion WHO AM I? Dmitry Yackevich
  3. 3. Context
  4. 4. Migrations are the only mechanism to effectively manage technical debt as your company and code grows. If you don't get effective at software and system migrations, you'll end up languishing in technical debt. And still have to do one later anyway, it's just that it'll probably be a full rewrite. WHY MIGRATIONS MATTER
  5. 5. LONG, LONG TIME AGO 2018 Q1
  6. 6. Flo is an AI-powered health app for women that supports them throughout their entire reproductive period
  7. 7. NOT JUST PERIOD TRACKER
  8. 8. 1. Backup/restore took one week 2. Constant outages 3. DB cluster was near its capacity limit 4. Changes are hard WE REACHED THE LIMIT ON BARE METAL
  9. 9. Heroic mode
  10. 10. ✔ AWS + Terraform ✔ Ansible deployment ✔ Bitbucket Pipelines SOLUTION
  11. 11. TERRAFORM WORKFLOW Branch Master
  12. 12. Typical deployment takes 15 minutes ANSIBLE WORKFLOW
  13. 13. ARCHITECTURE 2019 Q1
  14. 14. AVAILABILITY 2%
  15. 15. RESPONSE TIME
  16. 16. Challenges
  17. 17. PRICE RISES TOO FAST
  18. 18. LOW RESOURCE UTILIZATION
  19. 19. • 30 to 200 employees • 2 to 15 deploys/day • 9 to 120 services in production RAPID GROWTH
  20. 20. Minimum time to market for a new service is 2 days 15 deployments * 15 minutes ≈ 4 hours/day Manual actions: add SSH keys, permissions, etc. TIME MATTERS It's OK for most cases, but to build a high-speed process you need to get rid of it
  21. 21. Plan
  22. 22. ✔ Autoscaling ✔ Turn Key Modules ✔ Self Healing ✔ Cost Optimization WE WANT TO ACHIEVE = Containers + immutable infrastructure + k8s
  23. 23. ✔ Based on AWS services ✔ Control Plane as a Service ✔ Automatically provision EC2, ALB, Subnets, Route53 EKS
  24. 24. • Unstable setup • Too complicated • Performance degradation • Dark corners RISKS
  25. 25. • Proof of Concept • Dogfooding • Early adopters • Migrate all SCENARIO
  26. 26. Boring migration
  27. 27. PROOF OF CONCEPT EKS SERVICE
  28. 28. It works! 💪 PROOF OF CONCEPT 2048 GAME
  29. 29. ✔ Jenkins ✔ Sentry ✔ Prometheus ✔ Grafana ✔ Fluentd ✔ PgBouncer DOGFOODING We decided to move infrastructure services to EKS first
  30. 30. ● 35 projects ● 5000 events/minute in spike ● 450 events/minute average DOGFOODING WITH SENTRY
  31. 31. DOGFOODING WITH SENTRY AUTOSCALING
  32. 32. DOGFOODING WITH SENTRY HPA
  33. 33. Migration | Dogfooding HPA and Autoscaling
  34. 34. Migration | Dogfooding HPA and Autoscaling
  35. 35. Migration | Dogfooding HPA and Autoscaling
  36. 36. ✔ Test network performance ✔ Rock solid stability ✔ All database requests go through a connection pooler DOGFOODING PGBOUNCER
  37. 37. Migration | Dogfooding PgBouncer
  38. 38. LOADBALANCER FOR ALL
  39. 39. ✔ Test team adoption ✔ Customer facing service ✔ Measure complexity EARLY ADOPTERS
  40. 40. Lesson Learned
  41. 41. • Moving a service to k8s took at least 1 sprint • Everybody wants to use k8s for new services, not for old ones EARLY ADOPTERS
  42. 42. MIGRATION STATEMENT
  43. 43. JAVA MEMORY TRICKS
  44. 44. REQUESTS/LIMITS STRATEGY
  45. 45. Each instance type has a hard limit on IP addresses (and Pods) ✔ .large has only 29 IP addresses ✔ .xlarge and .2xlarge: 58 ✔ .4xlarge: 234 IPs Our winner: c5.2xlarge
  46. 46. ● 700 IPs are not enough ● 50 IPs per server, so there were only 4 EKS workers in one AZ ● Use /20 instead of /24 Subnets
  47. 47. Results
  48. 48. 40% VS 20% RESOURCE UTILIZATION
  49. 49. Performance became more consistent and stable PERFORMANCE
  50. 50. DYNAMIC UTILIZATION
  51. 51. DYNAMIC UTILIZATION
  52. 52. Decreased from 15 to 3 minutes DEPLOYMENT TIME
  53. 53. CHANGE RATE
  54. 54. ✔ -30% ✔ Predictable growth model ✔ Our scaling factor is utilization, not service count COST OPTIMISATION
  55. 55. ✔ Never Fear ✔ Derisk ✔ Find early adopters ✔ Push it to the end How to run a migration?
  56. 56. Q&A
  57. 57. THANK YOU! Join us and contribute to global health! https://flo.health/careers
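
Slides 45 and 46 quote per-instance IP limits (29, 58, 234) and the move from /24 to /20 subnets. The sketch below is not from the deck; it shows one way those numbers can be reproduced. It assumes the usual EKS pod limit formula, ENIs * (IPs per ENI - 1) + 2, the AWS-published ENI limits for the c5 family, and the five addresses AWS reserves in every subnet. The function names, the example CIDR blocks, and the choice of 58 addresses per worker (the slide rounds this to about 50) are illustrative assumptions.

```python
import ipaddress

# (max ENIs, IPv4 addresses per ENI) for a few c5 sizes, per AWS instance limits.
C5_ENI_LIMITS = {
    "c5.large": (3, 10),
    "c5.xlarge": (4, 15),
    "c5.2xlarge": (4, 15),
    "c5.4xlarge": (8, 30),
}

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def max_pods(instance_type: str) -> int:
    """Pod limit with the VPC CNI: each ENI loses one IP to itself, plus 2 host-network pods."""
    enis, ips_per_eni = C5_ENI_LIMITS[instance_type]
    return enis * (ips_per_eni - 1) + 2


def usable_ips(cidr: str) -> int:
    """Addresses in a subnet that can actually be assigned to ENIs and pods."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def workers_per_subnet(cidr: str, ips_per_worker: int) -> int:
    """How many workers fit if each one may consume ips_per_worker addresses (assumption)."""
    return usable_ips(cidr) // ips_per_worker


if __name__ == "__main__":
    for name in C5_ENI_LIMITS:
        print(name, max_pods(name))
    # Hypothetical CIDR blocks, used only to compare the two subnet sizes.
    for cidr in ("10.0.0.0/24", "10.0.0.0/20"):
        print(cidr, usable_ips(cidr), workers_per_subnet(cidr, max_pods("c5.2xlarge")))
```

Under these assumptions a /24 subnet holds 251 usable addresses, which fits 4 workers at 58 addresses each and matches the "only 4 EKS workers in one AZ" observation, while a /20 holds 4091.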
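
Slide 20 estimates the daily cost of deployments (15 deployments at 15 minutes each) and slide 52 reports that a deployment dropped to 3 minutes. The arithmetic can be written out as a small sketch. The helper name is hypothetical, and keeping the deployment count at 15 per day after the migration is an assumption made only for comparison; the deck does not state the later count.

```python
def daily_deploy_minutes(deploys_per_day: int, minutes_per_deploy: int) -> int:
    """Total minutes per day spent waiting on deployments."""
    return deploys_per_day * minutes_per_deploy


# Figures from slides 20 and 52; the unchanged deploy count is an assumption.
before = daily_deploy_minutes(15, 15)
after = daily_deploy_minutes(15, 3)

if __name__ == "__main__":
    print(f"before: {before} min/day ({before / 60:.2f} h)")
    print(f"after:  {after} min/day ({after / 60:.2f} h)")
```

The "before" figure is 225 minutes, or 3.75 hours, which the slide rounds to 4 hours per day.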
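
Slides 32 to 35 mention the Horizontal Pod Autoscaler (HPA) without showing how it picks a replica count. As a rough illustration, the sketch below implements the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas * current metric / target metric), with no change while the ratio stays within a tolerance (0.1 by default). The metric values in the usage lines are hypothetical and are not taken from the Sentry setup in the deck; clamping to minReplicas and maxReplicas is left out for brevity.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """Replica count suggested by the documented HPA formula (no min/max clamping)."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Hypothetical numbers: 2 pods each handling 2500 events/min against a target of 500.
    print(desired_replicas(2, 2500, 500))
    # Load at half the target scales the deployment in.
    print(desired_replicas(4, 50, 100))
```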
