Kubernetes to scale

Debates about scaling are often abstract. The debaters may not even have genuine scaling issues, and the rationale for one strategy over another can be a highly subjective preference rather than something born of experience. That is definitely not the case for this talk. We will discuss the very real scaling issues at lastminute.com, highlighting not just how Kubernetes helped but also the context around those strategy decisions.

Published in: Technology

  1. Kubernetes to Scale. michele.orsi@lastminute.com, @micheleorsi. GDG Cloud - London, 11 January 2017
  2. Started with a monolith ... (photo: https://www.flickr.com/photos/southtopia/5702790189)
  3. ... broken into microservices (photo: https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/)
  4. Micro-problems at scale
     ● alignment
     ● real pipelines
     ● infrastructure
     ● resilience
     ● monitoring
     ● constraints
  5. A year-long endeavour
     ● build a new, modern infrastructure
     ● migrate the search (flight/hotel) product there
     ... without:
     ● impacting the business
     ● throwing away our whole datacenter
  6. How we did that: technology
     ● company framework
     ● docker
     ● kubernetes
  7. How we did that: team/people (photo: https://www.pexels.com/photo/blue-lego-toy-beside-orange-and-white-lego-toy-standing-during-daytime-105822/)
  8. Kubernetes: our architecture (diagram: boxes labelled APP1-PRODUCTION, APP2-PRODUCTION, APP3-PRODUCTION, APP1-PREVIEW, APP1-DEVELOPMENT, APP1-QA, and APP1-STRESSTEST, grouped under "production" and "nonproduction")
  9. Kubernetes: our architecture (diagram: APP1-PRODUCTION under "production", with a deployment, a replica-set, and POD 1, POD 2, POD 3)
  10. Kubernetes: our architecture (diagram: as slide 9, plus a secret and a configmap)
  11. Kubernetes: our architecture (diagram: as slide 10, plus an (ingress) with path: app1-production.prd.lmn.intra)
  12. Kubernetes: our architecture (diagram: F5, the cluster, three nginx-ingress-ctrl: 80 instances, and PODs 10.0.0.1 through 10.0.0.6)
  13. Kubernetes: our architecture (diagram: APP1-PRODUCTION under "production", with a POD labelled application, collectd, and fluentd)
  14. Self-healing: our choice for resilience
      /liveness:
      ● when tomcat container is up
      ● when "active/max" threads < threshold
      /readiness:
      ● all the startup jobs have run
      ● no termination request has been received
      .. ongoing never-ending research ..
  15. Kubernetes: what's left outside?
      ● datastores
      ● distributed caches (early 2017)
      ● distributed locking
      ● pub-sub/queues
      ● logs and metrics storage
  16. When can you test with production traffic?
      ● zero downtime during rollout
      ● monitoring in place
      ● alerting
      ● centralized logging
      ● legacy infrastructure to the rescue in case of problems
  17. ... failure ... at all different levels .. (photo: https://www.flickr.com/photos/ghost_of_kuji/2763674926)
  18. Main problems
      ● configuration
      ● infrastructure
      ● tools
      ● manual mistakes
      ● (external) scalability
  19. There's light .. at the end (photo: https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/)
  20. Pipeline: a huge step forward
      microservice = factory.newDeployRequest()
          .withArtifact("com.lastminute.application1", 2)
      lmn_deployCanaryStrategy(microservice, "qa")
      lmn_deployStableStrategy(microservice, "preview")
      lmn_deployCanaryStrategy(microservice, "production")
  21. Monitoring: grafana/graphite/nagios (diagram: APP1-PRODUCTION in the cluster, collectd, graphite, Grafana, nagios; icons from http://www.flaticon.com)
  22. ... benefits
      ● lead and migration time
      ● resilience
      ● root cause analysis
      ● speed of deployment
      ● instant scaling
  23. Give me the numbers!
      ● 36 bare-metal nodes (only for production cluster)
      ● 5100 req/sec in the new cluster
      ● 2M metrics/minute flows
      ● 35 micro-services migrated in 5 months
        ○ 3 new micro-services migrated per week
        ○ 10 minutes to create a new environment
      ● 11 min to roll-out a new version with 55 instances
        ○ whole pipeline runs in 16 min
  24. Yes, we're hiring! THANKS www.lastminutegroup.com
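
The liveness and readiness criteria listed on slide 14 can be read as two small predicates. The sketch below is only an illustration of that reading, not the code used at lastminute.com: the `AppState` fields, the function names, and the 0.9 default threshold are all hypothetical, and it assumes the two liveness conditions are combined with a logical AND. The one external fact it relies on is that Kubernetes HTTP probes treat status codes from 200 to 399 as success.

```python
from dataclasses import dataclass


@dataclass
class AppState:
    """Hypothetical snapshot of one application instance (not from the slides)."""
    tomcat_up: bool
    active_threads: int
    max_threads: int
    startup_jobs_done: bool
    termination_requested: bool


def is_live(state: AppState, threshold: float = 0.9) -> bool:
    """Liveness: Tomcat is up AND active/max threads is below the threshold.

    The 0.9 default is an assumed value; the slide does not give one.
    """
    if not state.tomcat_up or state.max_threads <= 0:
        return False
    return state.active_threads / state.max_threads < threshold


def is_ready(state: AppState) -> bool:
    """Readiness: startup jobs have run and no termination request was received."""
    return state.startup_jobs_done and not state.termination_requested


def status_code(ok: bool) -> int:
    """Map a check result to an HTTP status a probe would see as pass or fail."""
    return 200 if ok else 503


if __name__ == "__main__":
    state = AppState(tomcat_up=True, active_threads=190, max_threads=200,
                     startup_jobs_done=True, termination_requested=False)
    print("/liveness ->", status_code(is_live(state)))
    print("/readiness ->", status_code(is_ready(state)))
```

Keeping the two checks separate matches the split on the slide: a failed liveness check leads Kubernetes to restart the container, while a failed readiness check only takes the pod out of service endpoints, which is what one would want once a termination request has arrived.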

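
To put the figures from slide 23 in per-unit terms, the short calculation below derives a few averages from them. It is a back-of-the-envelope sketch: the input numbers come from the slide, but the function names are made up here, and the results assume load is spread evenly across nodes and that rollout time is simply divided across instances, neither of which the slides state.

```python
# Figures quoted on slide 23.
NODES = 36                    # bare-metal nodes in the production cluster
REQ_PER_SEC = 5100            # requests per second in the new cluster
METRICS_PER_MIN = 2_000_000   # metrics per minute
ROLLOUT_MIN = 11              # minutes to roll out a new version
INSTANCES = 55                # instances covered by that rollout


def requests_per_node() -> float:
    """Average requests per second per node, assuming an even spread."""
    return REQ_PER_SEC / NODES


def metrics_per_second() -> float:
    """Average metrics ingested per second."""
    return METRICS_PER_MIN / 60


def rollout_seconds_per_instance() -> float:
    """Average rollout time per instance, in seconds."""
    return ROLLOUT_MIN * 60 / INSTANCES


if __name__ == "__main__":
    print(f"req/s per node:       {requests_per_node():.1f}")
    print(f"metrics per second:   {metrics_per_second():.0f}")
    print(f"seconds per instance: {rollout_seconds_per_instance():.1f}")
```

Under those assumptions this works out to roughly 142 requests per second per node, about 33,000 metrics per second, and 12 seconds per instance during a rollout.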