Kubernetes and Lastminute.com Group: Our Journey Towards Scalability and Agility

Kubernetes and lastminute.com group: our course towards better scalability and processes
michele.orsi@lastminute.com
Milan, 25-26 November 2016

lastminute.com group in numbers
40 countries
17 languages
10M travellers per year*
€ 2.5B GTV*
€ 250M revenue*
43M users per month*
*data as of 31 December 2015
icons from http://www.flaticon.com

A tech company to the core
Tech department: 300+ people
Modules: ~100
Database: 150 schemas, 3300 tables, TB data
Instances: 1400+
Locations: Chiasso, Milan, Madrid, London, Bengaluru

https://www.pexels.com/photo/turtle-walking-on-sand-132936/
“Business thinks developers are slow”

lastminute.com group: an agile company
● Scrum and Kanban
● TDD
● clean code
● continuous integration
● code review
● internal communities

Starting from the monolith ...
https://www.flickr.com/photos/southtopia/5702790189

https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/
... broken into microservices

The improvements needed
● alignment
● real pipelines
● infrastructure
● resilience
● monitoring
● remove constraints

A year-long endeavour
● build a new, modern infrastructure
● migrate the search (flight/hotel) product there
... without:
● impacting the business
● throwing away our whole datacenter

TODO list
● company framework
● docker
● kubernetes

How? Teams and people
New teams
https://www.pexels.com/photo/blue-lego-toy-beside-orange-and-white-lego-toy-standing-during-daytime-105822/

Our infrastructure and technology
https://www.pexels.com/photo/colorful-toothed-wheels-171198/

Docker containers
● build once, run everywhere
● externalised configuration
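One way to read "externalised configuration": the image stays identical in every environment, and each environment-specific value is supplied from outside the container, for example through environment variables. A minimal Python sketch of that idea follows. The variable names (APP_ENV, DB_URL, HTTP_PORT) and the defaults are hypothetical, not taken from the talk.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    environment: str
    db_url: str
    http_port: int


def load_settings(env=os.environ):
    """Build settings from the environment; nothing is baked into the image."""
    return Settings(
        environment=env.get("APP_ENV", "development"),  # hypothetical name
        db_url=env["DB_URL"],  # required: raises KeyError if missing
        http_port=int(env.get("HTTP_PORT", "8080")),
    )


if __name__ == "__main__":
    # The same code (and the same image) serves qa, preview or production;
    # only the injected values differ.
    print(load_settings({"APP_ENV": "qa", "DB_URL": "db://qa-host/app"}))
```

Failing fast on a missing required value makes a misconfigured environment visible at startup rather than at the first request.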

Docker containers
registry.intra/application:v2-090025112016
BASE OS
JAVA SDK
START/STOP SCRIPTS
JAR APPLICATION
● build once, run everywhere

Kubernetes
● independent from OS/hosts
● isolated env, managed at scale
● self-healing
Omega paper: http://research.google.com/pubs/pub41684.html

https://www.pexels.com/photo/red-toy-truck-24619/
“Your infrastructure on wheels”

Kubernetes: physical representation
cluster
NODE 1: DOCKER, ETCD, K8S, FLANNEL
NODE 2: DOCKER, ETCD, K8S, FLANNEL
...
NODE 28: DOCKER, ETCD, K8S, FLANNEL

Kubernetes: logical representation
cluster
NAMESPACE1: CPU 10, MEM 40GB
NAMESPACE2: CPU 20, MEM 50GB
NAMESPACE3: CPU 80, MEM 60GB
NAMESPACE4: CPU 5, MEM 5GB
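The per-namespace CPU and memory figures can be read as budgets: a workload is admitted only while the namespace stays within its quota. The sketch below illustrates that check in Python with the numbers from the slide. The function names are made up for the example, and this is not how Kubernetes implements ResourceQuota.

```python
# Quotas from the slide: namespace -> (CPU cores, memory in GB).
QUOTAS = {
    "NAMESPACE1": (10, 40),
    "NAMESPACE2": (20, 50),
    "NAMESPACE3": (80, 60),
    "NAMESPACE4": (5, 5),
}


def fits(namespace, used, request):
    """Return True if `request` still fits in the namespace quota.

    `used` and `request` are (cpu, mem_gb) tuples.
    """
    cpu_limit, mem_limit = QUOTAS[namespace]
    return (used[0] + request[0] <= cpu_limit
            and used[1] + request[1] <= mem_limit)


def quota_totals():
    """Sum of all namespace quotas as (cpu, mem_gb)."""
    cpus = sum(cpu for cpu, _ in QUOTAS.values())
    mems = sum(mem for _, mem in QUOTAS.values())
    return cpus, mems
```

For example, `fits("NAMESPACE4", (4, 4), (1, 1))` is true because the request exactly fills the remaining budget, while a request for 2 CPUs would be rejected.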

Kubernetes: our architecture
production: APP1-PRODUCTION, APP2-PRODUCTION, APP3-PRODUCTION
nonproduction: APP1-PREVIEW, APP1-DEVELOPMENT, APP1-QA, APP1-STRESSTEST

Kubernetes: our architecture and choices
production
APP1-PRODUCTION
deployment -> replica-set -> POD 1, POD 2, POD 3

production
APP1-PRODUCTION
deployment -> replica-set -> POD 1, POD 2, POD 3
secret, configmap

production
APP1-PRODUCTION
deployment -> replica-set -> POD 1, POD 2, POD 3
secret, configmap
app1.lastminute.intra -> loadbalancer-app1 -> POD 1, POD 2, POD 3

production
APP1-PRODUCTION
POD: application, collectd, fluentd

Kubernetes: what’s left outside?
● datastores
● distributed caches
● distributed locking
● pub-sub
● logs and metrics storage

1st try (with test app), it seemed to work
https://www.flickr.com/photos/26516072@N00/2194001232

Self-healing
“The self-healing term describes any application, service, or a system that can discover that it is not working correctly and, without any human intervention, make the necessary changes to restore itself to the normal or designed state.”
ref: https://technologyconversations.com/2016/01/26/self-healing-systems

Kubernetes agnostic interfaces
“When a container is dead I will restart it”
“When a container is ready I will forward traffic to it”
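The two promises above amount to a small control loop: restart whatever is dead, and route traffic only to what is ready. A simplified Python sketch of one pass of such a loop follows. The `Container` class and `reconcile` function are illustrative names, not Kubernetes APIs.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    alive: bool = True
    ready: bool = False
    restarts: int = 0


def reconcile(containers):
    """One pass of the loop: restart dead containers, return routable ones."""
    endpoints = []
    for c in containers:
        if not c.alive:
            # "When a container is dead I will restart it"
            c.alive = True
            c.ready = False  # a fresh container must prove readiness again
            c.restarts += 1
        elif c.ready:
            # "When a container is ready I will forward traffic to it"
            endpoints.append(c.name)
    return endpoints
```

The platform only needs the two boolean answers. How the application computes them is left to the developers, which is what makes the interface agnostic.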

Kubernetes probes: liveness & readiness
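Two questions for dev:
● when can I consider my container alive?
● when can I consider my container ready to receive traffic?
# excerpt of the pod spec; container name, image and probe port are not shown on the slide
spec:
  containers:
  - livenessProbe:
      httpGet:
        path: /liveness
      # current Kubernetes releases only accept successThreshold: 1 on a liveness probe
      successThreshold: 3
      failureThreshold: 2
    readinessProbe:
      httpGet:
        path: /readiness
      successThreshold: 3
      failureThreshold: 2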
deployment.yaml
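The two thresholds describe how many consecutive results are needed before the probe's verdict flips. The Python sketch below models that rule, using the values 3 and 2 from the manifest. It follows the documented meaning of successThreshold and failureThreshold and is not the kubelet's code; the class name is made up for the example.

```python
class ProbeState:
    """Tracks consecutive probe results against success/failure thresholds."""

    def __init__(self, success_threshold, failure_threshold, healthy=False):
        self.success_threshold = success_threshold
        self.failure_threshold = failure_threshold
        self.healthy = healthy
        self._last = None    # last observed result
        self._streak = 0     # length of the current run of identical results

    def observe(self, ok):
        """Record one probe result and return the current verdict."""
        if ok == self._last:
            self._streak += 1
        else:
            self._last, self._streak = ok, 1
        if ok and self._streak >= self.success_threshold:
            self.healthy = True
        elif not ok and self._streak >= self.failure_threshold:
            self.healthy = False
        return self.healthy


if __name__ == "__main__":
    probe = ProbeState(success_threshold=3, failure_threshold=2)
    for result in [True, True, True, False, False, True]:
        print(result, "->", probe.observe(result))
```

A single failed check does not flip a healthy container, and a single success does not bring a failed one back. That smooths out transient glitches at the cost of a slower reaction.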

Our choices: framework - k8s
/liveness:
● when the tomcat container is up
● when the ratio of active to max threads is lower than a threshold
/readiness:
● all the startup jobs have run
● no termination request has been received
... ongoing, never-ending research ...
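The rules above can be sketched as two small functions that map application state to the HTTP status each endpoint would return. This is an illustration in Python rather than the company framework's code. The `AppState` fields and the 0.9 threshold are assumptions, since the talk does not give the actual value.

```python
from dataclasses import dataclass


@dataclass
class AppState:
    active_threads: int
    max_threads: int
    startup_jobs_done: bool
    termination_requested: bool


THREAD_RATIO_LIMIT = 0.9  # assumed value; the talk does not state the threshold


def liveness(state):
    """Status for /liveness: alive while the thread pool is not saturated."""
    ratio = state.active_threads / state.max_threads
    return 200 if ratio < THREAD_RATIO_LIMIT else 503


def readiness(state):
    """Status for /readiness: ready once started and not shutting down."""
    ok = state.startup_jobs_done and not state.termination_requested
    return 200 if ok else 503
```

Answering /liveness at all already implies that the server is up. Returning 503 from /readiness after a termination request lets in-flight requests finish while no new traffic is routed to the instance.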

2nd try (with production traffic)
● zero downtime during rollout
● monitoring in place
● alerting
● centralized logging
● legacy infrastructure to the rescue in case of problems
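"Zero downtime during rollout" is usually achieved by replacing instances a few at a time, so that serving capacity never drops below a floor. The Python sketch below simulates that under simplifying assumptions: each new pod becomes ready immediately after the old one is removed. The parameter name `max_unavailable` echoes the Kubernetes Deployment setting `maxUnavailable`, but the function itself is only an illustration.

```python
def rolling_update(replicas, max_unavailable=1):
    """Yield (old_ready, new_ready) after each step of a rolling replacement.

    At every step at most `max_unavailable` pods are out of service, so
    old_ready + new_ready never drops below replicas - max_unavailable.
    """
    old, new = replicas, 0
    while old > 0:
        batch = min(max_unavailable, old)
        old -= batch      # take a batch of old pods out of rotation
        yield old, new    # capacity is lowest at this point
        new += batch      # replacements pass readiness and take traffic
        yield old, new


if __name__ == "__main__":
    for old_ready, new_ready in rolling_update(3):
        print(f"old={old_ready} new={new_ready} serving={old_ready + new_ready}")
```

With 3 replicas and one pod replaced at a time, at least 2 pods serve traffic throughout the rollout.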

... failure ... the big one!
https://www.flickr.com/photos/ghost_of_kuji/2763674926

Problems
● configuration
● infrastructure
● tools
● manual mistakes
● (external) scalability

Another improvement step
● temporary team focus on objective
● automation
● monitoring
● go deeper into Docker/Kubernetes

Pipeline: a huge step forward
microservice = factory.newDeployRequest()
    .withArtifact("com.lastminute.application1", 2)
lmn_deployCanaryStrategy(microservice, "qa")
lmn_deployStableStrategy(microservice, "preview")
lmn_deployCanaryStrategy(microservice, "production")
pipeline
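The `lmn_deploy*Strategy` steps are internal to the company pipeline and their implementation is not shown. As a rough illustration of what a canary and a stable strategy commonly mean, here is a Python sketch. Every name in it is hypothetical, and the health check is passed in as a callback.

```python
def deploy_canary(instances, new_version, is_healthy, canary_count=1):
    """Upgrade a few instances first; promote to all only if they stay healthy.

    `instances` maps instance name -> version; returns the resulting mapping.
    """
    canaries = sorted(instances)[:canary_count]
    # Phase 1: only the canaries run the new version.
    state = {name: (new_version if name in canaries else version)
             for name, version in instances.items()}
    if not all(is_healthy(name, state[name]) for name in canaries):
        return dict(instances)  # roll back: everyone keeps the old version
    # Phase 2: the canaries looked fine, promote the new version everywhere.
    return {name: new_version for name in state}


def deploy_stable(instances, new_version):
    """Upgrade every instance directly, with no canary phase."""
    return {name: new_version for name in instances}
```

A canary phase limits the blast radius of a bad release to a few instances, which fits the pipeline's use of it for qa and production.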

Monitoring: grafana/graphite/nagios
cluster: APP1-PRODUCTION, n collectd
graphite, Grafana, nagios
icons from http://www.flaticon.com

“Go” deep .. whatever language it takes
https://www.pexels.com/photo/sea-man-person-ocean-2859/

There’s a light ... at the end
https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/

... benefits
● lead and migration time
● resilience
● root cause analysis
● speed of deployment
● instant scaling

Give me the numbers!
● 1300 req/sec in the new cluster
● 25 micro-services migrated in 4 months
● 1 week to migrate an application
● 10 minutes to create a new environment
● 11 min to gracefully roll out a new version with 55 instances
● whole pipeline runs in 16 min
● a flow of 1.5M metrics/minute

Yes, we’re hiring!
THANKS
www.lastminutegroup.com
