Container orchestration and microservices world
Karol Chrapek
a story about container orchestration
Novomatic Technologies Poland
● R&D center for Novomatic
● was established in 1997 (20 years)
● more than 300 specialists
● focusing on high-tech gaming technologies and entertainment market
● more info here: novomatic-tech.com
Why do we need containers in NTP?
● Unified deployment method.
● Accelerate software development, deployment and shipping processes.
● Simplify cooperation with different teams / companies inside the Novomatic group.
● Reduce the need to maintain dev infrastructure in each project.
● Solve problems with some legacy libraries and hardware.
Container evolution in NTP
● The “Think tank team” (TTT) experimented with containers:
○ speed up CI phase
○ simplify deployment and upgrade processes
○ run them everywhere (local test ;))
● TTT created “Container’s Eleven” and gave a few internal presentations.
● More teams decided to use containers for test purposes.
● A few small projects started using Docker in production.
● We needed a container platform solution at scale.
● TTT deployed the first Kubernetes dev cluster in NTP.
● The “DevOps team” (DOT) took responsibility for the K8s stacks.
● DOT created new clusters inside NTP.
@Hefzul Bari
Developers’ needs?
● Easy to run and share with other teams.
● Reduce the number of issues forwarded to the infrastructure team.
● One orchestration method/tool for local and production environments.
● A platform ready for public clouds.
● Support of legacy apps and their dependencies.
● Learn something new.
Business needs?
● Reducing deployment and scalability windows.
● Run on both classes of hardware: commodity and enterprise.
● The same deployment model for different environments and teams.
● Reducing the performance degradation window during a failure.
● All new products should increase environment stability.
● Most of our clients require an on-premise solution.
Why did we choose Kubernetes?
● We tested different tools and chose the one that fits our model “best”.
● Currently k8s is the container orchestration “standard”.
● All main cloud providers support Kubernetes (GKE, AKS, EKS).
● Some clients run their own on-premise Kubernetes infra and some teams prefer cloud providers, but the software deployment method stays the same.
● Approved by development teams and clients.
● Open source software.
Development environments
● previous: one k8s cluster provisioned via custom bash scripts
● now: three two k8s clusters provisioned via Kubespray
○ 8-10 nodes
○ all nodes are virtual machines on Cisco stack
● some developers use Minikube
● sometimes additional test envs are exposed by our clients
PaaS - requirements
Operations:
● multi-datacenter
● high availability
● easy provisioning
● on demand scalability
● security
Developers:
● config management
● secret management
● service discovery
● blue-green deployment
● tracing
Both:
● telemetry
● logging
● self-healing
● rolling update
@Damien Pollet - flickr
Lessons learned
#1 Kubernetes is a distributed platform
#1.1 Kubernetes architecture
#2 Kubernetes as a PaaS core
#2.1 Kubernetes as a PaaS core
Platform [1]:
- Distribution (55)
- Hosted (34)
- Installer (18)
Others:
- Application definition & Image Build [2]
- Service Proxy [3]
- Service Mesh [4]
- Network [5]
- Security [6]
- Observability [7]
- Storage [8]
#3 Kubernetes - cutting edge vs prod grade
API components (Kubernetes 1.14) and their API versions:
● CronJob - v1beta1
● Ingress - v1beta1
● PodSecurityPolicy - v1beta1
● CSI Driver - v1beta1
#4 Etcd - replication and consistency
Problems:
● Etcd size sometimes starts growing and grows … [#8009]
● A network glitch can seriously reduce etcd cluster availability [#7321]
● Test clientv3 balancer under network partitions, other failures [#8711]
@jevans
#5 Kubernetes API
● CoreDNS crashes when the API server is down [#2629]
● CVE-2018-1002105 [#71411]
● When the API server is down, operators and some sidecar/init containers may crash (always run it in HA)
● The Kubernetes scheduler and controller crash when they are connected to localhost [#22846 and #77764]
@jevans
#6 Small deployments and edge computing
● edge computing at Chick-fil-A
● Service overhead
● Deployment and monitoring are not so easy.
● Challenge: cross-cluster connections.
#7 Enforcing default limits for containers
>Me [2:20 PM]
but I see you’ve had a firm hand lately when splitting resources between teams :)
I think I’ll fix those limits and configurations this week
….
because I feel a bit silly that we’re struggling with such simple problems:
>Colleague XYZ [3:23 PM]
my grandma always said that stealing is what’s silly
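Enforcing such defaults can be done per team namespace with a LimitRange. A minimal sketch, assuming a hypothetical namespace team-a and purely illustrative values:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits        # hypothetical name
  namespace: team-a           # hypothetical namespace
spec:
  limits:
  - type: Container
    default:                  # applied when a container sets no limits
      cpu: 500m
      memory: 512Mi
    defaultRequest:           # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi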
#8 Run stateful apps
https://twitter.com/kelseyhightower/status/963413508300812295
#9 Operators help to manage STS, but:
● they are complex,
● they mostly cover 60-80% of all maintenance tasks,
● choose managed services in the cloud, or classic orchestration for on-premise solutions,
● sometimes an STS app version bump requires manual operations.
#10 Persistent volumes and k8s on-premise
● NFS - replication is tricky
● Rook operator [Ceph or EdgeFS] - complex
● Local volumes are still in beta https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/
● Expanding Persistent Volume Claims is still in beta
● Flexvolume and CSI drivers
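For on-premise clusters, a local PersistentVolume pinned to a node is one option. A minimal sketch, assuming a hypothetical node name, disk path and sizes:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage             # hypothetical name
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-1                # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/vol1         # illustrative path on the node
  nodeAffinity:                   # local PVs must be pinned to a node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-1              # hypothetical node name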
#11 App flapping -> connection reset via ingress
Symptoms: Active connections are reset after 5 minutes.
Root cause:
1. Pod rescheduled (container OOM), new pod == new IP.
2. Service adds a new endpoint -> nginx configuration reload.
3. Nginx conf reload -> wait 5 minutes (worker-shutdown-timeout) and kill the old worker.
Related issue: #2461
nginx.com
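The worker-shutdown-timeout can be tuned via the ingress-nginx controller ConfigMap. A minimal sketch, assuming the controller reads a ConfigMap named ingress-nginx-controller in the ingress-nginx namespace (names depend on your installation; the value is illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # name/namespace depend on how ingress-nginx was installed
  namespace: ingress-nginx
data:
  worker-shutdown-timeout: "10s"   # shorten the grace period for old nginx workers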
#12 Multi-tenancy and RBAC
● Single-tenant with multiple clusters, or one multi-tenant cluster.
● Permissions are universal per resource type.
● No field-level access control.
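A minimal sketch of a namespaced Role plus RoleBinding; note that rules are granted per resource type and verb, with no per-field granularity (all names here are hypothetical):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-editor          # hypothetical name
  namespace: team-a                # hypothetical namespace
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]       # whole resource type, no field-level control
  verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-editor-binding
  namespace: team-a
subjects:
- kind: Group
  name: team-a-developers          # hypothetical group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: deployment-editor
  apiGroup: rbac.authorization.k8s.io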
#13 Namespace - resource isolation ;)
https://xkcd.com/2044/
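Namespace-level resource isolation is usually paired with a ResourceQuota. A minimal sketch, assuming a hypothetical namespace and illustrative limits:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota               # hypothetical name
  namespace: team-a                # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "50"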
#14 Network Policy
By default, the network is “flat” inside Kubernetes ;)
Common network policies:
https://github.com/ahmetb/kubernetes-network-policy-recipes
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  namespace: secondary
  name: deny-from-other-namespaces
spec:
  podSelector:
    matchLabels:
  ingress:
  - from:
    - podSelector: {}
#15 Infrastructure resources and stability
Tooling
#1 Application deployment
Happy helming:
● The syntax is hard, especially when you start.
● Storing secrets requires an extra plugin. [helm-secrets]
● Umbrella charts are always tricky. [#4490]
● Helm upgrade fails when new objects are added [#4871]
● Tiller and RBAC [Tiller was removed in Helm 3, discussion here: #1918]
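A minimal sketch of an umbrella chart: the Chart.yaml lists subchart dependencies and the parent values.yaml overrides subchart values under the subchart’s name (chart names, versions and keys are illustrative assumptions, Helm 3 format):

# Chart.yaml
apiVersion: v2
name: my-umbrella                # hypothetical umbrella chart
version: 0.1.0
dependencies:
- name: backend                  # hypothetical subchart
  version: 1.2.3
  repository: https://charts.example.com
- name: frontend                 # hypothetical subchart
  version: 2.0.0
  repository: https://charts.example.com

# values.yaml - values for a subchart are nested under its name
backend:
  replicaCount: 3
frontend:
  image:
    tag: "2.0.0"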
#2 Telemetry
If you like new and fancy solutions, try prometheus-operator:
● https://github.com/coreos/prometheus-operator
● https://github.com/helm/charts/tree/master/stable/prometheus-operator
Potential problems:
● How to add custom alerts, dashboards and monitoring rules?
● Should we use multiple smaller instances or one big one?
● Where should it be deployed?
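Custom alerts can be added through the prometheus-operator PrometheusRule CRD. A minimal sketch; the label that makes the operator pick the rule up must match the ruleSelector of your Prometheus CR, and the names, labels and expression here are illustrative assumptions:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts              # hypothetical name
  namespace: monitoring            # hypothetical namespace
  labels:
    release: prometheus-operator   # must match your Prometheus ruleSelector
spec:
  groups:
  - name: custom.rules
    rules:
    - alert: PodRestartingTooOften
      expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} is restarting too often"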
#3 Logging
Nothing new: the EFK stack does the job, but:
● In a multi-tenant setup we should implement Elasticsearch document-level security:
https://opendistro.github.io/for-elasticsearch/
● Kubernetes logs are still plaintext, not structured.
● Log unification
#4 Need more ;)
● Service Mesh
● Tracing
● Cross cluster communication
● Infrastructure testing
● Sidecars and init containers
● ...
People and mindset
Nobody said it is easy ;)
