Oliver Moser presented on Kubernetes at Telekom Austria Group. The team was looking for a way to scale its geo analytics workloads and established a Kubernetes playground in late 2015. Key requirements were scaling workloads in and out as needs change, self-healing from failures, load balancing, and scheduling and managing jobs. They ran Kubernetes on bare-metal servers and used Prometheus for monitoring and logging. They built custom tooling, such as Makefiles and Ansible playbooks, to automate common tasks like deploying and managing workloads. They ran into issues around storage, resource limits, and upgrades, and are focusing on tests, high availability, alerting, and Helm going forward.
4. What we were looking for
• End of 2015 we established a playground to deal with TV and Geo analytics
• High level goals/stages:
• Start small
• Prove it works
• Scale out
• Reproduce
• No Hadoop in the beginning :-)
17. Hack 1: Makefiles FTW
Many alternatives to sed/make:
• Dedicated templating tool (e.g. Jinja)
• Use configuration management (e.g. Ansible)
• Wait for native template support (http://bit.ly/2t7HHQN)
... but...
• Makefiles are easy to understand (e.g. for Operations)
• make is super fast (yeah Ansible I’m talking to you :-)
• make just works
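The slides do not show the actual Makefiles or manifests, so the following is only a minimal sketch of the sed-style substitution step that such a make target might wrap. The `@@NAME@@` placeholder syntax, the manifest fragment, and the `render` helper are all hypothetical. Unlike plain sed, this sketch raises an error when a placeholder has no value.

```python
import re

# Hypothetical placeholder syntax (@@NAME@@); not taken from the talk.
PLACEHOLDER = re.compile(r"@@([A-Z_]+)@@")


def render(template, values):
    """Replace every @@NAME@@ placeholder, much like `sed -e 's/@@NAME@@/value/g'`."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            # Fail loudly instead of shipping a manifest with a leftover placeholder.
            raise KeyError(f"no value for placeholder {name}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical manifest fragment, for illustration only.
TEMPLATE = """\
metadata:
  name: @@APP@@
spec:
  replicas: @@REPLICAS@@
"""

if __name__ == "__main__":
    print(render(TEMPLATE, {"APP": "geo-analytics", "REPLICAS": 3}))
```

A make target would typically run one such substitution per environment and pipe the result to the cluster tooling, which keeps the whole flow readable for Operations.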
18. Hack 2: kubeadm orchestration
• Cluster bootstrapping via kubeadm
• Automation via Ansible
• Easy to add/reset/remove worker nodes
• Easy to install/switch plugins (e.g. flanneld, Kube UI, Weave...)
19. Hack 2: kubeadm orchestration
• Playbooks are started with... make :-!
20. Hack 3: CLI helpers
• Shortcuts for most common tasks (exec, logs, etc)
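The helpers themselves are not shown on the slide. As a rough sketch of what such shortcuts could look like, the functions below only build `kubectl exec` and `kubectl logs` command lines. The function names, the pod name, and the namespace are hypothetical, and the commands are printed rather than run so that no cluster is needed.

```python
import shlex


def kubectl(*args, namespace=None):
    """Build a kubectl argument list, optionally scoped to a namespace."""
    cmd = ["kubectl"]
    if namespace:
        cmd += ["--namespace", namespace]
    return cmd + list(args)


def exec_cmd(pod, command=("sh",), namespace=None):
    """Interactive command inside a pod: kubectl exec -it POD -- COMMAND."""
    return kubectl("exec", "-it", pod, "--", *command, namespace=namespace)


def logs_cmd(pod, follow=True, namespace=None):
    """Show (and by default follow) a pod's logs: kubectl logs [-f] POD."""
    flags = ["-f"] if follow else []
    return kubectl("logs", *flags, pod, namespace=namespace)


if __name__ == "__main__":
    # Hypothetical pod and namespace names; shlex.join needs Python 3.8+.
    print(shlex.join(exec_cmd("geo-worker-0", namespace="analytics")))
    print(shlex.join(logs_cmd("geo-worker-0")))
```

Returning argument lists instead of strings means the same helpers could be handed to `subprocess.run` without any shell quoting concerns.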
23. Issues we were/are running into
• DeviceMapper http://bit.ly/2uer3OV
• Blocked Tasks (switching from ext4 to XFS helped)
• <defunct> processes → containers stick around forever
• http://bit.ly/2vtcsMX http://bit.ly/2ckFrt7
• No Job history limit → Many Pods stick around → kube gets slow
• Flanneld/CNI IP address pool exhausted http://bit.ly/2t7HHQN
• Upgrade to 1.6 → new RBAC features → Kaboom!
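The slides do not say how the leftover Jobs were dealt with. One possible workaround for the missing Job history limit is a periodic cleanup that keeps only the most recent finished Jobs. The sketch below shows just the selection logic under assumed inputs: the `name` and `completed_at` keys and the `keep` parameter are hypothetical, not part of any Kubernetes API.

```python
from datetime import datetime


def jobs_to_prune(jobs, keep=3):
    """Return names of finished jobs beyond the `keep` most recent ones.

    `jobs` is a list of dicts with hypothetical keys `name` and
    `completed_at` (a datetime, or None while the job is still running).
    """
    finished = [j for j in jobs if j["completed_at"] is not None]
    # Newest first, so everything after the first `keep` entries is surplus.
    finished.sort(key=lambda j: j["completed_at"], reverse=True)
    return [j["name"] for j in finished[keep:]]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        {"name": "a", "completed_at": datetime(2017, 1, 1)},
        {"name": "b", "completed_at": datetime(2017, 1, 2)},
        {"name": "c", "completed_at": datetime(2017, 1, 3)},
        {"name": "d", "completed_at": None},  # still running, never pruned
    ]
    print(jobs_to_prune(sample, keep=1))
```

Each returned name could then be passed to `kubectl delete job`, which by default also removes the Pods the Job created.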
24. Next Steps
• Tests
• HA
• Alerting
• Helm
• Websocket support for our AAA proxy (for k exec)