Kubernetes from scratch at Veepee - sysadmins days 2019

Kubernetes from scratch @Veepee
SUMMARY
1. Study: Kubernetes components
2. Control plane deployment
3. Node architecture: network, security, runtime, proxy, ...
4. Tools & exploitation: observability, isolation, discovery
Study
Kubernetes components
Components
● Control plane
○ Storage (etcd)
○ API
○ Scheduler
○ Controller-manager
● Nodes
○ Container runtime
○ Node agent (kubelet)
○ Service proxy
○ Network agent
Components : storage
● Key-value store
● Raft-based distributed storage
● Client-to-server & server-to-server TLS support
Project page: https://etcd.io/
Incubating at the CNCF
Components : API server
● Stores data in etcd
● Stateless REST API
● HTTP/2 + TLS
● gRPC support:
○ WATCH events over HTTP
○ Reactive, event-based triggers on Kubernetes components
Components : Scheduler
● Connected to API server only
● Watches for pod objects
● Selects the node to run on based on criteria:
○ Hardware (CPU available, CPU architecture, memory available, disk space)
○ (Anti-)Affinity patterns
○ Policy constraints (labels)
● 1 active master per quorum (leader-election token in etcd)
Components : Controller manager
● Core controllers:
○ Node: reacts to node status changes
○ Replication: ensures the pod count on replication controllers
○ Endpoints: maintains Endpoints objects for Services
○ Namespace: creates default Service Accounts & tokens
● 1 active master per quorum (leader-election token in etcd)
Node components
● Container runtime: runs containers (Docker, containerd.io…)
● Node agent: connects to the API server to handle containers & volumes
● Service proxy: load balances service IPs to pod endpoints
● Network agent: connects nodes together (flannel, calico, kube-router…)
Control plane
Deployment
Datacenter deployment
● 3 Kubernetes clusters per datacenter:
○ Benchmark
○ Staging
○ Production
● No cross-DC cluster: no DC split-brain situation to manage
Etcd deployment
● 3 etcd per datacenter (see the config sketch below)
○ TLSv1.2 enabled
○ Authentication through TLS client certificates enabled
○ Hardware: 4 CPU, 32GB RAM
○ OS: Debian 10.1
○ Version 3.4 enabled:
■ reduced latency
■ big write performance improvements
■ reads not affected by commits
■ Will be the default version as of K8S 1.17
■ See: https://kubernetes.io/blog/2019/08/30/announcing-etcd-3-4/
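For illustration, a minimal sketch of an etcd 3.4 member configuration with client and peer TLS plus certificate authentication enabled (hostnames, IPs and certificate paths are hypothetical, not Veepee's actual values):

name: etcd-01
data-dir: /var/lib/etcd
listen-client-urls: https://10.0.0.11:2379
advertise-client-urls: https://etcd-01.example.local:2379
listen-peer-urls: https://10.0.0.11:2380
initial-advertise-peer-urls: https://etcd-01.example.local:2380
initial-cluster: etcd-01=https://etcd-01.example.local:2380,etcd-02=https://etcd-02.example.local:2380,etcd-03=https://etcd-03.example.local:2380
initial-cluster-state: new
client-transport-security:       # TLS + client certificate authentication for API servers
  cert-file: /etc/etcd/pki/server.crt
  key-file: /etc/etcd/pki/server.key
  trusted-ca-file: /etc/etcd/pki/ca.crt
  client-cert-auth: true
peer-transport-security:         # TLS + certificate authentication between the 3 members
  cert-file: /etc/etcd/pki/peer.crt
  key-file: /etc/etcd/pki/peer.key
  trusted-ca-file: /etc/etcd/pki/ca.crt
  client-cert-auth: true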
API server deployment
● API version: 1.15.x (old clusters) and 1.16.x (new clusters)
● 2 API servers load balanced by haproxy (TCP mode)
○ Horizontally scalable
○ Vertically scalable
○ Current setup: 4 CPU, 32GB RAM
○ OS: Debian 10.1
● API servers load balance the etcd endpoints themselves
○ We discovered a bug in k8s < 1.16.3 when using TLS; ensure you have at least this version
○ Issue: https://github.com/kubernetes/kubernetes/issues/83028
API server deployment
● Enabled/Enforced features (admission controllers):
○ LimitRanger: resource limitation validator
○ NodeRestriction: limits kubelet permissions on node/pod objects
○ PodSecurityPolicy: security policies to run pods
○ PodNodeSelector: limits node selection for pods
● See the full list of admission controllers here:
○ https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers
● Enabled extra feature: Secret encryption in etcd with AES256 (see the sketch below)
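To illustrate the secret encryption feature, a sketch of the EncryptionConfiguration file passed to the kube-apiserver via --encryption-provider-config (the key name and value are placeholders):

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # aescbc with a 32-byte key = AES-256-CBC encryption of Secrets at rest in etcd
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>
      # identity stays as fallback to read Secrets written before encryption was enabled
      - identity: {}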
Controller-Manager & scheduler deployment
● 3 nodes per DC
○ Each has a scheduler
○ Each has a controller-manager
○ Hardware: 2 CPU, 8GB RAM
○ OS: Debian 10.1
Controller-Manager & scheduler deployment
● Enabled features on the controller-manager: all defaults plus
○ BootstrapSigner: authenticates kubelets on cluster join
○ TokenCleaner: cleans expired tokens
● Supplementary features on the scheduler:
○ NodeRestrictions: restrict pods to some nodes
Control plane global overview
Node architecture
Network, security, runtime, proxy, ...
Node architecture: container runtime
● Valid choice: Docker (https://www.docker.com/)
○ The default one
○ Known by “everyone” in the container world
○ Owned by a company
○ Simple to use
Node architecture: container runtime
● Valid choice: Containerd (https://containerd.io/)
○ Younger than Docker
○ Extracted from Docker
○ CNCF-hosted project
○ Some limitations:
■ No Docker API v1!
■ K8S integration poorly documented
Node architecture: container runtime
● Veepee choice: Containerd
○ Supported by CNCF and community
○ Used by Docker as its underlying container runtime
○ We use Artifactory; Docker registry API v2 is fully supported
○ Smaller footprint, less code, lower latency for the kubelet
Node architecture: system configuration
● Pod DNS configuration
○ clusterDomain: root DNS name for the pods/services
○ clusterDNS: DNS servers configured on pods
■ except if hostNetwork: true and the pod DNS policy is Default
● Protect the system from pods: ensure node system daemons can always run (see the KubeletConfiguration sketch below)
■ 128MiB memory reserved
■ 0.2 CPU reserved
■ Disk soft & hard limits
● Soft: don't allow new pods to run if the limit is reached
● Hard: evict pods if the limit is reached
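Both the pod DNS settings and the system protection above live in the kubelet configuration; a minimal KubeletConfiguration sketch (the clusterDNS IP and the disk thresholds are illustrative, the slide only states that soft and hard disk limits exist):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDomain: cluster.local      # root DNS name for pods/services
clusterDNS:
  - 10.3.0.10                     # service IP of the in-cluster DNS
systemReserved:                   # keep resources for node system daemons
  memory: 128Mi
  cpu: 200m
evictionSoft:                     # soft disk limit, acted on after the grace period
  nodefs.available: "15%"
evictionSoftGracePeriod:
  nodefs.available: 2m
evictionHard:                     # hard disk limit: pods get evicted immediately
  nodefs.available: "10%"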
Node architecture: service proxy
● Exposes K8S service IPs on nodes to access pods
● Multiple mechanisms
○ IPTables
○ IPVS
○ External Load Balancer (example: AWS ELB in layer 4 or layer 7)
● Multiple implementations
○ Kube-proxy (iptables, ipvs)
○ Kube-router (ipvs)
○ Calico
○ ...
Node architecture: service proxy
● Veepee solution choice: kube-proxy
○ Stay close to the Kubernetes distribution: don't add more complexity
○ No default need for layer 7 load balancing (service type: LoadBalancer); can be added as an extra proxy in the future
○ Next challenge: IPTables vs IPVS
Node architecture: kube-proxy mode
● Kube-proxy: iptables mode
○ Default recommended mode (faster)
○ Works quite well… but:
■ Doesn't integrate with Debian 10 and later (thanks to the Debian iptables-nftables tooling) => restore legacy iptables mode
■ Has locking problems when multiple programs need it
● https://github.com/weaveworks/weave/issues/3351
● https://github.com/kubernetes/kubernetes/issues/82587
● https://github.com/kubernetes/kubernetes/issues/46103
■ We need both kube-proxy and Kubernetes Network Policies
■ We have to take care of conntrack :(
Node architecture: kube-proxy mode
● Kube-proxy: ipvs mode
○ Works well technically (no locking issues/hacks!)
○ ipvsadm is a much better friend than iptables -t nat
○ ipvs is also chosen by some other tools like kube-router
○ The calico performance comparison convinced us (https://www.projectcalico.org/comparing-kube-proxy-modes-iptables-or-ipvs/)
Node architecture: kube-proxy mode
● Veepee final choice: kube-proxy + IPVS
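A minimal sketch of what that choice looks like in the kube-proxy component configuration (the scheduling algorithm and the pod CIDR are illustrative values):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"               # use the IPVS backend instead of iptables
ipvs:
  scheduler: "rr"          # IPVS load-balancing algorithm (round-robin)
clusterCIDR: 10.32.0.0/14  # pod CIDR, hypothetical value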
Node architecture: network layer
● Interconnects nodes
○ Ensures pod-to-pod and pod-to-service communication
○ Can be fully private (our choice) or shared with the regular network
● Various ways to achieve it
○ Static routing
○ Dynamic routing (generally BGP)
○ VXLan VPN
○ IPIP VPN
● Multiple ways to allocate node CIDRs
○ Statically (enjoy)
○ Dynamically
Node architecture: network layer
Warning, reading this slide can make your network engineers crazy
● Allocate two CIDRs for your cluster
○ 1 for nodes and pods
○ 1 for service IPs
● Don’t be conservative: give thousands of IPs to K8S, each node requires a /24
○ CIDR /14 for nodes (up to 1024 nodes)
○ CIDR /16 for services (service IP randomness party)
Node architecture: network layer
● Needs:
○ Each solution must learn the CIDR of the current node through the API
○ Network mesh setup should be automagic
● Select the right solution
○ Flannel (default recommended one): VXLan, host-gw
○ Kube-router: IPIP or BGP
○ Calico: IPIP
○ WeaveNet: VXLan
Node architecture: network layer
First test: flannel in VXLan
● Works quite well
● Very easy setup
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
● Yes, it’s like curl blah | bash
● No, we didn’t install it like this :)
Node architecture: network layer
First test: flannel in VXLan (https://github.com/coreos/flannel)
● Before a big sale, we load tested an app and… saw very bad network performance on nodes
○ Iperf showed that the outside network was good, around 9.8Gbps out of 10Gbps
○ Node-to-pod perf was at maximum too
○ Node-to-node using the regular network is around 9.7Gbps
○ Node-to-node using VXLan is around 3.2Gbps and kernel load is very high
○ Investigation on the recommended way to run VXLan: offload VXLan to the network cards
○ It’s not possible in our case since we are using Libvirt/KVM VMs; discard VXLan
Node architecture: network layer
Second test: kube-router in BGP mode (https://www.kube-router.io/)
● Drops the need for offloading to the network card
● Easy setup too
kubectl apply -f https://raw.githubusercontent.com/cloudnativelabs/kube-router/master/daemonset/kube-router-all-service-daemonset.yaml
● Don’t forget to read the yaml and ensure you publish on the right cluster :)
● As suspected, using BGP restores the full bandwidth capacity
● Other interesting features:
○ Service proxy (IPVS)
○ Network Policy support
○ Network LB using BGP
Node architecture: network layer
● Our choice: kube-router
○ The BGP choice is very nice
○ We can extend BGP to the fabric if needed in the future
○ We need network policy isolation for some sensitive apps (see the NetworkPolicy sketch below)
○ One binary for both network mesh and policies: less maintenance
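Since kube-router enforces Kubernetes Network Policies, isolating a sensitive app then boils down to standard NetworkPolicy objects; a minimal sketch (namespace and name are hypothetical):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
  namespace: payment
spec:
  podSelector: {}           # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}   # only traffic from pods of the same namespace is allowed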
Tools & exploitation
DNS, metrology, logging, ...
Kubernetes is not magic: tooling
With the previous setup we have:
● API
● Container scheduling
● Network communication
We have some limits:
● No access from outside
● No DNS resolution
● No metrology/alerting
● Volatile logging on nodes
Tooling: DNS resolution
Two methods:
● External, using the host resolv.conf: no DNS for in-cluster communication, we can use DNS for external resources only
● Internal: in-cluster DNS records, enables service discovery
○ We need it, go ahead
Tooling: DNS resolution
Two main solutions:
● Kube-dns: legacy one, should not be used for new clusters
○ dnsmasq C layer, single thread
○ 3 containers for a single daemon?
● Coredns: modern one
○ Golang multithreaded implementation (goroutines)
○ 1 container only
● Some benchmarks (from the coredns team, be careful)
○ https://coredns.io/2018/11/27/cluster-dns-coredns-vs-kube-dns/
Tooling: DNS resolution
● CoreDNS is the more reasonable choice
● Our deployment (see the ConfigMap sketch below)
○ Deployed as a Kubernetes Deployment
○ Runs on master nodes (3 pods)
○ Configured as the default DNS service on all kubelets
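For reference, a CoreDNS ConfigMap close to the upstream default, which is roughly what such a deployment serves (cluster.local stands for the clusterDomain configured on the kubelets):

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153            # metrics endpoint, scraped later by Prometheus
        forward . /etc/resolv.conf  # everything else goes to the host resolvers
        cache 30
        loop
        reload
        loadbalance
    }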
Tooling: Access from outside
Ingress: access from outside of the cluster
Various choices on the market:
● Nginx (the default one)
● Traefik
● Envoy
● Kong
● Ambassador
● Haproxy
● And more...
Tooling: Access from outside
We studied five:
● ambassador: promising but very young (https://www.getambassador.io/)
● nginx: the OSS model of Nginx is unclear since F5 bought Nginx Inc. (http://nginx.org/)
● haproxy: mature product, but the ingress controller is very young, and so is HTTP/2 and gRPC support (http://www.haproxy.org/)
● kong: built on top of Nginx, it's not for general purposes but can be a very nice API gateway (https://konghq.com/kong/)
● Traefik: good licensing, mature and updated regularly (https://traefik.io/)
Tooling: Access from outside
Because of the risks on some products, we benchmarked traefik:
● Kubernetes API ready
● HTTP/2 ready
● TLS/1.3 ready (Veepee minimum: TLS/1.2)
● Scalable & reactive configuration deployments
● TLS certificate reconfiguration in less than 10sec
● TCP/UDP raw balancing (traefik v2)
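A minimal sketch of how an application is then exposed through traefik on these 1.15/1.16 clusters, using the pre-1.19 Ingress schema (hostname, namespace and service names are hypothetical):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: shop
  namespace: shop
  annotations:
    kubernetes.io/ingress.class: traefik   # make sure traefik, and not another controller, picks it up
spec:
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: shop            # pre-1.19 backend format
              servicePort: 80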
Tooling: Access from outside
Traefik bench:
● Very good performance in lab:
○ Tested using k6 and ab tools
○ Test backend was a raw golang HTTP service
○ HTTP: up to 10krps with 2 pods on a VM with 1 CPU and 2GB RAM
○ HTTPS: up to 6.3krps with 2 pods on a VM with 1 CPU and 2GB RAM
○ Scaling pods doesn’t increase performance; anyway, it’s sufficient
Tooling: Access from outside
Traefik bench:
● Load Testing with a real product:
○ More than 1krps
○ A not-so-recent dotnet core app
○ The dotnet core app isn’t container-aware and suffers from some contention
○ Anyway, the rate is sufficient for the sale: go ahead to prod
○ On a big event sale we sold ~32k concert tickets in 1h40 without problems
Tooling: Access from outside
Traefik bench:
● Before the production sale:
○ We increased nodes from 2 to 3
○ We increased the application from 2 to 10 instances
● Production sale day (starting at 7am):
○ No incident
○ We sold 32k concert tickets in 1h40
Tooling: metrology/alerting
Need:
● Collect metrics on pods to draw nice graphs
Solution:
● One solution to rule them all: Prometheus
Tooling: metrology/alerting
Implementation:
● Pods expose a /metrics endpoint through their HTTP listener
● Prometheus will scrape it
● Writing the prometheus scraping configuration by hand is painful
● Fortunately, there is https://github.com/coreos/kube-prometheus
Tooling: metrology/alerting
● Kube-prometheus brings:
○ HA prometheus instances
○ HA alertmanager instances
○ Grafana for a local metrics view (not reusable for anything else)
○ Node metrics gathering
○ The ServiceMonitor Kubernetes API extension object (see the sketch below)
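A minimal ServiceMonitor sketch, the object an application team declares so that the kube-prometheus instances start scraping its /metrics endpoint (names and labels are hypothetical):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: shop
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: shop            # matches the Service in front of the pods
  namespaceSelector:
    matchNames:
      - shop
  endpoints:
    - port: http           # named port of the Service
      path: /metrics
      interval: 30s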
Tooling: metrology/alerting
Pod discovery
Tooling: metrology/alerting
Veepee ecosystem integration
Tooling: metrology/alerting
Pod resource overview
Tooling: metrology/alerting
Kube-prometheus graphs (+ some custom)
Tooling: logging
How to retrieve logs properly?
● Logging is volatile on containers
● On docker hosts: just mount a volume from the host and write to it
● On K8S: I don’t know where my container runs, I don’t know the host, the host doesn’t want me to write on it, help me doctor!
Tooling: logging
● You can prevent open heart surgery in production by knowing the rules
Tooling: logging
● Never write logs on disk
○ if you need it, use a sidecar to read it and don’t forget rotation!
● Write to stdout/stderr in a parsable way
○ JSON comes to the rescue: known by every development language, easy to serialize & implement
● Choose a software to gather container logs and push them:
○ filebeat
○ fluentd
○ fluentbit
○ logstash
Tooling: logging
● Our choice: fluentd
○ CNCF graduated project (https://www.cncf.io/announcement/2019/04/11/cncf-announces-fluentd-graduation/)
○ Some features we need are in fluentd but not in fluentbit
○ Already used by many SREs at Veepee
● Our deployment model: K8S DaemonSet (see the sketch below)
○ Rolling upgrade flexibility
○ Ensures logs are gathered on each running node
○ Ensures the configuration is the same everywhere
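A stripped-down sketch of such a fluentd DaemonSet (the image reference, resources and volumes are placeholders; a real deployment would also ship the fluentd pipeline configuration, e.g. via a ConfigMap):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  updateStrategy:
    type: RollingUpdate        # rolling upgrade flexibility
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      tolerations:
        - operator: Exists     # also run on tainted (e.g. master) nodes
      containers:
        - name: fluentd
          image: fluent/fluentd:v1.7-1   # placeholder image reference
          resources:
            limits:
              memory: 512Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log        # containerd writes pod logs under /var/log/pods
      volumes:
        - name: varlog
          hostPath:
            path: /var/log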
Tooling: logging
Fluentd object deployment
Tooling: logging
Fluentd log ingestion pipeline
Tooling: client/product isolation
Need:
● Ensure a client or product will not steal CPU/memory/disk resources from another
Two work axes:
● Node level isolation
● Pod level isolation
Tooling: client/product isolation
Work axis: node level
● Ensure a client (tribe) or a product owns the underlying node
● Billing per customer
● Resources per customer, then per SRE team
Solution:
● Use an enforced NodeSelector on namespaces (see the sketch below)
scheduler.alpha.kubernetes.io/node-selector: k8s.veepee.tech/tribe=foundation,k8s.veepee.tech=platform
○ Pods can only be scheduled on a node with at least those labels
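Wrapped into the namespace object, the enforced selector looks roughly like this (the namespace name is hypothetical, the annotation value is the one from the slide):

apiVersion: v1
kind: Namespace
metadata:
  name: platform
  annotations:
    # read and enforced by the PodNodeSelector admission controller enabled earlier
    scheduler.alpha.kubernetes.io/node-selector: k8s.veepee.tech/tribe=foundation,k8s.veepee.tech=platform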
Tooling: client/product isolation
Work axis: pod level
● Ensure pods are not stealing other pods’ resources
● Ensure scheduling makes the right node choice according to available resources
● Forbid pod allocation if no resources are available (no overcommit)
Solution:
● LimitRanges (see the sketch below)
Tooling: client/product isolation
Applied LimitRanges
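A minimal LimitRange sketch of the idea (all values are illustrative, not Veepee’s actual limits):

apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: platform
spec:
  limits:
    - type: Container
      defaultRequest:      # applied when a container sets no request
        cpu: 100m
        memory: 128Mi
      default:             # applied when a container sets no limit
        cpu: 500m
        memory: 512Mi
      max:                 # upper bound a single container may ask for
        cpu: "2"
        memory: 2Gi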
Questions?
THANK YOU