Troubles in Kubernetes Land:
Vault to the Rescue
Webinar devops.com
November 2019
© 2019 InfluxData. All rights reserved.
@gitirabassi
Giacomo Tirabassi
InfluxData
SRE
Italian
☸ Kubernetes in production since v1.8
🍝 love to cook and eat
lived 1 year in Memphis
lived 1 year in Shanghai
What’s the problem?
Solution: use a managed Kubernetes solution
A little bit of context
• Multiple clusters
• Multiple regions
• Multiple clouds (including on-prem)
• Minimal operational effort
• Maximum automation
Kubernetes is just an application:
a critical one
Automation: a little digression
Stateless applications (easy)
Stateful applications (hard)
Critical applications (he/she needs some milk)
Automation: why so hard?
• Credentials go from long-lived to short-lived
• How to observe?
• How to test?
• How to roll back? Or roll forward?
• What can go wrong, and how do we deal with it?
IMMUTABLE INFRASTRUCTURE
When deploying Kubernetes yourself
• Manage etcd
• Manage PKI
• Manage nodes
• Upgrade control plane + nodes
• Update configurations
• DR / BC (disaster recovery / business continuity)
What can go wrong?
• 2 out of the 5 HIGH-risk issues are TLS-related
• Certificate expiration is an issue
• The risk of an etcd data dump is underestimated
• Certificates and keys on the master node can be accessed if PodSecurityPolicies (PSP) are not turned on
How can Vault help?
Two parties need access to Vault
• Humans
LDAP
OIDC
• Nodes
aws
gcp
azure
alicloud
Humans need to SSH into nodes
• SSH secrets engine: with signed SSH certificates, it can be as simple as
• vault login -method=oidc
• vault ssh -mode=ca ubuntu@192.168.0.10
Humans need access to K8S
• The identity secrets engine is OIDC-compatible
• added in v1.2.0
• highly customizable id_token content
• automatic key rotation
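On the Kubernetes side, the API server's `--oidc-username-claim` and `--oidc-groups-claim` flags pick which id_token claims become the user name and groups. The minimal Go sketch below shows what those claims look like once decoded: an id_token is a JWT whose second dot-separated segment is base64url-encoded JSON. The `groups` claim, the issuer URL and the sample values are assumptions (the actual claim set depends on the Vault role's template), and the sketch deliberately skips signature verification, which the API server performs against Vault's published keys.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Claims holds the id_token claims this sketch cares about.
// "groups" is an assumed custom claim added via the Vault role template.
type Claims struct {
	Issuer  string   `json:"iss"`
	Subject string   `json:"sub"`
	Groups  []string `json:"groups"`
}

// decodeClaims extracts the payload of a compact JWT.
// It does NOT verify the signature; the API server does that.
func decodeClaims(token string) (Claims, error) {
	var c Claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("token must have three dot-separated parts")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, fmt.Errorf("decoding payload: %w", err)
	}
	if err := json.Unmarshal(payload, &c); err != nil {
		return c, fmt.Errorf("parsing claims: %w", err)
	}
	return c, nil
}

// unsignedToken builds a token-shaped string for demonstration only;
// its third segment is a placeholder, not a real signature.
func unsignedToken(c Claims) (string, error) {
	enc := base64.RawURLEncoding
	header := enc.EncodeToString([]byte(`{"alg":"RS256","typ":"JWT"}`))
	body, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return header + "." + enc.EncodeToString(body) + ".placeholder", nil
}

func main() {
	tok, err := unsignedToken(Claims{
		Issuer:  "https://vault.example.com:8200/v1/identity/oidc", // hypothetical
		Subject: "entity-1234",                                     // hypothetical entity ID
		Groups:  []string{"sre"},
	})
	if err != nil {
		panic(err)
	}
	c, err := decodeClaims(tok)
	if err != nil {
		panic(err)
	}
	fmt.Println("user:", c.Subject, "groups:", c.Groups)
}
```

Kubernetes RBAC bindings can then target the decoded group names, so group membership is managed in Vault rather than in per-user kubeconfigs.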
3 types of nodes and policies
• Etcd nodes
• Control plane nodes
• Worker nodes
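One way to picture one policy per node type is as a least-privilege table: each role can only reach the secrets its own components need, and everything else is denied. The Go sketch below models that table in memory; every mount path in it (`pki-etcd/...`, `kubeadm-token/...`, and so on) is a hypothetical placeholder rather than the layout used in the talk, and real enforcement happens in Vault policies, not in client code.

```go
package main

import (
	"fmt"
	"strings"
)

// Role is one of the three node types.
type Role string

const (
	Etcd         Role = "etcd"
	ControlPlane Role = "control-plane"
	Worker       Role = "worker"
)

// policies maps each role to the Vault paths it may use. All paths are
// hypothetical placeholders. A trailing "*" matches any suffix, loosely
// mirroring the trailing glob allowed in Vault policy paths.
var policies = map[Role][]string{
	Etcd: {
		"pki-etcd/issue/etcd-server",
		"pki-etcd/issue/etcd-peer",
	},
	ControlPlane: {
		"pki-etcd/issue/etcd-client",
		"pki-api/*",
		"pki-front-proxy/*",
		"kv/service-account-keys",
		"transit/encrypt/etcd",
		"transit/decrypt/etcd",
	},
	Worker: {
		"kubeadm-token/*",
		"pki-kubelet/issue/kubelet-server",
	},
}

// canAccess reports whether a node with role r may use path.
// Roles without an entry get no rules, so access is denied by default.
func canAccess(r Role, path string) bool {
	for _, rule := range policies[r] {
		if strings.HasSuffix(rule, "*") {
			if strings.HasPrefix(path, strings.TrimSuffix(rule, "*")) {
				return true
			}
		} else if path == rule {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canAccess(Worker, "kubeadm-token/creds/join")) // glob match
	fmt.Println(canAccess(Worker, "transit/encrypt/etcd"))     // not granted
}
```

The point of the split is blast radius: a compromised worker can mint join tokens and kubelet certificates, but cannot read etcd certificates or decrypt etcd data.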
Etcd nodes
• Need access to etcd server and peer certificates
• The PKI secrets engine is made for this
Control Plane nodes
• Need access to etcd client certificates
• Need access to the Kubernetes API CA
• Need access to the front-proxy CA
• Need the service account private/public keys (to sign/verify JWT tokens)
• Need access to the Vault transit backend to encrypt data in etcd
Worker nodes
• Need access to a kubeadm join token
Custom secrets engine plugin
When a node needs access, the plugin creates a short-lived token on Kubernetes
Token validity is configurable (default: 1 minute)
• Need access to kubelet server certificates
By default, the kubelet uses self-signed ones
Static secrets
• Migrations are hard
• For long-lived secrets (e.g. TLS certificates, API keys)
• A compromise is needed: SOPS
Our Solution
• Build node images with Packer (region- and cloud-agnostic)
• Deploy VMs with Terraform
module inputs are the same for all cloud providers
• Configure auth, service discovery, node role and cloud using a custom binary in userdata
Our Solution: Vault’s side
• For each Kubernetes cluster:
1 PKI secret engines
1 transit engine
1 KV engine
1 kubeadm-token plugin
Kubernetes is not perfect yet
• Certificates are expected to be on disk
• No automatic reload of new certificates
• No external provider for service account token signing
• Built-in certificate signing still requires the private/public key on disk
• How do we authenticate on-prem nodes? AppRole?
Monitoring Certificates with Telegraf
[[inputs.x509_cert]]
  sources = [
    "/etc/kubernetes/pki/ca.crt",
    "/etc/kubernetes/pki/front-proxy-ca.crt",
    "/etc/kubernetes/pki/front-proxy-client.crt",
    "/etc/kubernetes/pki/etcd/ca.crt",
    "/etc/kubernetes/pki/etcd/peer.crt",
    "/etc/kubernetes/pki/etcd/healthcheck-client.crt",
    "/etc/kubernetes/pki/etcd/server.crt",
    "/etc/kubernetes/pki/apiserver.crt",
    "/etc/kubernetes/pki/apiserver-kubelet-client.crt",
    "/etc/kubernetes/pki/apiserver-etcd-client.crt",
  ]
DEMO
RECAP
• Let’s go and automate everything: don’t be scared!
• Vault can be integrated into critical application deployments
• Having a single source for auditing all your infrastructure credentials is amazing
We’re hiring!!
@gitirabassi
