SlideShare a Scribd company logo
Lucy Davinhart – Sky Betting & Gaming
How we accelerated our Vault
adoption, with Terraform
👋 Who am I?
• Senior Automation Engineer
• @ Sky Betting & Gaming
• Part of The Stars Group
• Delivery Engineering Squad
• Part of the Infrastructure & Platforms Tribe
• Among other things we…
• Look after our Vault clusters
• Maintain Vault integrations & tooling
• Control access to AWS (via Vault!)
• Support internal customers
@LUCYDAVINHART @SBGTECHTEAM
🔐 What do we use Vault for?
• Across the company, our Vault users are:
• > 4000 Virtual Machines
• > 500 humans
• > 250 various AppRoles
• And a few more for Kubernetes Auth and AWS Auth
• Main features we use:
• K/V Secrets
• PKI
• AWS Credentials
@LUCYDAVINHART @SBGTECHTEAM
💬 This Talk
• Our problems managing Vault and onboarding people
• How we went about solving them
• Our initial Terraform solution
• How we have improved it over time
• The Future
@LUCYDAVINHART @SBGTECHTEAM
The Problems
✍️ Everything Manual
• Time consuming for us to make changes
• Making the changes
• Comparing policies, AppRoles, LDAP groups, etc.
• Time consuming to see what was in Vault already
• We were regularly asked to troubleshoot why User A doesn’t have access to Secret B
• Lack of standards / best practices
• (and we didn’t really know what we were doing initially)
• Automating Stuff is Cool 😎
@LUCYDAVINHART @SBGTECHTEAM
🧞♀️ We were too powerful
• We started out with full admin rights and access to everything
• Configure all the auth and secret mounts
• Read and write to all the secrets
• Give ourselves any policies we needed
• But at least none of us had root tokens, right? 😱
@LUCYDAVINHART @SBGTECHTEAM
🙈 Lack of Audit Trail
• What was changed?
• When was it changed?
• Who changed it?
• How did it change?
• Why did they change it?
@LUCYDAVINHART @SBGTECHTEAM
🧞
🧞
☹️
Initial Solutions
Vault Config Ruby Gem
• Downloads Vault config (policies, AppRoles, LDAP groups, etc) and saves in git repo
• Jenkins job to run this on a schedule
• We now have configuration backups, so we can see what has changed and when
• But not necessarily who or why
• Written very quickly:
• Was useful very quickly
• Was not particularly maintainable
@LUCYDAVINHART @SBGTECHTEAM
Goldfish Vault UI
• A Vault UI, before one was available in Open Source Vault
• Policy Request feature
• Users edited policies in the UI, and submitted for approval
• Vault admins review changes and apply
@LUCYDAVINHART @SBGTECHTEAM
Terraform Init
Terraform
• Codifies APIs into declarative configuration files
• Reproducible Infrastructure as Code
@LUCYDAVINHART @SBGTECHTEAM
Terraform Code
resource "vault_policy" "ravenclaw" { … }
resource "vault_policy" "hufflepuff" { … }
Terraform State
vault_policy.ravenclaw
vault_policy.slytherin
Terraform Plan
+ vault_policy.hufflepuff
- vault_policy.slytherin
🧞 Terraform Pipeline Design Decisions
• Look like the Vault API as much as possible
• Files which match the Vault API, e.g. sys/policy/foo.json
@LUCYDAVINHART @SBGTECHTEAM
🧞 Terraform Pipeline Design Decisions
policies.tf
resource "vault_policy" "example"
{
name = "dev-team"
policy = <<EOT
path "secret/my_app" {
capabilities = [”read”]
}
EOT
}
@LUCYDAVINHART @SBGTECHTEAM
sys/policy/example.hcl
path "secret/my_app" {
capabilities = [”read”]
}
🧞 Terraform Pipeline Design Decisions
• Look like the Vault API as much as possible
• Files which match the Vault API, e.g. sys/policy/foo.json
• (Initially) Take output from Ruby Gem as input
• Pull Requests to make changes
• Start with Policies, our most common request
• Everything in the repo in Vault
Nothing in Vault that was not in the repo
@LUCYDAVINHART @SBGTECHTEAM
Config
in
Vault
Config
in
Repo
Config
in Vault
+ Repo
Delete This
Create This
What Users See
@LUCYDAVINHART @SBGTECHTEAM
👩💻 What a User Sees
@LUCYDAVINHART @SBGTECHTEAM
👩💻 What a User Sees
@LUCYDAVINHART @SBGTECHTEAM
👩💻 What a User Sees
@LUCYDAVINHART @SBGTECHTEAM
👩💻 What a User Sees
@LUCYDAVINHART @SBGTECHTEAM
👩💻 What a User Sees
@LUCYDAVINHART @SBGTECHTEAM
👩💻 What a User Sees
@LUCYDAVINHART @SBGTECHTEAM
What’s Actually Happening?
@LUCYDAVINHART @SBGTECHTEAM
Jenkins Job
Makefile
@LUCYDAVINHART @SBGTECHTEAM
Init
• Ensures we have valid AWS credentials
• We store Terraform State in S3
• Dynamic AWS credentials from Vault
• terraform init
• Accesses remote Terraform State
• Downloads dependencies
• terraform workspace select test/prod
• Allows us to maintain separate Terraform State for different Vault clusters
@LUCYDAVINHART @SBGTECHTEAM
Import
• Lists resources in Vault
• Lists resources in Terraform State
• Imports resources not in Terraform State
@LUCYDAVINHART @SBGTECHTEAM
Config
in
Vault
Config
in
Repo
Generate
• Converts from files representing the Vault API into Terraform code
@LUCYDAVINHART @SBGTECHTEAM
resource "vault_policy" "example"
{
name = "dev-team"
policy = <<EOT
path "secret/my_app" {
capabilities = [”read”]
}
EOT
}
path "secret/my_app" {
capabilities = [”read”]
}
Validate
• terraform validate
• Ensures all generated Terraform code is syntactically correct
• Resource-specific checks
• Check for common human errors e.g.
• Types of certain resources (e.g. LDAP groups, AD users)
• Some case sensitivity issues
• Most of these are actually done in the Generate phase
@LUCYDAVINHART @SBGTECHTEAM
Plan
terraform plan -out=prod-vault.plan
@LUCYDAVINHART @SBGTECHTEAM
Terraform will perform the following actions:
+ vault_policy.hufflepuff
- vault_policy.slytherin
Plan: 1 to add, 0 to change, 1 to destroy.
AppRole
• So far: Read only access to Vault
• Prompt for a short-lived secret-id to gain write access to Vault
@LUCYDAVINHART @SBGTECHTEAM
Apply
terraform apply prod-vault.plan
@LUCYDAVINHART @SBGTECHTEAM
vault_policy.hufflepuff: Creating...
name: "" => "hufflepuff"
policy: "" => "..."
vault_policy.slytherin: Destroying... (ID: slytherin)
vault_policy.slytherin: Destruction complete after 0s
vault_policy.hufflepuff: Creation complete after 0s (ID: hufflepuff)
Apply complete! Resources: 1 added, 0 changed, 1 destroyed.
Commit + Merge
• Commit any generated Terraform code
• Merge release branch to master
@LUCYDAVINHART @SBGTECHTEAM
Incremental Improvements
LDAP Groups
• One of the most common requests, after policies
• Initially: vault_generic_secret
• Resource to manage arbitrary Vault paths
• Later: vault_ldap_auth_backend_group
• Dedicated LDAP group resource
• LDAP Restructure: Only allow certain LDAP groups to be mapped to policies
• ✅ PG-Vault-Foo
• 🚫 SG-MyTeam
@LUCYDAVINHART @SBGTECHTEAM
AppRoles
• Another of the most common requests, after policies
• We introduced Terraform variables for CIDR ranges:
@LUCYDAVINHART @SBGTECHTEAM
variable "cidr_range_prod_jenkins_agents" {
type = "list"
default = [
”1.2.3.4/30", # Production Site A Jenkins Agents
”2.3.4.5/30", # Production Site B Jenkins Agents
...
]
}
AppRoles
• Another of the most common requests, after policies
• We introduced Terraform variables for CIDR ranges:
@LUCYDAVINHART @SBGTECHTEAM
{
"token_bound_cidrs": "${var.cidr_range_prod_jenkins_agents}",
"policies": [
"default",
"terraform_vault-readonly”
],
"token_max_ttl": 120
}
Kubernetes Auth Roles
• The team managing the k8s clusters wrote this one for us!
• Effort needed by them:
• Write Import Script, based on existing scripts
• Write Generate Script, based on existing scripts
• Effort needed by us:
• Review their scripts
@LUCYDAVINHART @SBGTECHTEAM
AWS Auth Roles
• Some auto-generation of resources
• Get all AWS Account IDs with:
aws organizations list-accounts
• Generate resources:
@LUCYDAVINHART @SBGTECHTEAM
resource "vault_aws_auth_backend_sts_role" "role" {
backend = ”aws"
account_id = "1234567890"
sts_role = "arn:aws:iam::1234567890:role/my-role"
}
Active Directory Users
• ad/roles/:role_name
• has a few fields you can’t write to
@LUCYDAVINHART @SBGTECHTEAM
{
"last_vault_rotation": "2018-05-24T17:14:38.677370855Z",
"password_last_set": "2018-05-24T17:14:38.677370855Z",
"service_account_name": "my-application@example.com",
"ttl": 100
}
Active Directory Users
• vault_generic_endpoint resource
@LUCYDAVINHART @SBGTECHTEAM
resource "vault_generic_endpoint" "ad_role-vaulttest" {
path = "ad/roles/vaulttest”
data_json = ‘{"service_account_name": ”VaultTest@fancycorp.net"}’
# When reading, the secret contains keys that cannot be written:
# password_last_set (when did the password last get updated)
# last_vault_rotation (when did Vault last update the password)
ignore_absent_fields = true
}
Results
🎉 What Did All This Give Us?
• Time
• Individual changes take less of our time  We can handle more requests
• Visibility
• Easier to see what’s in Vault
• Easier to debug
• Auditability
• Who, What, When, How, Why
• grep-ability / Searchability
• Find common patterns
• Identify issues before they become problems
• Reducing our own permissions
• Lots of configuration can no longer be done by humans
@LUCYDAVINHART @SBGTECHTEAM
The Future
🆕 New Resources
• PKI
• dynamic X.509 certificates
• Sentinel Policies
• Richer access control functionality than ACL policies
• Namespaces
• Self-managed sub-Vaults
@LUCYDAVINHART @SBGTECHTEAM
🧞 Auto Generation
• AWS Accounts, all have standard permissions, which correspond to at least…
• 2x Vault Policies per account
• 2x LDAP Groups per account
• Auto-Generated PRs for common functionality
• Service Discovery for AppRole CIDR ranges
@LUCYDAVINHART @SBGTECHTEAM
🧞🧞 Review Security Trade-Offs
• 2FA to apply changes
• e.g. require 2 Factor Auth before a human can grant Jenkins read/write access
@LUCYDAVINHART @SBGTECHTEAM
Jenkins
requests
read/write
Human runs
command
2FA prompt
Human
pastes token
into Jenkins
Jenkins
requests
read/write
First human
runs
command
Second
human runs
command
First human
pastes token
into Jenkins
• Enterprise Control Groups
• e.g. require multiple humans to grant Jenkins read/write access
🧞🧞♀️ More validation before a PR can be merged
• Check resources for sensible parameters
• e.g. TTLs, num_uses, etc.
• Check if Vault has required permissions before approving PRs
• e.g. check if AWS account is in Organization
• e.g. check if AD user is in correct Organizational Unit
• Case sensitivity check on LDAP groups
• We have a script to manually check this
• Deploy to a local dev Vault
• For testing new features in the pipeline
@LUCYDAVINHART @SBGTECHTEAM
@LUCYDAVINHART @SBGTECHTEAM
• End-to-end, ignoring time waiting for humans, it currently takes 3.5m
• But it could be faster!
Gotta Go Fast!
👤 Make it Generic
• Allow pipeline to be run against child namespaces
• Config for each namespace stored in different repos
• Delegate permissions to other teams
@LUCYDAVINHART @SBGTECHTEAM
Conclusion
Thank You!
🧞
@LucyDavinhart - @SBGTechTeam
Slides: goto.lmhd.me/hc2019slides

More Related Content

What's hot

DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer ToolsDevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
Amazon Web Services
 
Service discovery with Eureka and Spring Cloud
Service discovery with Eureka and Spring CloudService discovery with Eureka and Spring Cloud
Service discovery with Eureka and Spring Cloud
Marcelo Serpa
 
API and App Ecosystems - Build The Best: a deep dive
API and App Ecosystems - Build The Best: a deep diveAPI and App Ecosystems - Build The Best: a deep dive
API and App Ecosystems - Build The Best: a deep dive
Cisco DevNet
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
confluent
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
confluent
 
OpenStack: Toward a More Resilient Cloud
OpenStack: Toward a More Resilient CloudOpenStack: Toward a More Resilient Cloud
OpenStack: Toward a More Resilient Cloud
Mark Voelker
 
Scale your application to new heights with NGINX and AWS
Scale your application to new heights with NGINX and AWSScale your application to new heights with NGINX and AWS
Scale your application to new heights with NGINX and AWS
NGINX, Inc.
 
Scala Security: Eliminate 200+ Code-Level Threats With Fortify SCA For Scala
Scala Security: Eliminate 200+ Code-Level Threats With Fortify SCA For ScalaScala Security: Eliminate 200+ Code-Level Threats With Fortify SCA For Scala
Scala Security: Eliminate 200+ Code-Level Threats With Fortify SCA For Scala
Lightbend
 
How to win skeptics to aggregated logging using Vagrant and ELK
How to win skeptics to aggregated logging using Vagrant and ELKHow to win skeptics to aggregated logging using Vagrant and ELK
How to win skeptics to aggregated logging using Vagrant and ELK
Skelton Thatcher Consulting Ltd
 
Apache Kafka in Adobe Ad Cloud's Analytics Platform
Apache Kafka in Adobe Ad Cloud's Analytics PlatformApache Kafka in Adobe Ad Cloud's Analytics Platform
Apache Kafka in Adobe Ad Cloud's Analytics Platform
confluent
 
Atlanta Microservices Day: Istio Service Mesh
Atlanta Microservices Day: Istio Service MeshAtlanta Microservices Day: Istio Service Mesh
Atlanta Microservices Day: Istio Service Mesh
Christian Posta
 
Shared Security Responsibility Model of AWS
Shared Security Responsibility Model of AWSShared Security Responsibility Model of AWS
Shared Security Responsibility Model of AWS
Akshay Mathur
 
OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade Platform
OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade PlatformOSMC 2021 | Use OpenSource monitoring for an Enterprise Grade Platform
OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade Platform
NETWAYS
 
What's new in log insight 3.3 presentation
What's new in log insight 3.3 presentationWhat's new in log insight 3.3 presentation
What's new in log insight 3.3 presentation
David Pasek
 
Cloud Native Camel Riding
Cloud Native Camel RidingCloud Native Camel Riding
Cloud Native Camel Riding
Christian Posta
 
Testing at Stream-Scale
Testing at Stream-ScaleTesting at Stream-Scale
Testing at Stream-Scale
All Things Open
 
Deploying Kong with Mesosphere DC/OS
Deploying Kong with Mesosphere DC/OSDeploying Kong with Mesosphere DC/OS
Deploying Kong with Mesosphere DC/OS
Mesosphere Inc.
 
Spring Cloud and Netflix OSS overview v1
Spring Cloud and Netflix OSS overview v1Spring Cloud and Netflix OSS overview v1
Spring Cloud and Netflix OSS overview v1
Dmitry Skaredov
 
APIs: Intelligent Routing, Security, & Management
APIs: Intelligent Routing, Security, & ManagementAPIs: Intelligent Routing, Security, & Management
APIs: Intelligent Routing, Security, & Management
NGINX, Inc.
 
Vulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudVulnerability Discovery in the Cloud
Vulnerability Discovery in the Cloud
DevOps.com
 

What's hot (20)

DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer ToolsDevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
 
Service discovery with Eureka and Spring Cloud
Service discovery with Eureka and Spring CloudService discovery with Eureka and Spring Cloud
Service discovery with Eureka and Spring Cloud
 
API and App Ecosystems - Build The Best: a deep dive
API and App Ecosystems - Build The Best: a deep diveAPI and App Ecosystems - Build The Best: a deep dive
API and App Ecosystems - Build The Best: a deep dive
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
 
OpenStack: Toward a More Resilient Cloud
OpenStack: Toward a More Resilient CloudOpenStack: Toward a More Resilient Cloud
OpenStack: Toward a More Resilient Cloud
 
Scale your application to new heights with NGINX and AWS
Scale your application to new heights with NGINX and AWSScale your application to new heights with NGINX and AWS
Scale your application to new heights with NGINX and AWS
 
Scala Security: Eliminate 200+ Code-Level Threats With Fortify SCA For Scala
Scala Security: Eliminate 200+ Code-Level Threats With Fortify SCA For ScalaScala Security: Eliminate 200+ Code-Level Threats With Fortify SCA For Scala
Scala Security: Eliminate 200+ Code-Level Threats With Fortify SCA For Scala
 
How to win skeptics to aggregated logging using Vagrant and ELK
How to win skeptics to aggregated logging using Vagrant and ELKHow to win skeptics to aggregated logging using Vagrant and ELK
How to win skeptics to aggregated logging using Vagrant and ELK
 
Apache Kafka in Adobe Ad Cloud's Analytics Platform
Apache Kafka in Adobe Ad Cloud's Analytics PlatformApache Kafka in Adobe Ad Cloud's Analytics Platform
Apache Kafka in Adobe Ad Cloud's Analytics Platform
 
Atlanta Microservices Day: Istio Service Mesh
Atlanta Microservices Day: Istio Service MeshAtlanta Microservices Day: Istio Service Mesh
Atlanta Microservices Day: Istio Service Mesh
 
Shared Security Responsibility Model of AWS
Shared Security Responsibility Model of AWSShared Security Responsibility Model of AWS
Shared Security Responsibility Model of AWS
 
OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade Platform
OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade PlatformOSMC 2021 | Use OpenSource monitoring for an Enterprise Grade Platform
OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade Platform
 
What's new in log insight 3.3 presentation
What's new in log insight 3.3 presentationWhat's new in log insight 3.3 presentation
What's new in log insight 3.3 presentation
 
Cloud Native Camel Riding
Cloud Native Camel RidingCloud Native Camel Riding
Cloud Native Camel Riding
 
Testing at Stream-Scale
Testing at Stream-ScaleTesting at Stream-Scale
Testing at Stream-Scale
 
Deploying Kong with Mesosphere DC/OS
Deploying Kong with Mesosphere DC/OSDeploying Kong with Mesosphere DC/OS
Deploying Kong with Mesosphere DC/OS
 
Spring Cloud and Netflix OSS overview v1
Spring Cloud and Netflix OSS overview v1Spring Cloud and Netflix OSS overview v1
Spring Cloud and Netflix OSS overview v1
 
APIs: Intelligent Routing, Security, & Management
APIs: Intelligent Routing, Security, & ManagementAPIs: Intelligent Routing, Security, & Management
APIs: Intelligent Routing, Security, & Management
 
Vulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudVulnerability Discovery in the Cloud
Vulnerability Discovery in the Cloud
 

Similar to How we accelerated our vault adoption with terraform

Nodejsvault austin2019
Nodejsvault austin2019Nodejsvault austin2019
Nodejsvault austin2019
Taswar Bhatti
 
stackconf 2022: Infrastructure Automation (anti) patterns
stackconf 2022: Infrastructure Automation (anti) patternsstackconf 2022: Infrastructure Automation (anti) patterns
stackconf 2022: Infrastructure Automation (anti) patterns
NETWAYS
 
Infrastructure as Code Patterns
Infrastructure as Code PatternsInfrastructure as Code Patterns
Infrastructure as Code Patterns
Kris Buytaert
 
Managing your secrets in a cloud environment
Managing your secrets in a cloud environmentManaging your secrets in a cloud environment
Managing your secrets in a cloud environment
Taswar Bhatti
 
Vault
VaultVault
Vault
dawnlua
 
Into The Box 2020 Keynote Day 1
Into The Box 2020 Keynote Day 1Into The Box 2020 Keynote Day 1
Into The Box 2020 Keynote Day 1
Ortus Solutions, Corp
 
A microservices journey - Round 2
A microservices journey - Round 2A microservices journey - Round 2
A microservices journey - Round 2
Christian Posta
 
Greenfields tech decisions
Greenfields tech decisionsGreenfields tech decisions
Greenfields tech decisions
Trent Hornibrook
 
Security for devs
Security for devsSecurity for devs
Security for devs
Abdelrhman Shawky
 
Middleware in Golang: InVision's Rye
Middleware in Golang: InVision's RyeMiddleware in Golang: InVision's Rye
Middleware in Golang: InVision's Rye
Cale Hoopes
 
GitHub Actions Security - DDOG
GitHub Actions Security - DDOGGitHub Actions Security - DDOG
GitHub Actions Security - DDOG
RobBos10
 
Streams API (Web Engines Hackfest 2015)
Streams API (Web Engines Hackfest 2015)Streams API (Web Engines Hackfest 2015)
Streams API (Web Engines Hackfest 2015)
Igalia
 
jDays Sweden 2016
jDays Sweden 2016jDays Sweden 2016
jDays Sweden 2016
Alex Theedom
 
Secure your app with keycloak
Secure your app with keycloakSecure your app with keycloak
Secure your app with keycloak
Guy Marom
 
Pentest Apocalypse
Pentest ApocalypsePentest Apocalypse
Pentest Apocalypse
Beau Bullock
 
DevNexus 2015: Kubernetes & Container Engine
DevNexus 2015: Kubernetes & Container EngineDevNexus 2015: Kubernetes & Container Engine
DevNexus 2015: Kubernetes & Container Engine
Kit Merker
 
DevSecOps in a cloudnative world
DevSecOps in a cloudnative worldDevSecOps in a cloudnative world
DevSecOps in a cloudnative world
Karthik Gaekwad
 
Inside Of Mbga Open Platform
Inside Of Mbga Open PlatformInside Of Mbga Open Platform
Inside Of Mbga Open PlatformHideo Kimura
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst Again
Varun Thacker
 
Secure all things with CBSecurity 3
Secure all things with CBSecurity 3Secure all things with CBSecurity 3
Secure all things with CBSecurity 3
Ortus Solutions, Corp
 

Similar to How we accelerated our vault adoption with terraform (20)

Nodejsvault austin2019
Nodejsvault austin2019Nodejsvault austin2019
Nodejsvault austin2019
 
stackconf 2022: Infrastructure Automation (anti) patterns
stackconf 2022: Infrastructure Automation (anti) patternsstackconf 2022: Infrastructure Automation (anti) patterns
stackconf 2022: Infrastructure Automation (anti) patterns
 
Infrastructure as Code Patterns
Infrastructure as Code PatternsInfrastructure as Code Patterns
Infrastructure as Code Patterns
 
Managing your secrets in a cloud environment
Managing your secrets in a cloud environmentManaging your secrets in a cloud environment
Managing your secrets in a cloud environment
 
Vault
VaultVault
Vault
 
Into The Box 2020 Keynote Day 1
Into The Box 2020 Keynote Day 1Into The Box 2020 Keynote Day 1
Into The Box 2020 Keynote Day 1
 
A microservices journey - Round 2
A microservices journey - Round 2A microservices journey - Round 2
A microservices journey - Round 2
 
Greenfields tech decisions
Greenfields tech decisionsGreenfields tech decisions
Greenfields tech decisions
 
Security for devs
Security for devsSecurity for devs
Security for devs
 
Middleware in Golang: InVision's Rye
Middleware in Golang: InVision's RyeMiddleware in Golang: InVision's Rye
Middleware in Golang: InVision's Rye
 
GitHub Actions Security - DDOG
GitHub Actions Security - DDOGGitHub Actions Security - DDOG
GitHub Actions Security - DDOG
 
Streams API (Web Engines Hackfest 2015)
Streams API (Web Engines Hackfest 2015)Streams API (Web Engines Hackfest 2015)
Streams API (Web Engines Hackfest 2015)
 
jDays Sweden 2016
jDays Sweden 2016jDays Sweden 2016
jDays Sweden 2016
 
Secure your app with keycloak
Secure your app with keycloakSecure your app with keycloak
Secure your app with keycloak
 
Pentest Apocalypse
Pentest ApocalypsePentest Apocalypse
Pentest Apocalypse
 
DevNexus 2015: Kubernetes & Container Engine
DevNexus 2015: Kubernetes & Container EngineDevNexus 2015: Kubernetes & Container Engine
DevNexus 2015: Kubernetes & Container Engine
 
DevSecOps in a cloudnative world
DevSecOps in a cloudnative worldDevSecOps in a cloudnative world
DevSecOps in a cloudnative world
 
Inside Of Mbga Open Platform
Inside Of Mbga Open PlatformInside Of Mbga Open Platform
Inside Of Mbga Open Platform
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst Again
 
Secure all things with CBSecurity 3
Secure all things with CBSecurity 3Secure all things with CBSecurity 3
Secure all things with CBSecurity 3
 

More from Mitchell Pronschinske

Getting Started with Kubernetes and Consul
Getting Started with Kubernetes and ConsulGetting Started with Kubernetes and Consul
Getting Started with Kubernetes and Consul
Mitchell Pronschinske
 
Multi-Cloud with Nomad and Consul Connect
Multi-Cloud with Nomad and Consul ConnectMulti-Cloud with Nomad and Consul Connect
Multi-Cloud with Nomad and Consul Connect
Mitchell Pronschinske
 
Code quality for Terraform
Code quality for TerraformCode quality for Terraform
Code quality for Terraform
Mitchell Pronschinske
 
Dynamic Azure Credentials for Applications and CI/CD Pipelines
Dynamic Azure Credentials for Applications and CI/CD PipelinesDynamic Azure Credentials for Applications and CI/CD Pipelines
Dynamic Azure Credentials for Applications and CI/CD Pipelines
Mitchell Pronschinske
 
Migrating from VMs to Kubernetes using HashiCorp Consul Service on Azure
Migrating from VMs to Kubernetes using HashiCorp Consul Service on AzureMigrating from VMs to Kubernetes using HashiCorp Consul Service on Azure
Migrating from VMs to Kubernetes using HashiCorp Consul Service on Azure
Mitchell Pronschinske
 
Empowering developers and operators through Gitlab and HashiCorp
Empowering developers and operators through Gitlab and HashiCorpEmpowering developers and operators through Gitlab and HashiCorp
Empowering developers and operators through Gitlab and HashiCorp
Mitchell Pronschinske
 
Automate and simplify multi cloud complexity with f5 and hashi corp
Automate and simplify multi cloud complexity with f5 and hashi corpAutomate and simplify multi cloud complexity with f5 and hashi corp
Automate and simplify multi cloud complexity with f5 and hashi corp
Mitchell Pronschinske
 
Vault 1.5 Overview
Vault 1.5 OverviewVault 1.5 Overview
Vault 1.5 Overview
Mitchell Pronschinske
 
Using new sentinel features in terraform cloud
Using new sentinel features in terraform cloudUsing new sentinel features in terraform cloud
Using new sentinel features in terraform cloud
Mitchell Pronschinske
 
Military Edge Computing with Vault and Consul
Military Edge Computing with Vault and ConsulMilitary Edge Computing with Vault and Consul
Military Edge Computing with Vault and Consul
Mitchell Pronschinske
 
Unlocking the Cloud operating model with GitHub Actions
Unlocking the Cloud operating model with GitHub ActionsUnlocking the Cloud operating model with GitHub Actions
Unlocking the Cloud operating model with GitHub Actions
Mitchell Pronschinske
 
Vault 1.4 integrated storage overview
Vault 1.4 integrated storage overviewVault 1.4 integrated storage overview
Vault 1.4 integrated storage overview
Mitchell Pronschinske
 
Unlocking the Cloud Operating Model
Unlocking the Cloud Operating ModelUnlocking the Cloud Operating Model
Unlocking the Cloud Operating Model
Mitchell Pronschinske
 
Cisco ACI with HashiCorp Terraform (APAC)
Cisco ACI with HashiCorp Terraform (APAC)Cisco ACI with HashiCorp Terraform (APAC)
Cisco ACI with HashiCorp Terraform (APAC)
Mitchell Pronschinske
 
Governance for Multiple Teams Sharing a Nomad Cluster
Governance for Multiple Teams Sharing a Nomad ClusterGovernance for Multiple Teams Sharing a Nomad Cluster
Governance for Multiple Teams Sharing a Nomad Cluster
Mitchell Pronschinske
 
Integrating Terraform and Consul
Integrating Terraform and ConsulIntegrating Terraform and Consul
Integrating Terraform and Consul
Mitchell Pronschinske
 
Unlocking the Cloud Operating Model: Deployment
Unlocking the Cloud Operating Model: DeploymentUnlocking the Cloud Operating Model: Deployment
Unlocking the Cloud Operating Model: Deployment
Mitchell Pronschinske
 
Keeping a Secret with HashiCorp Vault
Keeping a Secret with HashiCorp VaultKeeping a Secret with HashiCorp Vault
Keeping a Secret with HashiCorp Vault
Mitchell Pronschinske
 
Modern Scheduling for Modern Applications with Nomad
Modern Scheduling for Modern Applications with NomadModern Scheduling for Modern Applications with Nomad
Modern Scheduling for Modern Applications with Nomad
Mitchell Pronschinske
 
Moving to a Microservice World: Leveraging Consul on Azure
Moving to a Microservice World: Leveraging Consul on AzureMoving to a Microservice World: Leveraging Consul on Azure
Moving to a Microservice World: Leveraging Consul on Azure
Mitchell Pronschinske
 

More from Mitchell Pronschinske (20)

Getting Started with Kubernetes and Consul
Getting Started with Kubernetes and ConsulGetting Started with Kubernetes and Consul
Getting Started with Kubernetes and Consul
 
Multi-Cloud with Nomad and Consul Connect
Multi-Cloud with Nomad and Consul ConnectMulti-Cloud with Nomad and Consul Connect
Multi-Cloud with Nomad and Consul Connect
 
Code quality for Terraform
Code quality for TerraformCode quality for Terraform
Code quality for Terraform
 
Dynamic Azure Credentials for Applications and CI/CD Pipelines
Dynamic Azure Credentials for Applications and CI/CD PipelinesDynamic Azure Credentials for Applications and CI/CD Pipelines
Dynamic Azure Credentials for Applications and CI/CD Pipelines
 
Migrating from VMs to Kubernetes using HashiCorp Consul Service on Azure
Migrating from VMs to Kubernetes using HashiCorp Consul Service on AzureMigrating from VMs to Kubernetes using HashiCorp Consul Service on Azure
Migrating from VMs to Kubernetes using HashiCorp Consul Service on Azure
 
Empowering developers and operators through Gitlab and HashiCorp
Empowering developers and operators through Gitlab and HashiCorpEmpowering developers and operators through Gitlab and HashiCorp
Empowering developers and operators through Gitlab and HashiCorp
 
Automate and simplify multi cloud complexity with f5 and hashi corp
Automate and simplify multi cloud complexity with f5 and hashi corpAutomate and simplify multi cloud complexity with f5 and hashi corp
Automate and simplify multi cloud complexity with f5 and hashi corp
 
Vault 1.5 Overview
Vault 1.5 OverviewVault 1.5 Overview
Vault 1.5 Overview
 
Using new sentinel features in terraform cloud
Using new sentinel features in terraform cloudUsing new sentinel features in terraform cloud
Using new sentinel features in terraform cloud
 
Military Edge Computing with Vault and Consul
Military Edge Computing with Vault and ConsulMilitary Edge Computing with Vault and Consul
Military Edge Computing with Vault and Consul
 
Unlocking the Cloud operating model with GitHub Actions
Unlocking the Cloud operating model with GitHub ActionsUnlocking the Cloud operating model with GitHub Actions
Unlocking the Cloud operating model with GitHub Actions
 
Vault 1.4 integrated storage overview
Vault 1.4 integrated storage overviewVault 1.4 integrated storage overview
Vault 1.4 integrated storage overview
 
Unlocking the Cloud Operating Model
Unlocking the Cloud Operating ModelUnlocking the Cloud Operating Model
Unlocking the Cloud Operating Model
 
Cisco ACI with HashiCorp Terraform (APAC)
Cisco ACI with HashiCorp Terraform (APAC)Cisco ACI with HashiCorp Terraform (APAC)
Cisco ACI with HashiCorp Terraform (APAC)
 
Governance for Multiple Teams Sharing a Nomad Cluster
Governance for Multiple Teams Sharing a Nomad ClusterGovernance for Multiple Teams Sharing a Nomad Cluster
Governance for Multiple Teams Sharing a Nomad Cluster
 
Integrating Terraform and Consul
Integrating Terraform and ConsulIntegrating Terraform and Consul
Integrating Terraform and Consul
 
Unlocking the Cloud Operating Model: Deployment
Unlocking the Cloud Operating Model: DeploymentUnlocking the Cloud Operating Model: Deployment
Unlocking the Cloud Operating Model: Deployment
 
Keeping a Secret with HashiCorp Vault
Keeping a Secret with HashiCorp VaultKeeping a Secret with HashiCorp Vault
Keeping a Secret with HashiCorp Vault
 
Modern Scheduling for Modern Applications with Nomad
Modern Scheduling for Modern Applications with NomadModern Scheduling for Modern Applications with Nomad
Modern Scheduling for Modern Applications with Nomad
 
Moving to a Microservice World: Leveraging Consul on Azure
Moving to a Microservice World: Leveraging Consul on AzureMoving to a Microservice World: Leveraging Consul on Azure
Moving to a Microservice World: Leveraging Consul on Azure
 

Recently uploaded

Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 

Recently uploaded (20)

Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 

How we accelerated our vault adoption with terraform

  • 1. Lucy Davinhart – Sky Betting & Gaming How we accelerated our Vault adoption, with Terraform
  • 2. 👋 Who am I? • Senior Automation Engineer • @ Sky Betting & Gaming • Part of The Stars Group • Delivery Engineering Squad • Part of the Infrastructure & Platforms Tribe • Among other things we… • Look after our Vault clusters • Maintain Vault integrations & tooling • Control access to AWS (via Vault!) • Support internal customers @LUCYDAVINHART @SBGTECHTEAM
  • 3. 🔐 What do we use Vault for? • Across the company, our Vault users are: • > 4000 Virtual Machines • > 500 humans • > 250 various AppRoles • And a few more for Kubernetes Auth and AWS Auth • Main features we use: • K/V Secrets • PKI • AWS Credentials @LUCYDAVINHART @SBGTECHTEAM
  • 4. 💬 This Talk • Our problems managing Vault and onboarding people • How we went about solving them • Our initial Terraform solution • How we have improved it over time • The Future @LUCYDAVINHART @SBGTECHTEAM
  • 6. ✍️ Everything Manual • Time consuming for us to make changes • Making the changes • Comparing policies, AppRoles, LDAP groups, etc. • Time consuming to see what was in Vault already • We were regularly asked to troubleshoot why User A doesn’t have access to Secret B • Lack of standards / best practices • (and we didn’t really know what we were doing initially) • Automating Stuff is Cool 😎 @LUCYDAVINHART @SBGTECHTEAM
  • 7. 🧞♀️ We were too powerful • We started out with full admin rights and access to everything • Configure all the auth and secret mounts • Read and write to all the secrets • Give ourselves any policies we needed • But at least none of us had root tokens, right? 😱 @LUCYDAVINHART @SBGTECHTEAM
  • 8. 🙈 Lack of Audit Trail • What was changed? • When was it changed? • Who changed it? • How did it change? • Why did they change it? @LUCYDAVINHART @SBGTECHTEAM 🧞 🧞 ☹️
  • 10. Vault Config Ruby Gem • Downloads Vault config (policies, AppRoles, LDAP groups, etc) and saves in git repo • Jenkins job to run this on a schedule • We now have configuration backups, so we can see what has changed and when • But not necessarily who or why • Written very quickly: • Was useful very quickly • Was not particularly maintainable @LUCYDAVINHART @SBGTECHTEAM
  • 11. Goldfish Vault UI • A Vault UI, before one was available in Open Source Vault • Policy Request feature • Users edited policies in the UI, and submitted for approval • Vault admins review changes and apply @LUCYDAVINHART @SBGTECHTEAM
  • 13. Terraform • Codifies APIs into declarative configuration files • Reproducible Infrastructure as Code @LUCYDAVINHART @SBGTECHTEAM Terraform Code resource "vault_policy" "ravenclaw" { … } resource "vault_policy" "hufflepuff" { … } Terraform State vault_policy.ravenclaw vault_policy.slytherin Terraform Plan + vault_policy.hufflepuff - vault_policy.slytherin
  • 14. 🧞 Terraform Pipeline Design Decisions • Look like the Vault API as much as possible • Files which match the Vault API, e.g. sys/policy/foo.json @LUCYDAVINHART @SBGTECHTEAM
  • 15. 🧞 Terraform Pipeline Design Decisions policies.tf resource "vault_policy" "example" { name = "dev-team" policy = <<EOT path "secret/my_app" { capabilities = [”read”] } EOT } @LUCYDAVINHART @SBGTECHTEAM sys/policy/example.hcl path "secret/my_app" { capabilities = [”read”] }
  • 16. 🧞 Terraform Pipeline Design Decisions • Look like the Vault API as much as possible • Files which match the Vault API, e.g. sys/policy/foo.json • (Initially) Take output from Ruby Gem as input • Pull Requests to make changes • Start with Policies, our most common request • Everything in the repo in Vault Nothing in Vault that was not in the repo @LUCYDAVINHART @SBGTECHTEAM Config in Vault Config in Repo Config in Vault + Repo Delete This Create This
  • 19. 👩💻 What a User Sees @LUCYDAVINHART @SBGTECHTEAM
  • 20. 👩💻 What a User Sees @LUCYDAVINHART @SBGTECHTEAM
  • 21. 👩💻 What a User Sees @LUCYDAVINHART @SBGTECHTEAM
  • 22. 👩💻 What a User Sees @LUCYDAVINHART @SBGTECHTEAM
  • 23. 👩💻 What a User Sees @LUCYDAVINHART @SBGTECHTEAM
  • 24. 👩💻 What a User Sees @LUCYDAVINHART @SBGTECHTEAM
  • 28. Init • Ensures we have valid AWS credentials • We store Terraform State in S3 • Dynamic AWS credentials from Vault • terraform init • Accesses remote Terraform State • Downloads dependencies • terraform workspace select test/prod • Allows us to maintain separate Terraform State for different Vault clusters @LUCYDAVINHART @SBGTECHTEAM
  • 29. Import • Lists resources in Vault • Lists resources in Terraform State • Imports resources not in Terraform State @LUCYDAVINHART @SBGTECHTEAM Config in Vault Config in Repo
  • 30. Generate • Converts from files representing the Vault API into Terraform code @LUCYDAVINHART @SBGTECHTEAM resource "vault_policy" "example" { name = "dev-team" policy = <<EOT path "secret/my_app" { capabilities = [”read”] } EOT } path "secret/my_app" { capabilities = [”read”] }
  • 31. Validate • terraform validate • Ensures all generated Terraform code is syntactically correct • Resource-specific checks • Check for common human errors e.g. • Types of certain resources (e.g. LDAP groups, AD users) • Some case sensitivity issues • Most of these are actually done in the Generate phase @LUCYDAVINHART @SBGTECHTEAM
  • 32. Plan terraform plan -out=prod-vault.plan @LUCYDAVINHART @SBGTECHTEAM Terraform will perform the following actions: + vault_policy.hufflepuff - vault_policy.slytherin Plan: 1 to add, 0 to change, 1 to destroy.
  • 33. AppRole • So far: Read only access to Vault • Prompt for a short-lived secret-id to gain write access to Vault @LUCYDAVINHART @SBGTECHTEAM
  • 34. Apply terraform apply prod-vault.plan @LUCYDAVINHART @SBGTECHTEAM vault_policy.hufflepuff: Creating... name: "" => "hufflepuff" policy: "" => "..." vault_policy.slytherin: Destroying... (ID: slytherin) vault_policy.slytherin: Destruction complete after 0s vault_policy.hufflepuff: Creation complete after 0s (ID: hufflepuff) Apply complete! Resources: 1 added, 0 changed, 1 destroyed.
  • 35. Commit + Merge • Commit any generated Terraform code • Merge release branch to master @LUCYDAVINHART @SBGTECHTEAM
  • 37. LDAP Groups • One of the most common requests, after policies • Initially: vault_generic_secret • Resource to manage arbitrary Vault paths • Later: vault_ldap_auth_backend_group • Dedicated LDAP group resource • LDAP Restructure: Only allow certain LDAP groups to be mapped to policies • ✅ PG-Vault-Foo • 🚫 SG-MyTeam @LUCYDAVINHART @SBGTECHTEAM
  • 38. AppRoles • Another of the most common requests, after policies • We introduced Terraform variables for CIDR ranges: @LUCYDAVINHART @SBGTECHTEAM variable "cidr_range_prod_jenkins_agents" { type = "list" default = [ ”1.2.3.4/30", # Production Site A Jenkins Agents ”2.3.4.5/30", # Production Site B Jenkins Agents ... ] }
  • 39. AppRoles • Another of the most common requests, after policies • We introduced Terraform variables for CIDR ranges: @LUCYDAVINHART @SBGTECHTEAM { "token_bound_cidrs": "${var.cidr_range_prod_jenkins_agents}", "policies": [ "default", "terraform_vault-readonly” ], "token_max_ttl": 120 }
  • 40. Kubernetes Auth Roles • The team managing the k8s clusters wrote this one for us! • Effort needed by them: • Write Import Script, based on existing scripts • Write Generate Script, based on existing scripts • Effort needed by us: • Review their scripts @LUCYDAVINHART @SBGTECHTEAM
  • 41. AWS Auth Roles • Some auto-generation of resources • Get all AWS Account IDs with: aws organizations list-accounts • Generate resources: @LUCYDAVINHART @SBGTECHTEAM resource "vault_aws_auth_backend_sts_role" "role" { backend = ”aws" account_id = "1234567890" sts_role = "arn:aws:iam::1234567890:role/my-role" }
  • 42. Active Directory Users • ad/roles/:role_name • has a few fields you can’t write to @LUCYDAVINHART @SBGTECHTEAM { "last_vault_rotation": "2018-05-24T17:14:38.677370855Z", "password_last_set": "2018-05-24T17:14:38.677370855Z", "service_account_name": "my-application@example.com", "ttl": 100 }
  • 43. Active Directory Users • vault_generic_endpoint resource @LUCYDAVINHART @SBGTECHTEAM resource "vault_generic_endpoint" "ad_role-vaulttest" { path = "ad/roles/vaulttest” data_json = ‘{"service_account_name": ”VaultTest@fancycorp.net"}’ # When reading, the secret contains keys that cannot be written: # password_last_set (when did the password last get updated) # last_vault_rotation (when did Vault last update the password) ignore_absent_fields = true }
  • 45. 🎉 What Did All This Give Us? • Time • Individual changes take less of our time  We can handle more requests • Visibility • Easier to see what’s in Vault • Easier to debug • Auditability • Who, What, When, How, Why • grep-ability / Searchability • Find common patterns • Identify issues before they become problems • Reducing our own permissions • Lots of configuration can no longer be done by humans @LUCYDAVINHART @SBGTECHTEAM
  • 47. 🆕 New Resources • PKI • dynamic X.509 certificates • Sentinel Policies • Richer access control functionality than ACL policies • Namespaces • Self-managed sub-Vaults @LUCYDAVINHART @SBGTECHTEAM
  • 48. 🧞 Auto Generation • AWS Accounts, all have standard permissions, which correspond to at least… • 2x Vault Policies per account • 2x LDAP Groups per account • Auto-Generated PRs for common functionality • Service Discovery for AppRole CIDR ranges @LUCYDAVINHART @SBGTECHTEAM
  • 49. 🧞🧞 Review Security Trade-Offs • 2FA to apply changes • e.g. require 2 Factor Auth before a human can grant Jenkins read/write access @LUCYDAVINHART @SBGTECHTEAM Jenkins requests read/write Human runs command 2FA prompt Human pastes token into Jenkins Jenkins requests read/write First human runs command Second human runs command First human pastes token into Jenkins • Enterprise Control Groups • e.g. require multiple humans to grant Jenkins read/write access
  • 50. 🧞🧞♀️ More validation before a PR can be merged • Check resources for sensible parameters • e.g. TTLs, num_uses, etc. • Check if Vault has required permissions before approving PRs • e.g. check if AWS account is in Organization • e.g. check if AD user is in correct Organizational Unit • Case sensitivity check on LDAP groups • We have a script to manually check this • Deploy to a local dev Vault • For testing new features in the pipeline @LUCYDAVINHART @SBGTECHTEAM
  • 51. @LUCYDAVINHART @SBGTECHTEAM • End-to-end, ignoring time waiting for humans, it currently takes 3.5m • But it could be faster! Gotta Go Fast!
  • 52. 👤 Make it Generic • Allow pipeline to be run against child namespaces • Config for each namespace stored in different repos • Delegate permissions to other teams @LUCYDAVINHART @SBGTECHTEAM
  • 54. Thank You! 🧞 @LucyDavinhart - @SBGTechTeam Slides: goto.lmhd.me/hc2019slides

Editor's Notes

  1. Morning! Say you’re a small team of a couple of people. In charge of managing the company Vault cluster. Say hundreds of people across the company want to make use of Vault, across thousands of systems, each with their own granular level of access. So maybe you expect to get a dozen requests daily. Your team also manages several other services, so you can’t dedicate all your time to Vault. How are you going to manage that? You don’t want to hand out too many admin permissions, because that inevitably leads to the too-many-admins problem. Let’s imagine it’s also 2017, so you’ve not convinced your finance department to pay for Enterprise Vault, not that Namespaces exist to help yet anyway. What are you gonna do? Well I’ll tell you how we did it.
  2. We started our Vault journey back in late 2016. I’m going to touch on some of the problems we had back then, how we approached solving them initially, and how things improved for us as a result of using Terraform. And as this is a journey that never ends, I’m going to talk about some things we haven’t done yet.
  3. === 2m / -33m === So, some of the problems we had early on, which were blockers to us using Vault in production
  4. We’re pretty good at config management at SBG, so actually installing Vault was automated from the get-go. But actual configuration of the service once it was running? That was manual. We were new to Vault, so naturally doing things took a while, but even as we gained more experience with the product there were still some things which took time. Not just in terms of configuring Vault, but also helping people figure out what access they had and why. Writing or making changes to policies for example, especially when that involved comparing to existing policies, took a while. And when all of that config isn’t stored anywhere except Vault itself, it was often too time consuming to properly compare things. So we ended up with many similar things being done in very different ways. We also just like automating stuff. It’s in our job titles, after all.
  5. Because we had to do everything ourselves manually, we for the most part we felt we had to have access to everything. With everything done by hand, we had to be able to actually do everything.
  6. Vaut’s audit logs are great, and we were shipping those off to our Elastic Stack from very early on. But they can only answer so many questions. [click] For example, you can reasonably easily find out that somebody has written to a particular policy, who they are, and when they did it. [click] But the audit logs don’t tell you what changes they made, or why. And we can’t have infinite retention on those logs, so anything older than the retention period is gone
  7. === 4m 30s / - 30m 30s === So on our way to using Vault in production, we needed to put something in place to solve these problems, even if it was only going to be temporary.
  8. First problem we looked at was keeping track of changes over time. We wrote a Ruby gem which we ran as a scheduled Jenkins job It iterated through paths in Vault and read them, saving the content in a git repo. This included: Policies LDAP Groups AppRoles Specifically, we do not back up secrets! We put this together pretty quickly and, as a result, we now had a better ability to see what changed and when, but still no visibility on who or why. And were still making changes manually. But it wasn’t very good. I’m allowed to say that, because I wrote it. It was originally going to be used to make changes in Vault, but it was too clunky and we didn’t really have confidence in it to grant it write access
  9. Next problem we tackled was the time it took for us to make some changes. We deployed a tool called Goldfish, which was primarily a Vault UI before Open Source Vault had one, which was useful at this point as we had not yet migrated to the Enterprise version. Our justification for spinning it up was its policy request feature, which made it much simpler for people to request changes to policies or add new policies. Users edited directly in the UI, were given a policy approval token, which they then sent to us for review. We’d approve it and apply it. We still had to map those policies to LDAP groups and AppRoles, but this made things a little easier for us for a while.
  10. === 6m 45s / - 28m 15s === With those two things in place we had enough to assure people Vault was ready for Production, and we had more time to focus on doing things better.
  11. If you’ve decided to come to my talk, then I’m assuming you know what Vault is. But you may not know what Terraform is. If that’s you… in simple terms, it allows you to write code to define your resources in a declarative way. Typically this is things like cloud infrastructure, but it can be anything with an API. [click] You write your code to define what you want your stuff to look like [click], Terraform keeps track of the state of your resources, which lets it [click] figure out how to get it from the state it’s in now to the state you want it to be in. We’d been using it for a while for other things, and discovered that there was a Vault provider.
  12. We didn’t want to just create a repo with raw Terraform code. For a start, that would mean our users would have to learn Terraform at the same time as learning Vault. So we wanted it to resemble the Vault API as much as possible, on disk. So files in the right directories, parameters matching what you'd get with the API, etc. Partly this was to allow our users to learn more about how Vault works than would be the case if we abstracted things away. Particularly useful for when users wanted to request multiple interacting resources.
  13. And partly it was to reduce the learning curve on our users. Compare the file on the left to the file on the right. The left is some Terraform code to write a Vault policy. The right is just the policy. While this example is fairly simple, we don’t want to have to make our users learn the syntax on the left when all they really care about is what’s on the right.
  14. Initially, there would be some overlap between us configuring Vault using this Terraform, and configuring things manually, so we wanted it to take the output of the ruby gem as input, so Terraform didn’t try to delete anything we’d not written the code for yet. Making the repo resemble the Vault API was also useful for that. [click] We wanted people to be able to raise Pull Requests to make changes, so we could track who has made changes, who has approved them, what JIRA tickets they’re linked to, etc. [click] While we wanted to Terraform as much as possible, we were only going to start with policies to begin with, as those were our most common request. There were about 300 of them by the time we started this project in May 2018. We now have over 1000. [click] Finally, we wanted to make sure that anything that was not in the repo got deleted from Vault. The idea behind this being that there should be no unauthorized or unexplained changes to Vault. In Prod, we have restricted permissions, so we don’t even have the ability to do that. But in test, the rules are more relaxed, so it’s useful to be able to reset Vault to a known state.
  15. === 10m 15s / - 24m 45s ===
  16. === 12m 45s / - 22m 15s ===
  17. The pipeline is run as a Jenkins job, and with the exception of a few things, each stage corresponds to a makefile phony target. The idea being, you should be able to run the whole thing locally, which helps a lot when making changes to it.
  18. Run make help, I see all the stages of the pipeline. A few of them have dependencies on the init stage, and a few can be run standalone.
  19. The init stage, makes sure we have credentials to access our Terraform state. We store this in Amazon S3, so naturally we get our AWS creds out of Vault. We do a Terraform init to ensure we have all the necessary dependencies. And we make use of Terraform workspaces, which allows us to maintain separate Terraform state files for each of our Vault clusters.
  20. The import phase is our fail-secure mechanism, to make sure nothing is in Vault which shouldn’t be. Thinking back to the Venn Diagram, we’re checking for things which are in the left half of the diagram. Shouldn't happen too often, as we don't have permissions, but if it does then we can investigate. For each of the resources we support, we have a script which: Lists all resources of that type in Vault Lists all the resources in the Terraform state, i.e. those which Terraform knows about Imports into the Terraform state whatever is in Vault that Terraform doesn’t know about. The idea being, if we’ve told Terraform that it exists, but we haven’t written code to say that it's supposed to exist, then Terraform will delete it.
  21. I’ve simplified it a little, but an example looks a bit like this List all policies in Vault List all policies Terraform knows about Import anything Terraform doesn’t know about We have a script like this for each of the resources we support. We skip this stage when validating a pull request. It’s not really needed at that point.
  22. Then we come to the generate phase, where we actually make Terraform code Using policies as an example… We iterate over all policy files in the repo Get the policy name from the filename Then we generate the relevant Terraform resources This gets saved to an ephemeral .tf file
  23. TODO: remove this The generate scripts look similar to this. Again, I’ve slightly simplfied We iterate over all policy files in the repo Get the name of the policy by stripping the file extension, and converting to lowercase Then, for each policy file, we create two resources: A template file resource, so we can use the content of the policy file A Vault policy, which uses that template This gets saved to an ephemeral .tf file
  24. Validate doesn’t actually do much beyond verifying that the Terraform code we have generated is syntactically correct. There is some validation done during the generate phase, which I’ll touch on later
  25. At this point, we have a Terraform State which corresponds to everything in Vault We have generated Terraform code which corresponds to everything we want to be in Vault We run Terraform Plan, and it compares the two, and determines if it needs to make any changes. It saves those to a planfile, so we can guarantee that Terraform won’t try anything unexpected later. At this point, if we’re validating a pull request, we finish and mark the commit as successful.
  26. At this point, the entire Jenkins job has been running with read-only capabilities on the resources it’s been looking at. We’re comfortable allowing it to do this without human supervision, as we don’t really consider anything it’s reading from Vault to be secret, and the Terraform state can be regenerated from nothing with the import stage. The slack notification provides the Vault CLI command we need to generate the secret-id, so we don’t need to worry about it. It's prefixed with pscli, our Pretty Snazzy Command Line Interface, a tool my team looks after which does many things, but all you need to know is it makes sure everyone is using the same version of the Vault CLI and Terraform, and it handles all the Vault auth automatically.
  27. This is the part where Terraform actually goes ahead and makes the changes to Vault it said it would
  28. At this point, all that remains is to commit any generated terraform code to the repo, and merge our release branch to master. We don’t really need to do this, but it’s sometimes useful to see the raw terraform code that the job came up with.
  29. === 18m / - 17m === So we had a minimal viable product, something which solved part of the problem for us. We started making incremental improvements over time. I’m going to go over each new resource we added, because there’s something interesting for each of them
  30. LDAP groups, i.e. what policies should specific groups of human users have access to. We have about 250 of these in Vault at the moment, but there was only 90 when we added this in July 2018. When we first added LDAP groups in our pipeline, there was no dedicated resource for them in the Terraform provider, so we had to improvise. Fortunately, there was a resource called vault_generic_secret, which allows you to read and write to arbitrary Vault paths. This is very useful, but if you’re not careful with it you could end up revealing secrets. So treat it with care. But in our case, we do not consider LDAP groups and their policy mappings to be secrets, so we’re not too worried. Later on, there was a dedicated set of resources for the LDAP auth backend, so we switched over to that, completely invisibly to our users. Recently, we’ve had a restructure of all our Active Directory groups, and now only certain types of groups are allowed to grant permissions within systems like Vault. So we added a check in the script to make sure nobody accidentally added the wrong kind of group (as happened a couple of times).
  31. AppRoles, another common authentication mechanism for Vault. We had 160 ish when we added these to the pipeline, about a year ago in September 2018. That’s since doubled. The majority of these (about 2/3rds) are used to grant Jenkins jobs access to Vault, but the IP addresses used to reference these varied. So we let our users define Terraform variables, which they could then use when requesting AppRoles via this pipeline. This meant that whenever the team that looked after Jenkins added any new agents, it’s just one file that needs updating, and all the AppRoles get updated.
  32. AppRoles, another common authentication mechanism for Vault. We had 160 ish when we added these to the pipeline, about a year ago in September 2018. That’s since doubled. The majority of these (about 2/3rds) are used to grant Jenkins jobs access to Vault, but the IP addresses used to reference these varied. So we let our users define Terraform variables, which they could then use when requesting AppRoles via this pipeline. This meant that whenever the team that looked after Jenkins added any new agents, it’s just one file that needs updating, and all the AppRoles get updated.
  33. The team managing our Kubernetes clusters wanted to use k8s as an authentication mechanism for Vault. That seemed like a good idea, but as we didn’t manage the clusters, we didn’t know how it worked. Fortunately, we’d developed this repo in such a way that meant adding additional resources was just a case of writing an Import and a Generate script, and adding them in to the relevant Makefile phases.
  34. Vaut allows you to use AWS as an authentication mechanism, which a few teams were asking for us to enable. We have over 100 AWS accounts across the company, with access to those accounts managed by Vault. That’s using one set of AWS credentials per account. But for reasons I’m not going to go into here, there's something we need to configure for every AWS account in Vault if users want resources in those accounts to access Vault. Fortunately, it’s the same thing for every account, so our pipeline now runs an AWS CLI command to list all the accounts in our organization and automatically generates a Terraform resource for that relevant bit of config. And this just happens automatically whenever we create a new AWS account, so we don’t even have to think about it.
  35. This is another case where we had to get creative, because there isn’t a dedicated resource for these within the Terraform provider. That’s similar to LDAP groups when we first added those, but we can’t re-use the solution there. We can’t use generic_secret, because there are certain keys [click] which we either don’t know, or which change frequently, so using generic secret would result in Terraform getting stuck in a loop.
  36. Fortunately, there’s another resource, generic_endpoint, which is a bit more flexible. There’s a parameter which lets you specify which keys you care about, and as long as those remain unchanged within Vault, Terraform is happy.
  37. How has this improved things for us?
  38. Firstly, time. Individual changes take less of our time, which means we can handle more requests. We’ve got increased visibility of what’s going on in Vault, as it happens. We’ve also made it easier for people to debug their own permissions without needing to ask us. Easier for people to self-help and debug their own access without asking us. It’s now much easier to see the who, what, when, how and why of changes over time, allowing us to more easily audit historical access. It’s now possible to search through the code and find common patterns, and potentially identify issues before they become a problem. And as a result of our automation, we’ve been able to reduce our own default permissions within Vault.
  39. So, some numbers, for those interested Slack requests / MonkeyBot tickets… Since we started tracking these in JIRA, in January 2018, we’ve had over 2000 requests relating to Vault alone. Over 1000 of these include at least one Pull Request, for a total of over 1300 PRs The Jenkins job has run over 25,000 times since May 2018, running at a rate of about 750 every two weeks at the moment, which equates to about 40 unscheduled builds a day. i.e. builds triggered by pull requests being raised or merged.
  40. === 26m / - 9m === So what does the future look like for us? While I can’t tell you for certain, as priorities are constantly shifting, I do have a few ideas for things which we could do.
  41. First thing, obviously, there’s several more resources we use in Vault which aren’t terraformed yet. PKI mounts and roles, we have quite a few of these. Sentinel policies, we’re not using too much yet, but as and when our users want to make use of these in anger, they’d be a natural fit in our pipeline. Namespaces, a neat Enterprise feature we’re going to be making use of very soon, will need quite a few resources set up for each of these.
  42. We have over 100 AWS accounts, all managed through Vault, and all are configured very similarly. So with just the name of the account, we could theoretically generate the relevant policies and LDAP group mappings. I’m calling this The Future, but as recently as the past few weeks, the team managing our Kubernetes clusters have started auto-generating Pull Requests for their users. We could also do some service discovery for CIDR ranges, e.g. in the case of Jenkins agents, so those don’t need manually updated whenever IP ranges change. Though anywhere that rapidly scales, or where IP addresses are dynamic, realistically you’d want to use a different auth method altogether.
  43. We’re pretty happy with the tradeoff we’ve made between security and convenience, but if we ever needed to, we could add some additional safeguards. 2 Factor Auth for example, to grant the Jenkins job write access for example. Something we’ve discussed, but decided wasn’t worth the tradeoff yet. Control Groups are an Enterprise feature which looks super exciting, but which we’ve not yet found a use for. (It’s on my personal backlog of features to play with) If we ever decided we needed more than one human to approve the Jenkins job, or if we wanted to remove the necessity for a human to paste a secret-id into the job, we could make use of this.
  44. Our pipeline doesn’t do a great deal of validation at the moment. It’s mostly just checking if the syntax is correct, and a few resource-specific tests. We could do more, but in our case, we’ve found that the sort of issues we could write write checks for don’t actually happen often enough to be worth our time. Both in terms of how long it would take to implement, and how much time it would add to the build. This is also the sort of thing where Terraform Enterprise Sentinel Policies would come in handy.
  45. There are quite a few inefficiencies in the pipeline. It’s not really a problem right now, but as we grow, it’ll take longer. I have some ideas to make it faster which I’m gong to look into.
  46. Once we have Vault namespaces in place, admins of those namespaces will be in a similar position to where we were when we first started using Vault. So I have ideas for how we could make our Jenkins job generic enough that it could run against any namespace
  47. === 32m / - 3m === So, should you go away and create a Terraform pipeline that looks like ours? Probably not. Ours was the result of initial experimentation and incremental improvements, and the way we’ve chosen to implement things works well for us, but may not work well for you. If I was writing it from scratch now, there are things I'd do differently. But hopefully I’ve given you some insights into how we tackled the problems we had, and the inspiration to try it yourselves.
  48. Thank you all for listening! If you wanna find me and ask any questions, or of you want some stickers, my Twitter’s on screen. DMs are open. I’m also on the HashiConf slack, and should be easy to find by name.