GDG Cloud Southlake #8 Steve Cravens: Infrastructure as-Code (IaC) in 2022: Trends & Best Practices

IaC in 2022
Trends & Best Practices
GDG Cloud Southlake # 8 | 01/25/2022
Steve Cravens - Google
Stenio Ferreira - Google
Josh Addington - HashiCorp

To provide an overview of toolset and best practices around infrastructure as code (IaC)
implementation on Google Cloud.
Purpose
Use this slide deck when evaluating a customer’s cloud infrastructure development and
operations direction.
Delivery note
Key assumptions That the audience has a basic understanding of IaC and is familiar with the IaC toolset.
Intended audience Customer technical personnel and leadership involved in CCoE/CPT/Cloud adoption team.
Foreword

Automating through code the configuration and
provisioning of resources, so that human error is eliminated,
time is saved, and every step is fully documented.
Objective

Increasing demand
Requires rapid scaling of
IT infrastructure
Operational bottlenecks
Large Ops teams need to
overcome organizational and
technical bottlenecks
Disconnected feedback
Communication gap between
software and IT teams
Manual errors
Increased scale leads to
greater human errors
IaC is not optional: it is the only way to solve these challenges

Automate
Declarative
Replicate
Validate
Modular
No reinventing the
wheel, use software
engineering practices
for infrastructure
Build reusable infrastructure blocks across an organization
Assess desired state vs. current state infrastructure
Commit, version, trace, deploy, and collaborate,
just like source code
Specify the desired state of infrastructure, not updates
Create and destroy multiple times easily
Benefits of IaC

Config Management performs:
OS package installation
Patching/maintenance of VM software
Not applicable for cloud-native services
Examples
● Install a web server in a VM
● Create namespaces in a GKE cluster
● Load data in BigQuery
Fundamentally, IaC is for provisioning and managing cloud resources, while Config Management is for
VM OS-level configuration.
IaC performs:
Provisioning of VMs and other Google Cloud
services
IaC wraps around the Google Cloud API
Not focused on package configuration
Examples
● Launch a VM
● Create GKE cluster & nodepool
● Create a BigQuery dataset
Provisioning vs configuration

Tool | Type | Immutable | Declarative | Language | Google Cloud Support
Terraform | Provisioning | ✔ | ✔ | HCL | ✔*
Config Connector | Provisioning | ✔ | ✔ | YAML/KRM | ✔*
Pulumi | Provisioning | ✔ | ✔ | JS, TS, Python, ... |
Ansible | Config mgmt | | | YAML |
Chef | Config mgmt | | | Ruby |
* Support cases can be opened for Google Cloud resources managed via the Google provider.
IaC tool landscape

Manage cloud infrastructure with
Kubernetes tooling
kubectl
Config
Management
API Clients
Kubernetes API Server
Config Connector
Cloud
Spanner
Cloud
Memorystore
Cloud
Pub/Sub
Cloud IAM
Cloud SQL
Cloud
Storage
Google Cloud
Config
Connector

Manage cloud infrastructure via Kubernetes tooling. Config Connector registers resources via CRDs
and translates desired declarative state to imperative API calls.
Spanner
Cloud
SQL
Pub/Sub
Storage
Redis
IAM
KCC
controller
manager
CRD
etcd
CRUD
APIs
API server
Google Config Connector

Terraform is an infrastructure as code tool developed by HashiCorp that automates the building
and management of infrastructure using a declarative language
Large community
Multi-cloud
and multi-API
Open core with
enterprise support
Support for all major Cloud
providers as well as many
other services exposed
through an API (like GitHub,
Kubernetes)
Three different editions
ranging from self-hosted to
fully managed with
enterprise-level support
Thousands of third-party
providers and modules
available from the Terraform
Registry
Terraform

Terraform Google provider
● The Terraform provider for Google Cloud is
jointly developed by HashiCorp and
Google, with support for more than 250
Google Cloud resources
● Beta provider versions support products
and features which are not yet GA.
● Support cases can be opened for Google
provider resources.
● Google assets for Terraform are mainly
hosted in the Terraform Google Modules
GitHub in separate sets of assets:
○ Cloud Foundation Toolkit modules,
which cover most Google Cloud
products and are designed to be
opinionated and ready-to-use.
○ Fabric modules and examples, which
are designed as a starter kit to be
forked and owned to bootstrap
Google Cloud presence, and for
rapid prototyping.
Professional Services
Terraform assets
Terraform support from Google

Built by HashiCorp in 2014
Open core model
Repeatable without risk
Self service infrastructure
Multi cloud capable
Infrastructure as code tool
Terraform

© 2021 Google LLC. All rights reserved.
● Based on HCL2 (HashiCorp
Configuration Language),
similar to JSON and YAML but
more flexible
● Declarative language - describe
the desired end state and
Terraform figures out how to
get there
● Supports comments
Terraform Code
resource "google_storage_bucket_object" "picture" {
  name   = "butterfly01"
  source = "/images/nature/garden-tiger-moth.jpg"
  bucket = "image-store"
}

data "google_compute_network" "my-network" {
  name = "default-us-east1"
}
# This is a comment

● Providers
● Resources
● Data Sources
● Variables
● Outputs
● Modules
Terraform is composed of:

How Terraform interacts with the
different platforms
● IaaS: GCP, AWS, Azure, etc.
● PaaS: Heroku, Keycloak
● SaaS: DNSimple, Let's Encrypt
Providers
provider "google" {
project = "my-project-id"
region = "us-central1"
}

Every Terraform resource is structured
exactly the same way.
● resource = Top-level keyword
● type = Type of resource.
Example:
google_compute_instance.
● name = Arbitrary name to refer
to this resource. Used internally
by Terraform. This field cannot
be a variable.
Resources
resource type "name" {
parameter = "foo"
parameter2 = "bar"
list = ["one", "two", "three"]
}

Data sources are a way of querying a
provider to return an existing
resource, so that we can access its
parameters for our own use.
Data Sources
data "google_compute_image" "my_image" {
family = "debian-9"
project = "debian-cloud"
}

● Define the interface of a module
● Defaults allowed
● Can be configured at runtime or
in files
● Supports strings, maps, and
lists
Variables
variable "region" {
  default = "us-east1"
}

provider "google" {
  region = var.region
}

The outputs file is where you configure
any messages or data you want to show
at the end of a terraform apply.
Outputs File
output "catapp_url" {
  value = "http://${google_compute_instance.hashicat.network_interface.0.access_config.0.nat_ip}"
}

The folder containing main.tf, variables.tf and other files can be reused in other deployments as
a module.
This allows capturing opinionated logic in a module, and other deployments can configure the
module’s variables and access its outputs without having to worry about implementation details.
● Modules can be loaded from local references, Git(Hub), and the Terraform Module
Registry
● Google maintains several (primary development target for Cloud Foundation Toolkit)
Code Reuse - Modules
module "project-factory" {
  source    = "../../modules/project-factory"
  name      = "factory-simple-app"
  org_id    = var.organization_id
  folder_id = google_folder.projects_folder.name
}

Pattern for Terraform

Terraform is a stateful application. This means that it keeps track of everything you build
inside of a state file. This is tracked by the terraform.tfstate and terraform.tfstate.backup
files that appear inside the working directory.
The state file is Terraform's source of record for everything it knows about.
Terraform State
{
  "terraform_version": "0.12.7",
  "serial": 14,
  "lineage": "452b4191-89f6-db17-a3b1-4470dcb00607",
  "outputs": {
    "catapp_url": {
      "value": "http://go-hashicat-5c0265179ccda553.workshop.gcp.hashidemos.io",
      "type": "string"
    }
  }
}

Whenever you run a plan or apply, Terraform reconciles three different data sources:
1. What you wrote in your code
2. The state file
3. What actually exists
Terraform does its best to add, delete, change, or replace existing resources based on
what is in your *.tf files. Here are the four different things that can happen to each
resource during a plan/apply:
Changing Existing Infrastructure
+ create
- destroy
-/+ replace
~ update in-place

Gotchas
Credentials
Keep them safe and out of any code that is published to repos.
Especially public repos!
Immutable vs Mutable Infrastructure
Understanding the difference between the two is foundational to
the successful use of Terraform (LINK)
Versioning
When working with modules, versioning is very important to not
break other teams that leverage your module when deploying
new versions
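The credentials and versioning points can be sketched as follows. This is an assumption-laden example rather than the presenters' setup: the version constraints, network name, and subnet values are placeholders, and the public network module from the Terraform Registry stands in for whatever shared module a team maintains.

```hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0" # illustrative constraint; pin to what you have tested
    }
  }
}

variable "project_id" {
  description = "Project to deploy into"
  type        = string
}

provider "google" {
  # No credentials argument here: the provider falls back to Application
  # Default Credentials (for example `gcloud auth application-default login`
  # or the GOOGLE_APPLICATION_CREDENTIALS environment variable), so no key
  # material lives in the repository.
  project = var.project_id
  region  = "us-central1"
}

module "network" {
  # Registry modules accept a version constraint, so consumers only pick up
  # a new major release when they choose to.
  source  = "terraform-google-modules/network/google"
  version = "~> 4.0" # illustrative constraint

  project_id   = var.project_id
  network_name = "example-vpc" # hypothetical name
  subnets = [
    {
      subnet_name   = "example-subnet" # hypothetical
      subnet_ip     = "10.10.0.0/24"
      subnet_region = "us-central1"
    }
  ]
}
```

For modules loaded from Git rather than the registry, the equivalent pin is a tag or commit in the source address, using the `?ref=` query parameter.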

03
HashiCorp
Day 2 Operations &
Beyond IaC

Copyright © 2021 HashiCorp
Day 2 Operations
terraform validate and other tools like Terratest and TFLint
terraform plan provides built-in dry-run capability and drift detection
terraform import allows you to bring existing resources into state
State file management over time is critical
Workspace management
Immutable Infrastructure
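A minimal sketch of the import workflow, assuming a bucket that was created by hand before IaC was adopted; the resource label and bucket name are hypothetical. The resource block is written first, then the existing object is attached to it from the CLI.

```hcl
# Hypothetical bucket that already exists outside of Terraform.
resource "google_storage_bucket" "legacy" {
  name     = "legacy-assets-bucket" # must match the existing bucket name
  location = "US"                   # must match the existing location
}

# Bring the existing bucket under management (run once from the CLI):
#   terraform import google_storage_bucket.legacy legacy-assets-bucket
#
# Then check that code and reality agree; a plan with no changes means the
# block describes the bucket accurately and there is no drift:
#   terraform plan
```

If the plan after import shows changes, adjust the block until it matches the real resource before applying anything.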

Leading cloud infrastructure
automation
Our software stack enables the provisioning, securing,
connecting, and running of apps and the infrastructure to
support them.
We unlock the cloud operating model for every business
and enable their digital transformation strategies to
succeed.

Infrastructure Automation
Multi-Cloud Compliance & Management to provision
and manage any infrastructure with one workflow
Self-Service infrastructure for users to easily
provision infrastructure on-demand with a library of
approved infrastructure modules
Provides the foundation for cloud
infrastructure automation using
infrastructure as code for provisioning and
compliance in the cloud operating model
https://www.terraform.io/

Development Environments Made Easy
HashiCorp Vagrant provides the same, easy workflow
regardless of your role as a developer, operator, or
designer. It leverages a declarative configuration file
which describes all your software requirements,
packages, operating system configuration, users, and
more.
The cost of fixing a bug exponentially increases the
closer it gets to production. Vagrant aims to mirror
production environments by providing the same
operating system, packages, users, and configurations,
all while giving users the flexibility to use their favorite
editor, IDE, and browser.
Unify workflows, enforce consistency, and
work across platforms
https://www.vagrantup.com/

vagrantfile.simple
Vagrant.configure("2") do |config|
config.vm.box = "google/gce"
config.vm.provider :google do |google|
google.google_project_id = "YOUR_GOOGLE_CLOUD_PROJECT_ID"
google.google_json_key_location = "/path/to/your/private-key.json"
# Make sure to set this to trigger the zone_config
google.zone = "us-central1-f"
google.zone_config "us-central1-f" do |zone1f|
zone1f.name = "testing-vagrant"
zone1f.image = "debian-9-stretch-v20211105"
zone1f.machine_type = "n1-standard-4"
zone1f.zone = "us-central1-f"
zone1f.metadata = {'custom' => 'metadata', 'testing' => 'foobarbaz'}
zone1f.scopes = ['bigquery', 'monitoring', 'https://www.googleapis.com/auth/compute']
zone1f.tags = ['web', 'app1']
end
end
end
https://www.vagrantup.com/

Machine Image Automation
Packer embraces modern configuration management by
encouraging you to use automated scripts to install
and configure the software within your
Packer-made images. Packer brings machine
images into the modern age, unlocking untapped
potential and opening new opportunities.
Out of the box Packer comes with support to build
images for Amazon EC2, CloudStack, DigitalOcean,
Docker, Google Compute Engine, Microsoft Azure,
QEMU, VirtualBox, VMware, and more. Support for
more platforms is on the way, and anyone can add
new platforms via plugins.
Automate the creation of any type of
machine image
https://www.packer.io/

build.packer.hcl
https://www.packer.io/
variable "project_id" {
type = string
}
variable "zone" {
type = string
}
variable "builder_sa" {
type = string
}
source "googlecompute" "test-image" {
project_id = var.project_id
source_image_family = "ubuntu-2104"
zone = var.zone
image_description = "Created with Packer from Cloudbuild"
ssh_username = "root"
tags = ["packer"]
impersonate_service_account = var.builder_sa
}
build {
sources = ["sources.googlecompute.test-image"]
}

Security Automation
Secrets management to centrally store and
protect secrets across clouds and applications
Data encryption to keep application data secure
across environments and workloads
Advanced Data Protection to secure workloads
and data across traditional systems, clouds, and
infrastructure.
Provides the foundation for cloud security
that uses trusted sources of identity to keep
secrets and application data secure in the
cloud operating model
https://www.vaultproject.io/

vault-policy.hcl
https://www.vaultproject.io/
# This section grants all access on "secret/*". Further restrictions can be
# applied to this broad policy, as shown below.
path "secret/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}
# Even though we allowed secret/*, this line explicitly denies
# secret/super-secret. This takes precedence.
path "secret/super-secret" {
capabilities = ["deny"]
}
# Policies can also specify allowed, disallowed, and required parameters. Here
# the key "secret/restricted" can only contain "foo" (any value) and "bar" (one
# of "zip" or "zap").
path "secret/restricted" {
capabilities = ["create"]
allowed_parameters = {
"foo" = []
"bar" = ["zip", "zap"]
}
}

Simple and Secure Remote Access
Traditional approaches like SSH bastion hosts or
VPNs require distributing and managing
credentials, configuring network controls like
firewalls, and exposing the private network.
Boundary provides a secure way to access hosts
and critical systems without having to manage
credentials or expose your network, and is entirely
open source.
Access any system from anywhere
based on user identity
https://www.boundaryproject.io/

boundary_role.tf
https://www.boundaryproject.io/
resource "boundary_scope" "org" {
  name                     = "organization_one"
  description              = "My first scope!"
  scope_id                 = "global"
  auto_create_admin_role   = true
  auto_create_default_role = true
}
resource "boundary_user" "foo" {
name = "User 1"
scope_id = boundary_scope.org.id
}
resource "boundary_user" "bar" {
name = "User 2"
scope_id = boundary_scope.org.id
}
resource "boundary_role" "example" {
name = "My role"
description = "My first role!"
principal_ids = …

Network Automation
Service registry & health monitoring to provide a
real-time directory of all services with their health
status
Network middleware automation with service
discovery for dynamic reconfiguration as services
scale up, down or move
Zero trust network with service mesh to enable
identity-based security enforced at the endpoints
via sidecar proxies
Provides the foundation for cloud network
automation as a central service registry for
service-based networking in the cloud
operating model
https://www.consul.io/

config.hcl
https://www.consul.io/
datacenter = "us-west1"
data_dir = "/opt/consul"
log_level = "INFO"
node_name = "foobar"
server = true
watches = [
{
type = "checks"
handler = "/usr/bin/health-check-handler.sh"
}
]
telemetry {
statsite_address = "127.0.0.1:2180"
}

A simple and flexible workload orchestrator
to deploy and manage containers and
non-containerized applications across
on-prem and clouds at scale.
Workload Orchestration Made Easy
Container Orchestration for deploying, managing and
scaling containerized applications
Legacy Application Orchestration to containerize,
deploy and manage legacy apps on existing
infrastructure
Batch Workload Orchestration to enable ML, AI, data
science and other intensive workloads in high
performance computing (HPC) scenarios
https://www.nomadproject.io/

example.nomad
https://www.nomadproject.io/
job "docs" {
datacenters = ["dc1"]
group "example" {
network {
port "http" {
static = "5678"
}
}
task "server" {
driver = "docker"
config {
image = "hashicorp/http-echo"
ports = ["http"]
args = [
"-listen",
":5678",
"-text",
"hello world",
]
}
}
}
}

Build, Deploy, and Release Automation
Build applications for any language or framework.
You can use Buildpacks for automatically building
common frameworks or custom Dockerfiles or
other build tools for more fine-grained control.
Deploy artifacts created by the build step to a
variety of platforms, from Kubernetes to EC2 to
static site hosts.
Release your staged deployments and make them
accessible to the public. This works by updating
load balancers, configuring DNS, etc. The exact
behavior depends on your target platform.
A single configuration file and workflow
across platforms such as Kubernetes,
Nomad, EC2, Google Cloud Run, and more.
https://www.waypointproject.io/

waypoint.hcl
https://www.waypointproject.io/
project = "example-nodejs"
app "example-nodejs" {
labels = {
"service" = "example-nodejs",
"env" = "dev"
}
build {
use "pack" {}
registry {
use "docker" {
image = "gcr.io/<my-project-id>/example-nodejs"
tag = "latest"
}
}
}
deploy {
use "google-cloud-run" {
project = "<my-project-id>"
location = "us-east1"
port = 5000
static_environment = {
"NAME" : "World"
}
capacity {
memory = 128
cpu_count = 1
max_requests_per_container = 10
request_timeout = 300
}
auto_scaling {
max = 2
}
}
}
release {
use "google-cloud-run" {}
}
}

Delivering app
workloads to
multi-cloud
environments
with a single
control plane
at every layer

1 Is IaC a priority, and if so which automation tool will be used?
2 What percentage of the Google Cloud infrastructure will be managed via IaC?
3 Is there a central team which will own foundational IaC or central modules?
4 Which other teams (networking, security, etc.) will manage separate IaC stages?
5 What is the process to request and manage common resources (projects,
subnets, firewall rules, etc.)?
6 Will there be a testing strategy in place?
Key decisions

Reference Code
Terraform GCP docs
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Cloud Foundation Toolkit - basic full deployments
https://opensource.google/projects/cloud-foundation-toolkit
Cloud Foundation Fabric - advanced examples & modules
https://github.com/GoogleCloudPlatform/cloud-foundation-fabric
Architecture Blueprints - advanced full deployments
https://cloud.google.com/architecture?doctype=blueprint

Collaborate in
source control
Reduce manual
effort and errors
Enforce policies
proactively
Ensure
consistency
Developer
submits
Pull Request
CI
runs
Validation
Administrator
reviews for
Policy Compliance
Administrator
merges the
New Config
CD
updates
Deployed
Infrastructure
IaC change management through GitOps

Central team takes full
ownership of IaC codebase
Central team takes ownership of
Core Infra and CI/CD process
Infrastructure and app teams
owning their IaC and CI/CD
● Full control over infra
● Close collaboration and clear
responsibilities
● Works well for small sized infra
● Does not scale well
● Growing toil when working with
sec/network/app teams
● No simple way to share
responsibilities
● Full control over core infra
● Centralized IaC pipelines
● Shared responsibilities model
● Requires upskilling app/infra teams
● More time to ramp up
● Rapid development and prototyping
● Decentralized infra with autonomous
development
● No control over security and
governance
● No unified CICD
● Multiple teams are solving the same
challenges
Code Ownership

Single repo, multiple environments
● How frequently are environments really
identical? Differences cause lots of if/else code
● Requires discipline to merge changes from
different releases. Will likely require feature
branches, not just dev/staging/prod
● How do you handle hotfixes? How do you
backport hotfixes?
● More robust but requires more engineering
effort and a highly skilled team to operate the
e2e process
Multiple repos, one per environment
● Simplifies branching. Each environment has
its own folder/repo
● Requires a central repository of well-crafted
versioned modules
● Can easily accommodate per environment
differences
● Higher risk of drift between environments,
but also much easier to manage
● Hotfixes are just applied to the right
environment
● How do you promote/apply changes
between environments?
Multi-repo vs Monorepo considerations

Prod
Owner
Folder
NonProd
Owner
Folder
Dev
Owner
Folder
CICD
Cloud Build
Prod
NonProd
Dev
Feature
Branches
release/1.1.0 release/1.2.0 release/1.2.1
Monorepo branching strategy

Don’t allow manual
changes
Use IaC to provision resources at
the defined levels, and restrict users
to viewer-only access.
Monitor audit logs for non-IaC
changes
Monitor audit logs for “write”
changes made by anything other than
the IaC service account.
IaC tools only take care of resources defined and created by IaC code, but do not cover
manual changes to the cloud environment.
Define levels to be
automated by IaC
Define up to which level IaC
will be used and where
manual access (and drift) is
allowed.
1 2 3
Handling drift
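One possible way to implement the audit-log monitoring step is a log-based metric that counts Admin Activity entries not made by the pipeline's service account. This is a sketch under assumptions: the metric name and the variable are hypothetical, and the filter will also match Google-managed service agents, so it needs tuning before alerting on it.

```hcl
variable "iac_service_account" {
  description = "Email of the service account used by the IaC pipeline"
  type        = string
}

resource "google_logging_metric" "non_iac_writes" {
  name = "non-iac-admin-writes" # hypothetical metric name

  # Admin Activity audit logs record configuration-changing ("write") calls.
  # Everything not performed by the IaC service account is counted.
  filter = <<-EOT
    logName:"cloudaudit.googleapis.com%2Factivity"
    AND NOT protoPayload.authenticationInfo.principalEmail="${var.iac_service_account}"
  EOT

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}
```

An alerting policy on this metric then surfaces manual changes at the levels where drift is not allowed.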

Partition management in stages
● understand security boundaries
● use folders as IAM nodes at each boundary split
(tenant, environment, etc.)
● use a separate automation stage to create
prerequisites for the next boundary
Once Terraform runs
● State often contains sensitive data, and needs to
be protected accordingly
● Automation service accounts embed powerful
roles, so you need to ensure that certain boundaries
cannot be crossed
Enforcement of boundaries is often ad-hoc and
fragile
● a single all-powerful service account is used to
manage different environments
● the same code and backend are run for all
environments, and Terraform workspaces used
to separate (not isolate) their state
Problem Solution
Terraform best practices:
Separation of duties (per env/bu/stage)

HashiCorp guidelines strongly recommend a flat module tree, with only one level of child modules.
This style encourages the creation of flexible and composable modules that are wired together via inputs and outputs.
Prefer this
my-org-flat/
├── modules/
│   ├── business-unit/
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   └── folder/
│       ├── main.tf
│       ├── outputs.tf
│       └── variables.tf
├── main.tf
├── outputs.tf
└── variables.tf
Over this
my-org-nested/
├── business-unit/
│   ├── folder/
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   ├── main.tf
│   ├── outputs.tf
│   └── variables.tf
├── main.tf
├── outputs.tf
└── variables.tf
Prefer composition over embedding

Top-level modules should only call other
modules, connected via inputs / outputs
(previous slide)
Having a mix of resources and modules is
usually a sign of incomplete or badly
designed modules
module "org" {
source = "./modules/organization"
organization_id = var.organization_id
bindings = [ ... ]
}
resource "google_essential_contacts_contact" "contact" {
provider = google-beta
for_each = var.contacts
parent = var.organization_id
email = each.key
language_tag = "en"
notification_types = each.value
}
What's the best way to create this resource?
Avoid mixing modules and resources

module "vpc" {
source = "../modules/net-vpc"
project_id = "my-project"
name = "my-network"
subnets = [
{
ip_cidr_range = "10.0.0.0/24"
name = "prod-west1"
region = "europe-west1"
},
{
ip_cidr_range = "10.0.16.0/24"
name = "prod-west2"
region = "europe-west2"
}
]
}
Expose through a single module resources that
work in tandem
● VPC + Subnets
● Project + APIs
● MIGs + VMs + Disks
● Any resource with IAM
Tie logically related resources in a single
module

Prefer modules that manage a single instance of
the underlying resource.
● Makes interface simpler
● Makes code simpler
● Potentially avoids issues with dynamic keys
// avoid this
module "buckets" {
source = "./modules/gcs"
project_id = "myproject"
names = ["bucket-one", "bucket-two"]
uniform_access = {
bucket-one = true
bucket-two = false
}
}
// better
module "buckets" {
source = "./modules/gcs"
for_each = local.buckets
project_id = "myproject"
name = each.key
uniform_access = each.value
}
Leverage for_each with modules
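The "better" variant refers to local.buckets without showing it. A minimal sketch of what that local might look like follows, together with a way to read values back from the module instances. The module path is the same hypothetical one used on the slide, and the output assumes the module declares an output called name. Using for_each on a module block requires Terraform 0.13 or later.

```hcl
locals {
  # Map of bucket name => uniform bucket-level access flag.
  buckets = {
    bucket-one = true
    bucket-two = false
  }
}

module "buckets" {
  source         = "./modules/gcs" # hypothetical local module
  for_each       = local.buckets
  project_id     = "myproject"
  name           = each.key
  uniform_access = each.value
}

output "bucket_names" {
  # Module instances created with for_each are addressed by key, for example
  # module.buckets["bucket-one"]. Assumes the module has a "name" output.
  value = { for key, bucket in module.buckets : key => bucket.name }
}
```

Adding or removing a bucket is then a one-line change to the map, and the module interface stays single-instance.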

// better: implicit dependency
module "project-services" {
source = "google/project_services"
activate_apis = [
"kms.googleapis.com",
]
}
module "keyring" {
source = "google/kms_keyring"
project_id = module.project-services.project_id
name = "my-keyring"
location = "global"
}
// ok: explicit dependency
module "project-services" {
source = "google/project_services"
activate_apis = [
"kms.googleapis.com",
]
}
module "keyring" {
source = "google/kms_keyring"
name = "my-keyring"
location = "global"
depends_on = [module.project-services]
}
Terraform allows declaring explicit dependencies using the depends_on meta-argument.
However, Terraform can automatically discover dependencies, and use them to build the dependency tree that
defines the order in which resources are managed.
Using depends_on is usually a code smell.
Terraform best practices: Avoid depends_on

● Use locals freely Example
● Use for expressions Example
● Prefer for_each over count Why?
● Know Terraform built-in functions
○ Especially string and collection functions
● Avoid local-exec
● Avoid deprecated language features:
○ Strings as values: x = "${expression}" → x = expression
○ element() function: element(var.mylist, 1) → var.mylist[1]
○ list() function: list(a, b, c) → tolist([a, b, c])
Module implementation tricks
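Several of these tips can be seen together in one short sketch. The variable names and object shape below are assumptions made for illustration, not taken from the deck; the example combines a local, a for expression, and for_each.

```hcl
variable "network" {
  description = "Self link or name of the VPC that hosts the subnets"
  type        = string
}

variable "subnets" {
  description = "Subnets to create"
  type = list(object({
    name   = string
    cidr   = string
    region = string
  }))
}

locals {
  # A for expression turns the list into a map keyed by subnet name.
  subnets_by_name = { for s in var.subnets : s.name => s }
}

resource "google_compute_subnetwork" "subnet" {
  # With for_each, instances are addressed by a stable key such as
  # google_compute_subnetwork.subnet["prod-west1"]. With count they are
  # addressed by position, so removing an item from the middle of the list
  # shifts every later index and makes Terraform plan changes for them.
  for_each = local.subnets_by_name

  name          = each.key
  ip_cidr_range = each.value.cidr
  region        = each.value.region
  network       = var.network
}
```

The stable addressing is the main reason for_each is usually the safer choice than count for collections that change over time.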

● Two spaces for indentation
● Align values at equals
● Nested blocks below arguments
● Meta arguments go first
● Blocks are separated by one blank line
● Use the standard module structure
● You don't have to remember this: use
terraform fmt and/or tflint
# bad (don't do this)
variable "name" {}
variable "zone" {}
output "id" {value=local.result}
# better
variable "name" {
description = "VM name"
type = string
}
variable "zone" {
description = "VM zone"
type = string
default = "europe-west1-b"
}
output "vm_id" {
description = "VM id"
value = local.result
}
Follow terraform code conventions

backend.tf
terraform {
backend "gcs" {
bucket = "tf-state-prod"
prefix = "terraform/state"
}
}
● Store state in a Cloud Storage bucket
● Enable Object Versioning on the state GCS
bucket
● Segregate state into stages
● Stage SA permissions only for corresponding
stage GCS bucket
● Feed values from previous stages using
variables. Using variables makes explicit any
requirements and allows Terraform to validate if
the values are provided.
● Never change state manually; use terraform state rm /
terraform import instead.
Terraform Considerations: State management
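The bucket-side recommendations could look roughly like the sketch below, typically applied from an earlier bootstrap stage. The location, the variable name, and the choice of roles/storage.objectAdmin are assumptions; the bucket name is the one used in the backend example.

```hcl
variable "stage_service_account" {
  description = "Email of the service account that runs this stage"
  type        = string
}

resource "google_storage_bucket" "tf_state" {
  name                        = "tf-state-prod" # bucket name from the backend example
  location                    = "EU"            # hypothetical location
  uniform_bucket_level_access = true

  versioning {
    enabled = true # keeps earlier state versions recoverable
  }
}

# Grant only this stage's service account access to this stage's bucket.
resource "google_storage_bucket_iam_member" "stage_sa" {
  bucket = google_storage_bucket.tf_state.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${var.stage_service_account}"
}
```

Repeating this pair per stage gives each stage its own bucket and its own narrowly scoped identity.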

# non authoritative (one IAM identity)
resource "google_storage_bucket_iam_member" "member" {
bucket = "my-bucket-name"
role = "roles/storage.objectViewer"
member = "serviceAccount:foo@myprj.iam.gserviceaccount.com"
}
# authoritative for role
resource "google_storage_bucket_iam_binding" "binding" {
role = "roles/owner"
members = [ "user:jane@example.com" ]
}
# authoritative for resource (dangerous)
data "google_iam_policy" "foo-policy" {
binding {
role = "roles/storage.admin"
members = [ "group:yourgroup@example.com" ]
}
}
resource "google_storage_bucket_iam_policy" "member" {
policy_data = data.google_iam_policy.foo-policy.policy_data
}
IAM bindings are an integral part of any IaC
setup, and knowing the options provided by
the Google Cloud provider is important to
implement them properly and avoid conflicts.
The Google Cloud provider usually supports
bindings for different entities (org, project,
etc.) through three classes of IAM resources:
1. non authoritative
2. authoritative for a given role, and
3. authoritative for the resource.
You typically only want one approach in order
to avoid potential conflicts.
Terraform Considerations: IAM Bindings

For a given resource, an IAM policy is a set of bindings
of the form
(role, list of identities)
{
  "bindings": [
    {
      "role": "roles/storage.admin",
      "members": [
        "user:alice@example.com",
        "group:admins@example.com"
      ]
    },
    {
      "role": "roles/storage.objectViewer",
      "members": [
        "user:bob@example.com"
      ]
    }
  ],
  "etag": "BwUjMhCsNvY=",
  "version": 1
}
IAM Policy structure

{
"bindings": [
{
"members": [
],
},
{
"members": [
],
}
],
"version": 1
}
# authoritative for resource (dangerous)
data "google_iam_policy" "foo-policy" {
binding {
members = [
"user:alice@example.com", "group:admins@example.com"
]
}
binding {
role = "roles/compute.admin"
members = ["user:bob@example.com"]
}
}
resource "google_storage_bucket_iam_policy" "policy" {
policy_data = data.google_iam_policy.foo-policy.policy_data
}
Authoritative for the whole IAM policy

# authoritative for role
resource "google_storage_bucket_iam_binding" "binding" {
members = [
"user:jane@example.com",
"group:storage@example.com"
]
}
{
"bindings": [
{
"members": [
"user:jane@example.com",
"group:storage@example.com"
],
},
{
"members": [
],
}
],
"etag": "BwXCRbTTQKI=",
"version": 2
}
Authoritative for a single role

# non authoritative (one IAM identity)
resource "google_storage_bucket_iam_member" "member1" {
role = "roles/storage.objectViewer"
member = "group:viewers@example.com"
}
{
"bindings": [
{
"members": [
],
},
{
"members": [
"user:bob@example.com",
"group:viewers@example.com"
],
}
],
"version": 3
}
Non-authoritative

Internal or external IaC code
Always prefer internally-maintained code when in-house coding skills are present.
Terraform modules are a great way to encapsulate complexity and embed organizational requirements
and policies (like regionalization), while allowing less technical teams to profit from IaC.
Control
Internal modules allow
you to retain control
over critical parts of
your infrastructure
automation.
Support
Directly managing your
modules allows you to
react quicker to bugs or
provider changes.
Centralize
Centralize modules to
share best practices
across teams, and
enforce policies.
Document
Lean code and few
abstraction layers turn
your IaC code into live
documentation.

Scope | Who | How | What
Org setup | cloud / infra team | manual | automation resources, initial org roles, audit logging
Hierarchy | cloud / infra team | CI/CD | org-level hierarchy (folders, roles, shared projects)
Security | security team | CI/CD | org-level security resources (sinks, KMS, CSCC, etc.)
Networking | network team | CI/CD | shared networking resources (ICs, VPC hosts, etc.); possibly leverage YAML/JSON for firewall and subnets
Modules | cloud / infra team | N/A | central module repository
Projects (factory) | cloud / infra team | CI/CD or portal | managed/automated provisioning of projects; possibly leverage YAML/JSON as data format
VMs (factory) | cloud / infra team | CI/CD or portal | managed/automated provisioning of instances; possibly leverage YAML/JSON as data format
Partitioning IaC in stages

Team1
Networking Security
Teams iac
App1
Dev
App2
Prod
Dev
Prod
app1-dev
sec-prod sec-dev
net-prod net-dev
app1-prod
Logs
organization folders/IAM security networking project factory
customer.com
Mapping IaC stages to resource hierarchy

Create self-contained Terraform modules dedicated to
management of specific resources (projects, firewall rules, etc.)
● Embed organizational and security requirements to
enforce them at the IaC level
● Accept inputs in common descriptive languages (like
YAML) to allow non-coders to manage infrastructure
with code
● Plug in portals to offer auto-provisioning of specific
resources via IaC - GCP Private Catalog, ServiceNow,
etc
● Use for resources that are commonly deployed based on
day-to-day needs (a firewall rule, a new project, etc.)
Leverage IaC for non-technical teams or
interface to existing tools.
firewall/rules/ssh-rule.yaml
IaC factories

Managed vs. unmanaged
Terraform Enterprise
Pros
● Complies with security/location
requirements
● Full support
● Additional features
Cons
● Infrastructure and license costs
● Large operational overhead
Terraform Cloud
Pros
● Small operational overhead
● Fully supported
● Additional features
Cons
● License costs
● Remote state/execution might
not map to requirements
Terraform Open Source
Pros
● Complies with security/location
requirements
● No license costs
● Widely used, distributed kb
Cons
● Limited support
● Medium operational overhead
Terraform Open Source can be used to bootstrap, even if full support is needed later on
Running Terraform in production

Library | Language | What
Kitchen Terraform | Ruby | Non-trivial tooling and dependencies; uses the InSpec Google provider to validate against created resources.
Terratest | Go | Leverages the standard Go testing framework; works as a wrapper for the Terraform executable.
Tftest | Python | Leverages the standard Python unit testing framework; works as a wrapper for the Terraform executable.
IaC lifecycle should follow the same best practices used for other types of production code
including testing, especially at the module level.
Testing plan output instead of creating actual resources is a valid minimally viable strategy to ensure
code correctness and compliance with provider changes.
Testing Terraform code

Tool | Vendor | What
Sentinel | HashiCorp | Built in to Terraform Cloud and Terraform Enterprise. Uses its own policy language.
OPA | Open Source | De facto standard for policy enforcement. Can process Terraform plan outputs via custom integrations.
Terrascan | Accurics | Static code analyzer for Terraform. Verifies code complies with policies before executing it.
Use policy as code to automatically enforce company-wide requirements with Terraform, to ensure code
correctness and compliance with provider changes.
Terraform policy enforcement
