Infrastructure as Code (IaC) has been around for a while, and much research has gone into both proving its value and improving IaC implementations. We have a full guest list: Steve Cravens, who can speak from the school of hard knocks about why IaC is important; Stenio Ferreira, who worked at HashiCorp before joining Google and has extensive experience implementing IaC successfully with Terraform; and Josh Addington, a Sr. Solutions Engineer at HashiCorp, who will cover Day 2 operations as well as other offerings that can enhance IaC implementations.
Here is the high level overview:
• IaC overview
• Terraform Tactical
• IaC day 2 and Governance
GDG Cloud Southlake #8 Steve Cravens: Infrastructure as-Code (IaC) in 2022: Trends & Best Practices
1. IaC in 2022
Trends & Best Practices
GDG Cloud Southlake # 8 | 01/25/2022
Steve Cravens - Google
Stenio Ferreira - Google
Josh Addington - HashiCorp
2. Purpose: To provide an overview of the toolset and best practices around infrastructure as code (IaC) implementation on Google Cloud.
Delivery note: Use this slide deck when evaluating a customer’s cloud infrastructure development and operations direction.
Key assumptions: The audience has a basic understanding of IaC and is familiar with the IaC toolset.
Intended audience: Customer technical personnel and leadership involved in the CCoE/CPT/Cloud adoption team.
Foreword
3. Automating through code the configuration and
provisioning of resources, so that human error is eliminated,
time is saved, and every step is fully documented.
Objective
6. ● Increasing demand: requires rapid scaling of IT infrastructure
● Operational bottlenecks: large Ops teams need to overcome organizational and technical bottlenecks
● Disconnected feedback: communication gap between software and IT teams
● Manual errors: increased scale leads to greater human errors
IaC is not an option, it’s the only way to solve
7. No reinventing the wheel, use software engineering practices for infrastructure
● Automate: commit, version, trace, deploy, and collaborate, just like source code
● Declarative: specify the desired state of infrastructure, not updates
● Replicate: create and destroy multiple times easily
● Validate: assess desired state vs. current state infrastructure
● Modular: build reusable infrastructure blocks across an organization
Benefits of IaC
8. Fundamentally, IaC is for provisioning and managing cloud resources, while Config Management is for VM OS-level configuration.
Config Management performs:
● OS package installation
● Patching/maintenance of VM software
● Not applicable for cloud-native services
Examples
● Install a web server in a VM
● Create namespaces in a GKE cluster
● Load data in BigQuery
IaC performs:
● Provisioning of VMs and other Google Cloud services
● IaC wraps around the Google Cloud API
● Not focused on package configuration
Examples
● Launch a VM
● Create GKE cluster & nodepool
● Create a BigQuery dataset
Provisioning vs configuration
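As a rough illustration of the provisioning side, two of the IaC examples above (launch a VM, create a BigQuery dataset) might look like this in Terraform. This is a minimal sketch: the project, names, zone, machine type, and image are placeholder assumptions, not values from the talk.

```hcl
# Sketch only: every name and value below is a placeholder assumption.
provider "google" {
  project = "my-project"
  region  = "europe-west1"
}

# IaC example: launch a VM. What runs inside the VM (for example a web
# server package) is left to a config management tool.
resource "google_compute_instance" "web" {
  name         = "web-1"
  machine_type = "e2-small"
  zone         = "europe-west1-b"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }
}

# IaC example: create a BigQuery dataset. Loading data into it is not
# a provisioning concern.
resource "google_bigquery_dataset" "analytics" {
  dataset_id = "analytics"
  location   = "EU"
}
```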
9. Tool              Type          Immutable  Declarative  Language             Google Cloud Support
   Terraform         Provisioning  ✔          ✔            HCL                  ✔*
   Config Connector  Provisioning  ✔          ✔            YAML/KRM             ✔*
   Pulumi            Provisioning  ✔          ✔            JS, TS, Python, ...
   Ansible           Config mgmt                           YAML
   Chef              Config mgmt                           Ruby
* Support cases can be opened for Google Cloud resources managed via the Google provider.
IaC tool landscape
10. Manage cloud infrastructure with Kubernetes tooling
[Diagram: kubectl, Config Management, and API clients talk to the Kubernetes API server; Config Connector manages Google Cloud resources (Cloud Spanner, Cloud Memorystore, Cloud Pub/Sub, Cloud IAM, Cloud SQL, Cloud Storage)]
Config Connector
11. Manage cloud infrastructure via Kubernetes tooling. Config Connector registers resources via CRDs and translates desired declarative state to imperative API calls.
[Diagram: API server, etcd, CRDs, and the KCC controller manager; CRUD calls to Google Cloud APIs (Spanner, Cloud SQL, Pub/Sub, Storage, Redis, IAM)]
Google Config Connector
12. Terraform is an infrastructure as code tool developed by HashiCorp that automates the building and management of infrastructure using a declarative language.
● Large community: thousands of third-party providers and modules available from the Terraform Registry
● Multi-cloud and multi-API: support for all major cloud providers as well as many other services exposed through an API (like GitHub, Kubernetes)
● Open core with enterprise support: three different editions ranging from self-hosted to fully managed with enterprise-level support
Terraform
13. Terraform Google provider
● The Terraform provider for Google Cloud is
jointly developed by HashiCorp and
Google, with support for more than 250
Google Cloud resources
● Beta provider versions support products
and features which are not yet GA.
● Support cases can be opened for Google
provider resources.
● Google assets for Terraform are mainly
hosted in the Terraform Google Modules
GitHub in separate sets of assets:
○ Cloud Foundation Toolkit modules,
which cover most Google Cloud
products and are designed to be
opinionated and ready-to-use.
○ Fabric modules and examples, which
are designed as a starter kit to be
forked and owned to bootstrap
Google Cloud presence, and for
rapid prototyping.
Professional Services
Terraform assets
Terraform support from Google
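To make the GA versus beta provider split concrete, a configuration can declare both providers side by side. The sketch below assumes a placeholder project ID and illustrative version constraints; neither comes from the slides.

```hcl
terraform {
  required_providers {
    # GA provider: generally available products and features.
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0" # illustrative constraint (assumption)
    }
    # Beta provider: also covers products and features that are not yet GA.
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 4.0" # illustrative constraint (assumption)
    }
  }
}

provider "google" {
  project = "my-project" # placeholder
}

provider "google-beta" {
  project = "my-project" # placeholder
}
```

A resource that needs a beta feature then selects the beta provider with `provider = google-beta`, while everything else uses the default `google` provider.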
15. Built by HashiCorp in 2014
Open core model
Repeatable without risk
Self service infrastructure
Multi cloud capable
Infrastructure as code tool
Terraform
27. Gotchas
Credentials
Keep them safe and out of any code that is published in repos, especially public repos!
Immutable vs Mutable Infrastructure
Understanding the difference between the two is foundational to the successful use of Terraform (LINK)
Versioning
When working with modules, versioning is very important so that new versions do not break other teams that leverage your module
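The credentials and versioning gotchas can be sketched together. This example assumes a placeholder project and an illustrative version constraint on the public network module from the Terraform Registry; the subnet values are made up.

```hcl
# No credentials argument and no key file in the repo: the Google provider
# then falls back to Application Default Credentials from the environment.
provider "google" {
  project = "my-project" # placeholder
}

# Consumers pin a version range, so a new major release of the module
# does not change their infrastructure until they opt in.
module "network" {
  source  = "terraform-google-modules/network/google"
  version = "~> 4.0" # illustrative constraint (assumption)

  project_id   = "my-project"
  network_name = "my-network"

  subnets = [
    {
      subnet_name   = "subnet-01"
      subnet_ip     = "10.10.10.0/24"
      subnet_region = "europe-west1"
    }
  ]
}
```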
31. Leading cloud infrastructure
automation
Our software stack enables the provisioning, securing,
connecting, and running of apps and the infrastructure to
support them.
We unlock the cloud operating model for every business
and enable their digital transformation strategies to
succeed.
48. 1. Is IaC a priority, and if so which automation tool will be used?
2. What percentage of the Google Cloud infrastructure will be managed via IaC?
3. Is there a central team which will own foundational IaC or central modules?
4. Which other teams (networking, security, etc.) will manage separate IaC stages?
5. What is the process to request and manage common resources (projects, subnets, firewall rules, etc.)?
6. Will there be a testing strategy in place?
Key decisions
49. Reference Code
Terraform GCP docs
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Cloud Foundation Toolkit - basic full deployments
https://opensource.google/projects/cloud-foundation-toolkit
Cloud Foundation Fabric - advanced examples & modules
https://github.com/GoogleCloudPlatform/cloud-foundation-fabric
Architecture Blueprints - advanced full deployments
https://cloud.google.com/architecture?doctype=blueprint
51. ● Collaborate in source control
● Reduce manual effort and errors
● Enforce policies proactively
● Ensure consistency
Developer submits Pull Request → CI runs Validation → Administrator reviews for Policy Compliance → Administrator merges the New Config → CD updates Deployed Infrastructure
IaC change management through GitOps
52. Central team takes full ownership of IaC codebase
Pros
● Full control over infra
● Close collaboration and clear responsibilities
● Works well for small sized infra
Cons
● Does not scale well
● Growing toil when working with sec/network/app teams
● No simple way to share responsibilities
Central team takes ownership of Core Infra and CI/CD process
Pros
● Full control over core infra
● Centralized IaC pipelines
● Shared responsibilities model
Cons
● Requires upskilling app/infra teams
● More time to ramp up
Infrastructure and app teams owning their IaC and CI/CD
Pros
● Rapid development and prototyping
● Decentralized infra with autonomous development
Cons
● No control over security and governance
● No unified CI/CD
● Multiple teams are solving the same challenges
Code Ownership
53. Single repo, multiple environments
● How frequently are environments really identical? Differences cause lots of if/else code
● Requires discipline to merge changes from different releases. Will likely require feature branches, not just dev/staging/prod
● How do you handle hotfixes? How do you backport hotfixes?
● More robust but requires more engineering
effort and a highly skilled team to operate the
e2e process
Multiple repos, one per environment
● Simplifies branching. Each environment has
its own folder/repo
● Requires a central repository of well-crafted
versioned modules
● Can easily accommodate per environment
differences
● Higher risk of drift between environments,
but also much easier to manage
● Hotfixes are just applied to the right
environment
● How do you promote/apply changes
between environments?
Multi-repo vs Monorepo considerations
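For the one-folder-or-repo-per-environment layout, each environment root can call the same central module at its own pinned release. The following is a minimal sketch: the repository URL, the file path, the module inputs, and the tag are all hypothetical.

```hcl
# environments/prod/main.tf (hypothetical path)
module "vpc" {
  # Hypothetical central module repository, pinned to a release tag.
  source = "git::https://example.com/my-org/terraform-modules.git//net-vpc?ref=v1.2.0"

  # Inputs are assumptions about the module's interface.
  project_id = "my-prod-project"
  name       = "prod-network"
}
```

Under this arrangement, promoting a change between environments amounts to bumping `ref` in the next environment's root, and per-environment differences live in the inputs rather than in if/else logic.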
55. IaC tools only take care of resources defined and created by IaC code, but do not cover manual changes to the cloud environment.
1. Define levels to be automated by IaC: define up to which level IaC will be used and where manual access (and drift) is allowed.
2. Don’t allow manual changes: use IaC to provision resources on the defined levels, and restrict users to viewer-only access.
3. Monitor audit logs for non-IaC changes: monitor audit logs for “write” changes made by non-IaC service accounts.
Handling drift
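One possible way to monitor for non-IaC changes is a log sink that forwards Admin Activity audit log entries (which record write operations) not made by the automation service account. This is a sketch under assumptions: the project ID, topic name, and service account email are placeholders, and the filter would likely need tuning to exclude Google-managed service agents.

```hcl
# Destination for suspected manual changes (placeholder names throughout).
resource "google_pubsub_topic" "iac_drift" {
  project = "my-project"
  name    = "iac-drift"
}

# Export Admin Activity audit log entries whose caller is not the
# Terraform automation service account (email is an assumption).
resource "google_logging_project_sink" "non_iac_writes" {
  project     = "my-project"
  name        = "non-iac-writes"
  destination = "pubsub.googleapis.com/${google_pubsub_topic.iac_drift.id}"

  filter = <<-EOT
    logName:"cloudaudit.googleapis.com%2Factivity"
    AND NOT protoPayload.authenticationInfo.principalEmail="terraform@my-project.iam.gserviceaccount.com"
  EOT

  unique_writer_identity = true
}

# Allow the sink's writer identity to publish to the topic.
resource "google_pubsub_topic_iam_member" "sink_publisher" {
  project = "my-project"
  topic   = google_pubsub_topic.iac_drift.name
  role    = "roles/pubsub.publisher"
  member  = google_logging_project_sink.non_iac_writes.writer_identity
}
```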
57. Problem
Enforcement of boundaries is often ad-hoc and fragile
● a single all-powerful service account is used to manage different environments
● the same code and backend are run for all environments, and Terraform workspaces are used to separate (not isolate) their state
Solution
Partition management in stages
● understand security boundaries
● use folders as IAM nodes at each boundary split (tenant, environment, etc.)
● use a separate automation stage to create prerequisites for the next boundary
Once Terraform runs
● State often contains sensitive data, and needs to be protected accordingly
● Automation service accounts embed powerful roles, so certain boundaries must not be crossed
Terraform best practices:
Separation of duties (per env/bu/stage)
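A rough sketch of the staged approach: an earlier automation stage creates one service account per environment and grants it a role only on that environment's folder. The variable, the automation project ID, and the chosen role are assumptions for illustration.

```hcl
variable "environment_folders" {
  description = "Hypothetical map of environment name to folder ID (folders/NNN)."
  type        = map(string)
}

# One automation service account per environment instead of a single
# all-powerful one. account_id must be 6 to 30 characters long.
resource "google_service_account" "stage" {
  for_each     = var.environment_folders
  project      = "my-automation-project" # placeholder
  account_id   = "tf-${each.key}"
  display_name = "Terraform automation for ${each.key}"
}

# Each account is granted a role only on its own folder, so the folder
# acts as the IAM boundary between environments.
resource "google_folder_iam_member" "stage_project_creator" {
  for_each = var.environment_folders
  folder   = each.value
  role     = "roles/resourcemanager.projectCreator"
  member   = "serviceAccount:${google_service_account.stage[each.key].email}"
}
```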
58. HashiCorp guidelines strongly recommend having a flat module tree, with only one level of child modules.
This style encourages the creation of flexible and composable modules that are wired together via inputs and outputs.
Prefer this
my-org-flat/
├── modules/
│   ├── business-unit/
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   └── folder/
│       ├── main.tf
│       ├── outputs.tf
│       └── variables.tf
├── main.tf
├── outputs.tf
└── variables.tf
Over this
my-org-nested/
├── business-unit/
│   ├── folder/
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   ├── main.tf
│   ├── outputs.tf
│   └── variables.tf
├── main.tf
├── outputs.tf
└── variables.tf
Terraform best practices:
Prefer composition over embedding
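In a flat layout, the root module does the wiring: one module's output becomes another module's input. The sketch below is illustrative only; the `name` and `parent` inputs and the `id` output are hypothetical interfaces for the two modules.

```hcl
variable "organization_id" {
  description = "Organization ID in the form organizations/NNN."
  type        = string
}

# Root main.tf: both child modules sit at the same level.
module "business_unit" {
  source = "./modules/business-unit"
  name   = "retail"            # hypothetical input
  parent = var.organization_id # hypothetical input
}

module "folder" {
  source = "./modules/folder"
  name   = "prod"
  # Composition: the folder module receives the business unit's ID as an
  # input instead of being embedded inside the business-unit module.
  parent = module.business_unit.id # hypothetical output
}
```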
59. Top-level modules should only call other modules, connected via inputs / outputs (previous slide)
Having a mix of resources and modules is usually a sign of incomplete or badly designed modules
module "org" {
  source          = "./modules/organization"
  organization_id = var.organization_id
  bindings        = [ ... ]
}
resource "google_essential_contacts_contact" "contact" {
  provider           = google-beta
  for_each           = var.contacts
  parent             = var.organization_id
  email              = each.key
  language_tag       = "en"
  notification_types = each.value
}
What's the best way to create this resource?
Terraform best practices:
Avoid mixing modules and resources
60.
module "vpc" {
  source     = "../modules/net-vpc"
  project_id = "my-project"
  name       = "my-network"
  subnets = [
    {
      ip_cidr_range = "10.0.0.0/24"
      name          = "prod-west1"
      region        = "europe-west1"
    },
    {
      ip_cidr_range = "10.0.16.0/24"
      name          = "prod-west2"
      region        = "europe-west2"
    }
  ]
}
Expose through a single module resources that work in tandem
● VPC + Subnets
● Project + APIs
● MIGs + VMs + Disks
● Any resource with IAM
Terraform best practices:
Tie logically related resources in a single module
61. Prefer modules that manage a single instance of the underlying resource.
● Makes interface simpler
● Makes code simpler
● Potentially avoids issues with dynamic keys
// avoid this
module "buckets" {
  source     = "./modules/gcs"
  project_id = "myproject"
  names      = ["bucket-one", "bucket-two"]
  uniform_access = {
    bucket-one = true
    bucket-two = false
  }
}
// better
module "buckets" {
  source         = "./modules/gcs"
  for_each       = local.buckets
  project_id     = "myproject"
  name           = each.key
  uniform_access = each.value
}
Terraform best practices:
Leverage for_each with modules
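The "better" variant relies on a `local.buckets` map that the slide does not show. A plausible definition, plus how a single instance is addressed afterwards, could look like the sketch below; the map contents and the output are assumptions, and `for_each` on modules requires Terraform 0.13 or later.

```hcl
locals {
  # Assumed shape: bucket name => uniform access flag.
  buckets = {
    "bucket-one" = true
    "bucket-two" = false
  }
}

module "buckets" {
  source         = "./modules/gcs" # path from the slide; its inputs are assumed
  for_each       = local.buckets
  project_id     = "myproject"
  name           = each.key
  uniform_access = each.value
}

# Each instance is addressed by its map key, not by a list position.
output "bucket_one" {
  value = module.buckets["bucket-one"]
}
```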
62.
// better: implicit dependency
module "project-services" {
  source     = "google/project_services"
  project_id = var.project_id
  activate_apis = [
    "kms.googleapis.com",
  ]
}
module "keyring" {
  source     = "google/kms_keyring"
  project_id = module.project-services.project_id
  name       = "my-keyring"
  location   = "global"
}
// ok: explicit dependency
module "project-services" {
  source     = "google/project_services"
  project_id = var.project_id
  activate_apis = [
    "kms.googleapis.com",
  ]
}
module "keyring" {
  source     = "google/kms_keyring"
  project_id = var.project_id
  name       = "my-keyring"
  location   = "global"
  depends_on = [module.project-services]
}
Terraform allows declaring explicit dependencies using the depends_on meta-argument.
However, Terraform can automatically discover dependencies from references between resources and modules, and uses them to build the dependency graph that defines the order in which resources are managed.
Using depends_on is usually a code smell.
Terraform best practices: Avoid depends_on
63. ● Use locals freely
● Use for expressions
● Prefer for_each over count
● Know Terraform built-in functions
○ Especially string and collection functions
● Avoid local-exec
● Avoid deprecated language features:
○ String-as-values: x = "${expression}" → x = expression
○ element() function: element(var.mylist, 1) → var.mylist[1]
○ list() function: list(a, b, c) → tolist([a, b, c])
Terraform best practices:
Module implementation tricks
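A short sketch of the first three tips together. The variable contents, network name, and region are placeholder assumptions. Because `for_each` keys instances by map key, removing one entry leaves the others untouched, whereas `count` indexes by position and can shift every later instance.

```hcl
variable "subnets" {
  description = "Hypothetical map of subnet key => CIDR range."
  type        = map(string)
  default = {
    app = "10.0.0.0/24"
    db  = "10.0.16.0/24"
  }
}

locals {
  # for expression: derive a resource name for every key.
  subnet_names = { for key, cidr in var.subnets : key => "prod-${key}" }
}

resource "google_compute_subnetwork" "subnet" {
  # Instances are addressed as subnet["app"] and subnet["db"].
  for_each      = var.subnets
  project       = "my-project" # placeholder
  network       = "my-network" # placeholder
  region        = "europe-west1"
  name          = local.subnet_names[each.key]
  ip_cidr_range = each.value
}
```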
64. ● Two spaces for indentation
● Align values at equals
● Nested blocks below arguments
● Meta arguments go first
● Blocks are separated by one blank line
● Use the standard module structure
● You don't have to remember this: use
terraform fmt and/or tflint
# bad (don't do this)
variable "name" {}
variable "zone" {}
output "id" {value=local.result}
# better
variable "name" {
  description = "VM name"
  type        = string
}
variable "zone" {
  description = "VM zone"
  type        = string
  default     = "europe-west1-b"
}
output "vm_id" {
  description = "VM id"
  value       = local.result
}
Terraform best practices:
Follow terraform code conventions
66. backend.tf
terraform {
  backend "gcs" {
    bucket = "tf-state-prod"
    prefix = "terraform/state"
  }
}
● Store state in a Cloud Storage bucket
● Enable Object Versioning on the state GCS
bucket
● Segregate state into stages
● Stage SA permissions only for corresponding
stage GCS bucket
● Feed values from previous stages using
variables. Using variables makes explicit any
requirements and allows Terraform to validate if
the values are provided.
● Never change state manually, use terraform state rm / terraform import instead.
Terraform Considerations: State management
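The bucket-side recommendations can themselves be expressed in Terraform, typically in an earlier bootstrap stage. This is a minimal sketch: the project, location, service account email, and the choice of role are assumptions.

```hcl
# State bucket for the prod stage, with Object Versioning enabled so that
# earlier state versions can be recovered.
resource "google_storage_bucket" "tf_state_prod" {
  project                     = "my-automation-project" # placeholder
  name                        = "tf-state-prod"
  location                    = "EU"
  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }
}

# The prod stage service account gets access to this bucket only.
resource "google_storage_bucket_iam_member" "stage_sa" {
  bucket = google_storage_bucket.tf_state_prod.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:tf-prod@my-automation-project.iam.gserviceaccount.com" # placeholder
}
```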
67.
# non authoritative (one IAM identity)
resource "google_storage_bucket_iam_member" "member" {
  bucket = "my-bucket-name"
  role   = "roles/storage.objectViewer"
  member = "serviceAccount:foo@myprj.iam.gserviceaccount.com"
}
# authoritative for role
resource "google_storage_bucket_iam_binding" "binding" {
  bucket  = "my-bucket-name"
  role    = "roles/storage.admin"
  members = ["user:jane@example.com"]
}
# authoritative for resource (dangerous)
data "google_iam_policy" "foo-policy" {
  binding {
    role    = "roles/storage.admin"
    members = ["group:yourgroup@example.com"]
  }
}
resource "google_storage_bucket_iam_policy" "member" {
  bucket      = "my-bucket-name"
  policy_data = data.google_iam_policy.foo-policy.policy_data
}
IAM bindings are an integral part of any IaC
setup, and knowing the options provided by
the Google Cloud provider is important to
implement them properly and avoid conflicts.
The Google Cloud provider usually supports
bindings for different entities (org, project,
etc.) through three classes of IAM resources:
1. non authoritative
2. authoritative for a given role, and
3. authoritative for the resource.
You typically only want one approach in order
to avoid potential conflicts.
Terraform Considerations: IAM Bindings
68. For a given resource, an IAM policy is a set of bindings
of the form
(role, list of identities)
{
  "bindings": [
    {
      "role": "roles/storage.admin",
      "members": [
        "user:alice@example.com",
        "group:admins@example.com"
      ]
    },
    {
      "role": "roles/storage.objectViewer",
      "members": [
        "user:bob@example.com"
      ]
    }
  ],
  "etag": "BwUjMhCsNvY=",
  "version": 1
}
IAM Policy structure
69.
{
  "bindings": [
    {
      "role": "roles/storage.admin",
      "members": [
        "user:alice@example.com",
        "group:admins@example.com"
      ]
    },
    {
      "role": "roles/storage.objectViewer",
      "members": [
        "user:bob@example.com"
      ]
    }
  ],
  "etag": "BwUjMhCsNvY=",
  "version": 1
}
# authoritative for resource (dangerous)
data "google_iam_policy" "foo-policy" {
  binding {
    role = "roles/storage.admin"
    members = [
      "user:alice@example.com", "group:admins@example.com"
    ]
  }
  binding {
    role    = "roles/storage.objectViewer"
    members = ["user:bob@example.com"]
  }
}
resource "google_storage_bucket_iam_policy" "policy" {
  bucket      = "my-bucket-name"
  policy_data = data.google_iam_policy.foo-policy.policy_data
}
Authoritative for the whole IAM policy
70.
# authoritative for role
resource "google_storage_bucket_iam_binding" "binding" {
  bucket = "my-bucket-name"
  role   = "roles/storage.admin"
  members = [
    "user:jane@example.com",
    "group:storage@example.com"
  ]
}
{
  "bindings": [
    {
      "role": "roles/storage.admin",
      "members": [
        "user:jane@example.com",
        "group:storage@example.com"
      ]
    },
    {
      "role": "roles/storage.objectViewer",
      "members": [
        "user:bob@example.com"
      ]
    }
  ],
  "etag": "BwXCRbTTQKI=",
  "version": 2
}
Authoritative for a single role
74. Internal or external IaC code
Always prefer internally-maintained code when in-house coding skills are present.
Terraform modules are a great way to encapsulate complexity and embed organizational requirements
and policies (like regionalization), while allowing less technical teams to profit from IaC.
Control
Internal modules allow
you to retain control
over critical parts of
your infrastructure
automation.
Support
Directly managing your
modules allows you to
react quicker to bugs or
provider changes.
Centralize
Centralize modules to share best practices across teams, and enforce policies.
Document
Lean code and few
abstraction layers turn
your IaC code into live
documentation.
75. Scope               Who                 How              What
    Org setup           cloud / infra team  manual           automation resources, initial org roles, audit logging
    Hierarchy           cloud / infra team  CI/CD            org-level hierarchy (folders, roles, shared projects)
    Security            security team       CI/CD            org-level security resources (sinks, KMS, CSCC, etc.)
    Networking          network team        CI/CD            shared networking resources (ICs, VPC hosts, etc.)
                                                             possibly leverage YAML/JSON for firewall and subnets
    Modules             cloud / infra team  N/A              central module repository
    Projects (factory)  cloud / infra team  CI/CD or portal  managed/automated provisioning of projects
                                                             possibly leverage YAML/JSON as data format
    VMs (factory)       cloud / infra team  CI/CD or portal  managed/automated provisioning of instances
                                                             possibly leverage YAML/JSON as data format
Partitioning IaC in stages
77. Create self-contained Terraform modules dedicated to
management of specific resources (projects, firewall rules, etc.)
● Embed organizational and security requirements to
enforce them at the IaC level
● Accept inputs in common descriptive languages (like
YAML) to allow non-coders to manage infrastructure
with code
● Plug in portals to offer auto-provisioning of specific resources via IaC (GCP Private Catalog, ServiceNow, etc.)
● Use for resources that are commonly deployed based on
day-to-day needs (a firewall rule, a new project, etc.)
Leverage IaC for non-technical teams or
interface to existing tools.
firewall/rules/ssh-rule.yaml
IaC factories
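A factory of this kind can be sketched with Terraform's built-in `fileset`, `file`, and `yamldecode` functions: each YAML file under firewall/rules/ becomes one firewall rule. The YAML schema, project, and network name below are assumptions, not the format used in the talk.

```hcl
# Assumed content of firewall/rules/ssh-rule.yaml (hypothetical schema):
#   source_ranges: ["10.0.0.0/8"]
#   protocol: tcp
#   ports: ["22"]

locals {
  rules_dir  = "${path.module}/firewall/rules"
  rule_files = fileset(local.rules_dir, "*.yaml")

  # Rule name (file name without extension) => decoded YAML document.
  rules = {
    for f in local.rule_files :
    trimsuffix(f, ".yaml") => yamldecode(file("${local.rules_dir}/${f}"))
  }
}

resource "google_compute_firewall" "rules" {
  for_each      = local.rules
  project       = "my-project" # placeholder
  network       = "my-network" # placeholder
  name          = each.key
  source_ranges = each.value.source_ranges

  allow {
    protocol = each.value.protocol
    ports    = each.value.ports
  }
}
```

With this shape, a non-coder adds a rule by committing a new YAML file, and organizational requirements (for example an allowed network or naming scheme) stay fixed inside the module.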
78. Managed vs. unmanaged
Terraform Enterprise
Pros
● Complies with security/location requirements
● Full support
● Additional features
Cons
● Infrastructure and license costs
● Large operational overhead
Terraform Cloud
Pros
● Small operational overhead
● Fully supported
● Additional features
Cons
● License costs
● Remote state/execution might
not map to requirements
Terraform Open Source
Pros
● Complies with security/location requirements
● No license costs
● Widely used, distributed knowledge base
Cons
● Limited support
● Medium operational overhead
Terraform Open Source can be used to bootstrap, even if full support is needed later on
Running Terraform in production
79. Library            Language  What
    Kitchen Terraform  Ruby      Non-trivial tooling and dependencies; uses the InSpec Google provider to validate against created resources.
    Terratest          Go        Leverages the standard Go testing framework; works as a wrapper for the Terraform executable.
    Tftest             Python    Leverages the standard Python unit testing framework; works as a wrapper for the Terraform executable.
The IaC lifecycle should follow the same best practices used for other types of production code, including testing, especially at the module level.
Testing plan output instead of creating actual resources is a valid minimum viable strategy to ensure code correctness and compliance with provider changes.
Testing Terraform code
80. Tool       Vendor       What
    Sentinel   HashiCorp    Built-in with Terraform Cloud and Terraform Enterprise. Uses its own policy language.
    OPA        Open Source  De-facto standard for policy enforcement. Can process Terraform plan outputs via custom integrations.
    Terrascan  Accurics     Static code analyzer for Terraform. Verifies code complies with policies before executing it.
Use policy as code to automatically enforce company-wide requirements with Terraform, to ensure code correctness and compliance with provider changes.
Terraform policy enforcement