2. What is our typical infrastructure?
Pets
Every server has a name, and you have a relationship with it
Change anything in-place
Automation is optional
Testing is critical; implicit
Susceptible to drift
Difficult to revert state
Slow to recover from disaster
8. Context
● Kayako is a helpdesk, like Zendesk and Freshdesk
● Tickets for servers at SoftLayer
● Chef, SSH
● Many types of servers and components
● Fragile
9. Not a small toy example: real business, real servers
10. ● How do we provision a machine?
● How do we connect various parts?
● How do we release?
● How can we keep costs in control?
● Where have all the services gone?
● Who has access?
How do you do this with immutability?
Key Questions
13. ● How do we provision a machine?
● How do we connect various parts?
● How do we release?
● How can we keep costs in control?
● Where have all the services gone?
● Who has access?
How do you do this with immutability?
Key Questions Again
15. ● Moved to AWS
● Hardware, network and managed services provisioned with Terraform
● Software provisioned with Packer
● Infrastructure as code
How do we provision a machine?
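A minimal sketch of what "infrastructure as code" can look like, assuming Terraform's JSON configuration syntax (a `.tf.json` file) and the AWS provider's `aws_instance` resource. The function name `instance_config`, the resource name `app_server`, the instance type, and the AMI ID are hypothetical placeholders; in the flow above the AMI ID would come from a Packer build. This is an illustration, not the configuration used in the talk.

```python
import json


def instance_config(name: str, ami_id: str, instance_type: str = "t3.micro") -> dict:
    """Build a Terraform JSON-syntax document describing one EC2 instance.

    All argument values are illustrative assumptions, not the talk's setup.
    """
    return {
        "resource": {
            "aws_instance": {
                name: {
                    "ami": ami_id,                  # image built earlier (e.g. by Packer)
                    "instance_type": instance_type,
                    "tags": {"Name": name},
                }
            }
        }
    }


if __name__ == "__main__":
    # Placeholder AMI ID; writing this output to main.tf.json would let
    # `terraform plan` preview the change before `terraform apply`.
    print(json.dumps(instance_config("app_server", "ami-0123456789abcdef0"), indent=2))
```

Because the description is plain data, it can be reviewed, versioned, and regenerated for every new image.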
16. How do we connect various parts?
● Terraform describes the infrastructure as code
● Integrates well with Packer and AMIs
● Even works across cloud providers
● Modular
● Preview changes, apply safely
22. How do we release code?
● Release gets converted to images by Packer
● Terraform makes new servers with new code
● Consul and Nomad orchestrate the traffic switch (canary releases are also supported)
● Terraform takes down old infrastructure
● Rollback is another release, or a Nomad task if the release was a canary
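The release steps above can be modelled in a few lines. This is a toy simulation, assuming a hypothetical `Fleet` class in place of the real Packer, Terraform, Consul and Nomad tooling; it only illustrates that servers are never modified after creation and that a rollback is simply another release of an earlier image.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Server:
    image: str  # image the server was booted from; frozen, so it cannot be changed


@dataclass
class Fleet:
    live: list = field(default_factory=list)     # servers currently receiving traffic
    history: list = field(default_factory=list)  # released images, oldest first

    def release(self, image: str, count: int = 2) -> None:
        new_servers = [Server(image) for _ in range(count)]  # build new servers
        self.live = new_servers     # switch traffic; the old servers are discarded
        self.history.append(image)

    def rollback(self) -> None:
        if len(self.history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        # A rollback is an ordinary release of the previous image.
        self.release(self.history[-2], count=len(self.live))


if __name__ == "__main__":
    fleet = Fleet()
    fleet.release("image-v1")
    fleet.release("image-v2")
    fleet.rollback()
    print([s.image for s in fleet.live], fleet.history)
```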
23. How can we keep costs in control?
● Nomad optimizes workload for utilization and efficiency
● Easy to create infrastructure only on demand (for example, day versus night)
● Docker compatible
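To make the utilization point concrete, here is a small bin-packing sketch. It uses a first-fit-decreasing heuristic, which is not Nomad's actual scheduler; the job names, CPU figures, and node capacity are invented for illustration. The idea it shows is that packing workloads tightly reduces the number of machines that have to be paid for.

```python
def pack(jobs: dict[str, int], node_capacity: int) -> list[list[str]]:
    """Place jobs (name -> CPU units) onto as few equal-sized nodes as the
    first-fit-decreasing heuristic finds. Returns one list of job names per node."""
    nodes: list[list[str]] = []
    free: list[int] = []  # remaining capacity of each node, same order as `nodes`
    for name, cpu in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        if cpu > node_capacity:
            raise ValueError(f"job {name!r} does not fit on any node")
        for i, remaining in enumerate(free):
            if cpu <= remaining:        # first node with enough room wins
                nodes[i].append(name)
                free[i] -= cpu
                break
        else:                           # no existing node fits: start a new one
            nodes.append([name])
            free.append(node_capacity - cpu)
    return nodes


if __name__ == "__main__":
    demo = {"web": 600, "worker": 400, "cron": 300, "search": 700}
    print(pack(demo, node_capacity=1000))
```

With these made-up numbers, four jobs fit on two nodes instead of one node per job.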
24. ● Consul
○ Dynamic Service Discovery (DNS + HTTP)
○ Service Health
○ Runtime configuration and orchestration
○ Advanced networking
Where have all the services gone?
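A sketch of how a client might turn a service-health lookup into usable endpoints. It assumes a response shaped like the one Consul's HTTP health endpoint (`/v1/health/service/<name>`) returns, with `Node`, `Service` and `Checks` objects per entry. The sample payload, addresses, and the helper name `healthy_endpoints` are made up for illustration, and no network call is made.

```python
import json


def healthy_endpoints(payload: str) -> list[tuple[str, int]]:
    """Return (address, port) for every instance whose checks are all passing."""
    endpoints = []
    for entry in json.loads(payload):
        checks = entry.get("Checks", [])
        if any(check.get("Status") != "passing" for check in checks):
            continue  # skip instances with a failing or warning check
        # Fall back to the node address when the service has no address of its own.
        address = entry["Service"].get("Address") or entry["Node"]["Address"]
        endpoints.append((address, entry["Service"]["Port"]))
    return endpoints


# Hand-written sample in the assumed response shape (not real output).
SAMPLE = json.dumps([
    {"Node": {"Node": "n1", "Address": "10.0.0.5"},
     "Service": {"Service": "web", "Address": "", "Port": 8080},
     "Checks": [{"Status": "passing"}]},
    {"Node": {"Node": "n2", "Address": "10.0.0.6"},
     "Service": {"Service": "web", "Address": "10.0.1.6", "Port": 8080},
     "Checks": [{"Status": "critical"}]},
])

if __name__ == "__main__":
    print(healthy_endpoints(SAMPLE))
```

Since servers are replaced rather than renamed, clients look services up by name and health instead of relying on fixed hostnames.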
25. Who has access?
● Tool chain
○ SREs
● SSH
○ No one
● Secret management and delivery
○ Vault
Change in place
Software and hardware
SSH
Configuration
Automation is optional
Gaps can be filled by hand
Testing is critical
Can’t be sure all non-automated parts are playing along
Test for functionality AND health
What is actually in production may not be explicitly tested
Susceptible to drift
Same type of server might have different software versions or hardware
Difficult to revert state
Also because last state is not actually well known
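Drift between servers of the same type can be illustrated with a short comparison. This is a sketch under assumptions: the inventory dictionary, server names, and package versions are invented, and the "expected" version is simply taken to be the one most servers report, which is exactly the guesswork a mutable fleet forces on you when the last known state is not recorded anywhere.

```python
from collections import Counter


def find_drift(inventory: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return, per server, the packages whose version differs from the fleet majority."""
    packages = {pkg for versions in inventory.values() for pkg in versions}
    drift: dict[str, dict[str, str]] = {}
    for pkg in packages:
        counts = Counter(v.get(pkg, "missing") for v in inventory.values())
        expected, _ = counts.most_common(1)[0]   # majority version stands in for "intended"
        for server, versions in inventory.items():
            actual = versions.get(pkg, "missing")
            if actual != expected:
                drift.setdefault(server, {})[pkg] = actual
    return drift


if __name__ == "__main__":
    fleet = {
        "app1": {"php": "7.1", "nginx": "1.12"},
        "app2": {"php": "7.1", "nginx": "1.12"},
        "app3": {"php": "5.6", "nginx": "1.12"},  # patched by hand at some point
    }
    print(find_drift(fleet))
```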
Recreate on change
No refactoring
Automation is compulsory
No SSH, so manual changes are not possible
Testing is optional, different
You don’t test for functionality, you test only for health
Can’t Drift
Chef, SSH
Optional automation
Mistyped commands, etc.
Types of servers
Load balancer, application server, worker server, Elasticsearch cluster, Redis, MySQL servers, AMQP servers, service-specific servers
Queues, Search, Caching, Databases, DNS, CDN