3. Definition
“A set of practices intended to reduce the time
between committing a change to a system
and the change being placed into normal production,
while ensuring high quality”
Bass, Len; Weber, Ingo; Zhu, Liming. DevOps: A
Software Architect's Perspective. ISBN
978-0134049847.
4. It’s not a new thing
already established in the industry - tons of job postings confirm that
● automation, automation, automation
● containers eeeeverywheeereee
● The Cloud (i.e. someone else’s computer)
5. Simple test:
“are you DevOps level over 9000?”
● your answer to “how many servers do you have?” is “I have to check..”
● you do multiple production deployments each day
● your dev team can create a new (micro)service along with all supporting
components without filing a ticket with the ops team
● you can terminate any random instance in your infrastructure and the
environment will self-heal
● .. but let’s not even start with security related topics
6. DON’TS
How not to do “DevOps”
● Post a job description for
“DevOps Engineer” and hire a
few
● Put them on call
● Push developers away from
directly interacting with the
environment
7. Effect?
Apart from low velocity and
quality you will get these:
● “Hey, can you send me logs
from my service?”
● “Heey, can you purge Redis for
me on staging?”
● “Heeey, I clicked deploy on
Jenkins and it’s stuck, HALP”
● “Heeeeeeeeeeeeeey….”
9. DO’S
● Do enable developers
● Streamline deployment
process
● Streamline infrastructure
management
● Guide, advise, discuss
● Hide complexity, but not too
much
● Treat yourself as a service
provider - deliver products not
tickets
11. It’s OK to hire
a DevOps engineer
Brings experience and
specialized focus
● Communication skills are super
important here
● Tech requirements: good *nix
skills, good Google skills and
a sixth sense for sniffing out bad
practices
● Probably the first person to
handle Security in your new
startup
12. Starting point
● Production environment: two servers,
dozen microservices
● Everything spun up manually through
AWS Console
● Deployment meant ssh’ing to a server,
pulling the new Docker image,
stop+start (incurring downtime)
● Monitoring? Just CloudWatch logs
❌
● Spring Boot + Spring Cloud (Netflix)
● Dockerized, built on Jenkins
● Configured via environment variables
● Stateless
● Use of AWS
● Use of managed services
● Most important thing: competent
development team, eager to innovate 🚀
✅
13. 1. Kubernetes
Fixing error prone deployments
● batteries-included approach
● documentation
○ courses, FAQs, examples
● popular
● reasonably sane
○ apart from the millicores concept and
a few others ;-)
● Lots of progress in the past ~2 years
○ stable
○ reliable
○ lots of know-how
○ lots of lessons learned
○ powerful CLI
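The millicores concept mentioned above is less scary than it sounds: Kubernetes expresses CPU in thousandths of a core, so "250m" is a quarter of a core. A minimal sketch of the conversion follows; the helper name parse_cpu is made up for illustration and is not part of any Kubernetes client.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "250m" or "2" into cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # The "m" suffix means millicores: 1000m equals one full core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


print(parse_cpu("250m"))   # 0.25
print(parse_cpu("1500m"))  # 1.5
print(parse_cpu("2"))      # 2.0
```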
14. Kubernetes
cont’d
● Helm
● Spinnaker
● Jenkins integrations
● Operators for complex
deployments
● Monitoring stack
● Cloud offerings (GKE, EKS,
Azure)
Tons of tools on top of it
16. YAAAML 😱
● 200-400 lines of YAML to
describe a service..
● Secrets management..
● Even with Helm, deployment
is a complex command
● Taints, tolerations, affinity,
heap vs total memory,
exposing ports, scraping
metrics .. and keep it all
consistent across multitude
of services
● Tooling versioning
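The "heap vs total memory" item is a good example of what has to stay consistent: the JVM heap must fit inside the container memory limit with room left for non-heap memory. A rough sketch of deriving one from the other follows. The 75% heap fraction and the helper names are assumptions for illustration, not a recommendation from this talk.

```python
# Binary suffixes used in Kubernetes memory quantities.
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as "512Mi" or "1Gi" into bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


def heap_flag(limit: str, heap_fraction: float = 0.75) -> str:
    """Build a JVM -Xmx flag that leaves headroom below the container limit.

    heap_fraction is an assumed rule of thumb, tune it per service.
    """
    heap_mib = int(memory_to_bytes(limit) * heap_fraction) // 1024**2
    return f"-Xmx{heap_mib}m"


print(heap_flag("1Gi"))    # -Xmx768m
print(heap_flag("512Mi"))  # -Xmx384m
```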
17. Re: hide complexity, but not too much
● Jenkins deployment job is nice and all, up until it stops working
● How can you expect proficiency with Kubernetes / kubectl if all developers
ever do is press a Run button?
● Enable them by making it easy to use CLI tools
○ Prepare Helm, helm-secrets, helm-diff, bundled with binaries, configs and a ./setup.sh script
for easy installation
○ Create one template for all services, supporting most common configuration
○ Add yet another abstraction layer for most common tasks
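The "one template for all services" idea can be sketched in a few lines: a service supplies a handful of values and the template expands them into the full manifest. This is only an illustration in Python, not the Helm chart used in the talk; render_deployment and its parameters are hypothetical names, and the output is JSON (which kubectl also accepts) to keep the sketch dependency-free.

```python
import json


def render_deployment(name, image, port=8080, replicas=2, env=None):
    """Expand a few service-specific values into a Kubernetes Deployment."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"name": "http", "containerPort": port}],
        # Services are configured via environment variables.
        "env": [{"name": k, "value": v} for k, v in sorted((env or {}).items())],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


# Hypothetical service name and image, for illustration only.
manifest = render_deployment(
    "orders", "registry.example.com/orders:1.4.2", env={"SPRING_PROFILES_ACTIVE": "prod"}
)
print(json.dumps(manifest, indent=2))
```

A developer touches four or five values; taints, probes and metrics wiring would live in the shared template and stay consistent across services.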
20. DevOps == Collaboration
● Example: monitor performance of all microservices
○ Example stack: Prometheus via Prometheus Operator
○ Add Service Monitor objects to each deployment
● New application<->platform contract emerged: just expose Prometheus
metrics on port N and you will see your service graphs in Grafana
○ Developers are responsible for adjusting their services to obey the new contract and for
making domain-specific dashboards
● Good tools helped here: Kubernetes made it easy to deploy the stack,
Spring framework made it easy to expose metrics
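The platform side of that contract can be sketched as a function that emits one ServiceMonitor per service. The apiVersion, kind and field names follow the Prometheus Operator's ServiceMonitor resource; the function name, the "metrics" port name, the /actuator/prometheus path (the Spring Boot 2 Actuator default) and the 30s interval are assumptions, not values from the talk.

```python
def service_monitor(name, port_name="metrics", path="/actuator/prometheus", interval="30s"):
    """Build a ServiceMonitor that scrapes the Service labelled app=<name>."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # Select the Kubernetes Service that fronts the application.
            "selector": {"matchLabels": {"app": name}},
            # "port" refers to a named port on that Service, not a number.
            "endpoints": [{"port": port_name, "path": path, "interval": interval}],
        },
    }


monitor = service_monitor("orders")  # hypothetical service name
print(monitor["spec"]["endpoints"])
```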
21. 2. Infrastructure
as Code
Terraform + Atlantis
● Git-versioned infrastructure
● Migrate/Move or import existing
resources
● Set up Atlantis for audited and
peer-reviewed infrastructure changes
● Use the same tools to detect state drift
(changes that were made outside of
the Atlantis flow)
● Optionally remove user permissions so
that changes must go through Pull
Requests
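Drift detection can be approximated with a scheduled job that runs terraform plan with -detailed-exitcode, which exits with 0 when nothing would change, 1 on error and 2 when the plan contains changes. A minimal sketch follows; the function names and the idea of running it from a cron-style job are assumptions, not the setup shown in the talk.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in-sync"  # real infrastructure matches the definition
    if code == 2:
        return "drift"    # plan is non-empty: something changed outside the flow
    return "error"        # exit code 1 (or anything unexpected)


def detect_drift(workdir: str) -> str:
    """Run terraform plan in workdir and report whether drift was found."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)


print(interpret_plan_exit_code(2))  # drift
```

A non-empty plan on an unchanged main branch means someone changed the infrastructure by hand, or a merged change was never applied.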
22. Terraform
Declarative infrastructure
management
● Define AWS resources
○ Readable syntax
○ Combine multiple resources into
reusable module
● Plan
○ Compare definition with current
state
○ Display detailed changeset
● Apply
○ Make changes to infrastructure
○ Record state
● Team-workflow supported
○ State in AWS S3
○ Locks in DynamoDB
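The plan/apply cycle boils down to diffing a desired definition against recorded state. The toy model below illustrates only that idea: it is not how Terraform is implemented, the resource addresses are made up, and it ignores dependencies, providers and locking entirely.

```python
def plan(desired: dict, current: dict) -> list:
    """Compare the definition with current state and return a changeset."""
    changes = []
    for address in sorted(desired.keys() | current.keys()):
        if address not in current:
            changes.append(("create", address))
        elif address not in desired:
            changes.append(("destroy", address))
        elif desired[address] != current[address]:
            changes.append(("update", address))
    return changes


def apply(desired: dict, current: dict):
    """Pretend to execute the changeset, then record the new state."""
    changes = plan(desired, current)
    new_state = dict(desired)  # after a successful apply, state matches the definition
    return changes, new_state


desired = {
    "aws_instance.web": {"instance_type": "t3.small"},
    "aws_s3_bucket.logs": {"bucket": "example-logs"},
}
current = {
    "aws_instance.web": {"instance_type": "t3.micro"},
    "aws_sqs_queue.old": {"name": "old"},
}

changes, state = apply(desired, current)
print(changes)
# [('update', 'aws_instance.web'), ('create', 'aws_s3_bucket.logs'), ('destroy', 'aws_sqs_queue.old')]
print(plan(desired, state))  # [] because a second plan finds nothing to do
```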
23. Atlantis
Pull Requests for infrastructure
1. GitHub hook on each Pull
Request to terraform repo
2. Additional layer of locking so
no other PR can touch the
same parts of infrastructure
3. Autoplan: show plan preview in
PR comments
4. Review & Approve Pull Request
5. Apply changes
6. Remove locks and merge
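The locking in steps 2 and 6 can be pictured as a map from a part of the infrastructure to the Pull Request that currently holds it. This is a simplified model for illustration, not Atlantis code; the class name, project names and PR numbers are made up.

```python
class ProjectLocks:
    """Toy model of PR-level locks on parts of the infrastructure."""

    def __init__(self):
        self._owners = {}  # project name -> PR number holding the lock

    def acquire(self, project: str, pr: int) -> bool:
        """Lock a project for a PR; succeeds if it is free or already ours."""
        owner = self._owners.setdefault(project, pr)
        return owner == pr

    def release(self, pr: int) -> None:
        """Drop every lock held by a PR, e.g. once it is merged or closed."""
        self._owners = {p: o for p, o in self._owners.items() if o != pr}


locks = ProjectLocks()
print(locks.acquire("network", 12))  # True: PR 12 plans first and takes the lock
print(locks.acquire("network", 15))  # False: PR 15 must wait
locks.release(12)                    # PR 12 applied and merged
print(locks.acquire("network", 15))  # True
```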
24. Demo?
If time permits ;-)
If time doesn’t permit: shout out to
my friend Szymon W., who wrote a
nice blog post about introducing
Terraform and Atlantis across a whole
company:
https://lab.getbase.com/terraform-base/
26. From my own experience
Cosmose:
● One “devops engineer”, seven contributors
to the Terraform repo in a month, eleven now
● > 10 production deployments per day
● 3x more microservices since I joined (~6
months)
● Infrastructure autoscaled 10x one time,
when a dev wanted to “speed up his
processing task” ;-)
Base / Zendesk Sell:
● Around 8 Ops and 42 (!) contributors to
the Terraform repo
● 30-50 deployments to prod daily
● High level of ownership in dev teams,
including expertise in running databases
(e.g. Elasticsearch, MySQL), building their
own infrastructure stacks (QA Kubernetes)