Smart Platform
Infrastructure
How we are learning to let our team sleep at night
James Huston
DevOpsDays Charlotte
February 2017
whoami
• James Huston - Director of Platform Engineering @
Red Ventures
• Over the last 20 years I have been on teams that:
• Tried a lot of things, some worked, some didn’t
• Learned a lot of do’s and don’ts
The Team
Thomas Hopkins, Ryan Ruscett,
Alfonso Cabrera, Garrett Johnson, Mike Guthrie
So what do I have to share?
• Sleep
• Operations -vs- Platform Ops
• Infrastructure (AWS)
• Monitoring and Alerting
• Security
• Workflows
• Documentation
• Docker
Sleep
• Our jobs are 24/7/365
• Small teams
• Resource bound
• To be successful, we need sleep
Operations -vs- Platform Ops
• Operations
• Deeper knowledge
• Correct -vs- Fast
• Snowflakes?
• Platform Ops
• Wide breadth of knowledge
• Fast turnaround, or self service
• Automate all the things
Platform Ops
Platform enables developers to safely and
consistently perform their own operations and build
resilient and secure applications.
Infrastructure
• Traditional Operations - Healthy Infrastructure
• Linux in your datacenter
• Apps on top of that
• Platform Ops - Healthy Applications
• AWS/Azure/Google
• Managed services
• Apps on top of that
Monitoring and Alerting
• You are likely underestimating its importance
• Integrate them from the beginning, don’t bolt them
on.
• Make sure your alerts go to the correct people
• Don’t create alerts that you are going to ignore!
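One way to picture the last two points is a small routing table that sends each alert to the team that owns the service and drops anything nobody would act on. This is only a sketch: the service names, rotation names, and severity levels are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ownership map: service name -> on-call rotation.
OWNERS = {
    "checkout-api": "payments-oncall",
    "image-resizer": "media-oncall",
}
FALLBACK_ROTATION = "platform-oncall"      # assumed default owner
ACTIONABLE_SEVERITIES = {"page", "ticket"}  # assumed severity levels


@dataclass(frozen=True)
class Alert:
    service: str
    severity: str
    summary: str


def route(alert: Alert) -> Optional[str]:
    """Return the rotation to notify, or None if the alert is not actionable."""
    if alert.severity not in ACTIONABLE_SEVERITIES:
        # An alert nobody will act on is noise, so it is never sent.
        return None
    # Known services go to their owners; everything else goes to the fallback.
    return OWNERS.get(alert.service, FALLBACK_ROTATION)
```

Keeping the ownership map in version control next to the service definitions lets routing changes go through the same review as any other change.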
Infrastructure Layout
[Diagram: Staging and Production environments]
Our Infrastructure
Infrastructure - Why Is It Important?
• Take advantage of Autoscaling for scale and auto
healing
• Design to be secure from the start
• Design with monitoring and alerting built in
• Build your infrastructure in a standard,
documented, reproducible way
Immutable Infrastructure
• First line of debugging: remove the machine and let
it get replaced
• Avoid snowflakes/unicorns as much as possible
• Replace for security reasons
• Easy to implement (in the cloud anyhow)
• Salt/Chef/Puppet - use it for initial config, don’t
push changes
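The "remove the machine and let it get replaced" idea can be sketched as a toy model of a self-healing fleet. This is an illustration only, not a real autoscaling API: the `Fleet` class, the instance id format, and the image names are all hypothetical.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Fleet:
    image: str     # image every new instance is launched from
    desired: int   # desired capacity
    instances: dict = field(default_factory=dict)  # instance id -> image
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def reconcile(self) -> None:
        """Launch fresh instances from the current image until capacity is met."""
        while len(self.instances) < self.desired:
            self.instances[f"i-{next(self._ids):04d}"] = self.image

    def replace(self, instance_id: str) -> None:
        """Remove a suspect machine instead of patching it in place."""
        del self.instances[instance_id]
        self.reconcile()

    def roll(self, new_image: str) -> None:
        """Ship a change (for example a security patch) by replacing every instance."""
        self.image = new_image
        for instance_id in list(self.instances):
            self.replace(instance_id)


fleet = Fleet(image="image-v1", desired=3)
fleet.reconcile()          # three instances, all built from image-v1
fleet.replace("i-0002")    # the suspect machine is gone, a fresh one takes its place
fleet.roll("image-v2")     # every instance now comes from image-v2
```

Because no instance is ever modified after launch, configuration management only has to run once, at build or boot time.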
Program and Automate
• Build repeatable, reproducible infrastructure
• Team review of changes before they are made
• Pull requests
• Easy Rollback
• Shareable and reusable modules
• https://github.com/segmentio/stack
Terraform
• Plays nice with Most of the Things
• Multiple cloud providers, VMware, OpenStack
• Grafana, DataDog, New Relic, PagerDuty,
Logentries
• MySQL, PostgreSQL
• Program all the things - Except Snowflakes
Terraform -vs- CloudFormation
• Terraform
• State
• Fast
• Admin Access
• CloudFormation
• No State
• Not so fast
• AWS Service Catalog
Security - SSO
• Don’t underestimate the power of the dark side OR
your need to use Single Sign On (SSO)
• Active Directory, LDAP, Okta for AWS/Apps
• JumpCloud or LDAP for EC2 instances
• Avoid tools that don’t support SSO (GitHub.com) in
favor of tools that do (GitHub Enterprise)
Security
• Don’t share SSH keys among your team(s). Ever.
• 0.0.0.0/0 on a security group that is not a public
ELB? That’s likely bad.
• e.g. future VPN or Direct Connect
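The 0.0.0.0/0 check is easy to automate. The sketch below assumes the input is shaped like the "SecurityGroups" list returned by boto3's describe_security_groups call; the function name and the idea of passing in the ids of known public ELB groups are assumptions made for illustration.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_to_world(groups, public_elb_group_ids=frozenset()):
    """Yield (group_id, from_port, to_port) for ingress rules open to any address.

    Groups listed in public_elb_group_ids are skipped, since a public ELB
    is expected to accept traffic from anywhere.
    """
    for group in groups:
        if group["GroupId"] in public_elb_group_ids:
            continue
        for perm in group.get("IpPermissions", []):
            cidrs = {r["CidrIp"] for r in perm.get("IpRanges", [])}
            cidrs |= {r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])}
            if cidrs & OPEN_CIDRS:
                # FromPort/ToPort are absent for "all traffic" rules, hence .get().
                yield group["GroupId"], perm.get("FromPort"), perm.get("ToPort")
```

Run on a schedule, a check like this turns "that's likely bad" into an alert that reaches the team owning the group.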
Developer Workflows
• Automation is key
• Use standard tooling (Makefile, shell scripts, etc)
• Bamboo -vs- Jenkins
• Centralization
• Provide guardrails and let teams with the expertise
control their own destiny
• Documentation of workflows is critically important
Documentation
• README.md - keep docs with your projects
• Centralize infrastructure, CI/CD, and other core
docs
• Make it mandatory in governance
• Set a good example!
Docker
Security Info à la Jérôme Petazzoni (https://jpetazzo.github.io/)
http://bit.ly/1t1DG3Q
Docker
• Don’t run things as root
• Update often!
• For real security, run all filesystems read-only
• Use small (Alpine, Debian) base images
• Use only approved images
• Update them often
• Windows? All of the above.
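Several of these rules can be enforced in the wrapper that launches containers. The sketch below builds a `docker run` command line with a non-root `--user`, a `--read-only` root filesystem, and a `--tmpfs` scratch mount. The allowlist contents, the default uid/gid, and the function name are hypothetical.

```python
# Hypothetical allowlist of approved, pinned base images.
APPROVED_IMAGES = {"alpine:3.5", "debian:jessie-slim"}


def hardened_run_command(image, uid=10001, gid=10001, writable_tmp=True):
    """Build a `docker run` argv that avoids root and a writable root filesystem."""
    if image not in APPROVED_IMAGES:
        raise ValueError(f"image {image!r} is not on the approved list")
    argv = [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",  # don't run things as root
        "--read-only",             # mount the container's root filesystem read-only
    ]
    if writable_tmp:
        argv += ["--tmpfs", "/tmp"]  # in-memory scratch space, gone when the container exits
    argv.append(image)
    return argv
```

The returned list can be handed to `subprocess.run` as is, which avoids shell quoting problems.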
Docker
• KISS - Keep It Simple, Stupid!
Drumroll Please
The “Cloud” makes Platform Ops a reality. We can
now program and automate “all the things” and we
have the tools to make our infrastructure and
applications maintain and heal themselves …
And we get to sleep at night
411
James Huston
Director of Platform Engineering @ Red Ventures
james@jameshuston.net
@hustonjs
