Copyright © 2018 HashiCorp
Rein in Your Cloud Costs with Terraform and AWS Lambda
A guide to reducing unnecessary AWS spending
Your Host - Sean Carolan
• HashiCorp solutions engineer based in Austin, Texas
• First computer was a Commodore 64
• Recovering system administrator
• Enjoys building robots and automated things
• AWS Solutions Architect – Associate
• Likes backpacking and outdoor activities
• GitHub username: scarolan
Sean getting as far away from computers as possible.
We spent HOW much on AWS this month?!?!
• Who created all these resources?
• What sizes of EC2 instances are we running?
• When is it safe to shut down or delete resources?
• Where are all my instances running?
• Why are we leaving the water running 24/7?
• How are we going to fix this mess?
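"Where are all my instances running?" can be answered programmatically by checking every region, since each region behaves like a separate data center. The sketch below counts running instances per region. It assumes a boto3-style client factory is passed in (for example `lambda r: boto3.client("ec2", region_name=r)`) and uses the EC2 `describe_instances` paginator with an `instance-state-name` filter; the function name `count_running_by_region` is hypothetical.

```python
from typing import Any, Callable, Dict, Iterable


def count_running_by_region(
    client_factory: Callable[[str], Any], regions: Iterable[str]
) -> Dict[str, int]:
    """Count running EC2 instances in each region.

    client_factory is assumed to return a boto3-style EC2 client for a region,
    e.g. lambda r: boto3.client("ec2", region_name=r).
    """
    counts: Dict[str, int] = {}
    for region in regions:
        ec2 = client_factory(region)
        paginator = ec2.get_paginator("describe_instances")
        # Only count instances that are running (and accruing compute charges).
        pages = paginator.paginate(
            Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
        )
        total = 0
        for page in pages:
            for reservation in page.get("Reservations", []):
                total += len(reservation.get("Instances", []))
        counts[region] = total
    return counts
```

The region list itself can come from `ec2.describe_regions()`, whose `Regions` entries each carry a `RegionName` key.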
Rogue AWS Account – A DevOps Story
• Developers want to move fast, building and testing new features.
• The provisioning process for dev environments is slow.
• Someone creates an AWS account for developers and QA testing.
• The AWS account quickly turns into a “dev/QA romper room” with no supervision or oversight.
• Instances, unused storage, and other resources begin to pile up.
• Amazon won’t stop you from creating more resources than you need.
• You get a huge monthly bill and have to explain it to the CFO.
Know Where Your Money Is Going
AWS Cost Explorer can show you where you’re spending the most money. Group by ‘Usage Type’ to show the stacked graph.
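The same Usage Type breakdown can be pulled programmatically. The sketch below assumes the boto3 Cost Explorer client (`boto3.client("ce")`), whose `get_cost_and_usage` call accepts `TimePeriod`, `Granularity`, `Metrics`, and `GroupBy`; grouping by the `USAGE_TYPE` dimension mirrors the console's ‘Usage Type’ grouping. The helper names and the choice of the `UnblendedCost` metric are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cost_query(start: str, end: str) -> dict:
    """Build keyword arguments for Cost Explorer's get_cost_and_usage.

    Dates are YYYY-MM-DD strings; the End date is exclusive.
    """
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    }


def top_usage_types(response: dict, n: int = 5) -> List[Tuple[str, float]]:
    """Sum cost per usage type across all days and return the n largest."""
    totals: Dict[str, float] = defaultdict(float)
    for day in response.get("ResultsByTime", []):
        for group in day.get("Groups", []):
            usage_type = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[usage_type] += amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

With a real client this would be called as `top_usage_types(ce.get_cost_and_usage(**cost_query("2018-06-01", "2018-07-01")))`. Long date ranges can return a `NextPageToken`, which this sketch does not follow.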
3 Ways to Reduce EC2 Costs
1. Shut off anything that’s not in use.
2. Make sure your EC2 instances are sized appropriately.
3. Purchase reserved instances for your most common instance sizes.
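Hourly prices hide how quickly costs add up. The Editor's Notes point out that an instance at 25 cents an hour costs about $180 a month; the sketch below does that arithmetic (assuming a 30-day month) and compares it with running the same instance only during working hours. The 10-hours-a-day, 22-days-a-month schedule is an illustrative assumption.

```python
def monthly_cost(hourly_rate: float, hours_per_day: float = 24, days: int = 30) -> float:
    """Cost of running one instance on the given schedule."""
    return hourly_rate * hours_per_day * days


always_on = monthly_cost(0.25)                                 # 0.25 * 24 * 30
office_hours = monthly_cost(0.25, hours_per_day=10, days=22)   # weekdays only

print(f"Always on:    ${always_on:.2f}")                           # $180.00
print(f"Office hours: ${office_hours:.2f}")                        # $55.00
print(f"Saved:        {100 * (1 - office_hours / always_on):.0f}%")  # 69%
```

The same function makes rightsizing comparisons concrete: plug in the hourly rates of the current and the candidate instance type for your region.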
Develop a Plan
Goal: Create a simple serverless application that cleans up unused and expired AWS instances on a regular schedule.
Project Requirements:
• Must be an automated process
• Puts minimal burden on the end user
• Includes a way to whitelist resources
• Instances have a TTL (time-to-live) after which they are terminated
• Able to deal with unidentified or “orphaned” instances
• Clearly identifies who created each instance and how to contact them
• Provides visibility to the team via chat or email
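The requirements above boil down to a small decision the bot makes for each instance. The sketch below is one way to express it, not the HashiCorp bot's actual code: it assumes the `owner` and `TTL` tags described in the Editor's Notes (TTL measured in hours), a hypothetical whitelist of instance IDs, and an instance dict carrying the `InstanceId` and timezone-aware `LaunchTime` fields that EC2's `describe_instances` returns.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Set


def classify(instance: dict, tags: Dict[str, str], whitelist: Set[str],
             now: Optional[datetime] = None) -> str:
    """Return 'whitelisted', 'orphaned', 'expired', or 'ok' for one instance.

    tags maps tag keys to values; 'owner' and 'TTL' (hours) are assumed names.
    """
    now = now or datetime.now(timezone.utc)
    if instance["InstanceId"] in whitelist:
        return "whitelisted"          # long-lived infrastructure, never reaped
    owner, ttl = tags.get("owner"), tags.get("TTL")
    if not owner or not ttl:
        return "orphaned"             # nobody to contact, or no lifetime declared
    try:
        hours = float(ttl)
    except ValueError:
        return "orphaned"             # a malformed TTL is treated like a missing one
    expires_at = instance["LaunchTime"] + timedelta(hours=hours)
    return "expired" if now >= expires_at else "ok"
```

What to do with an `orphaned` result (report it, stop it, or terminate it after a grace period) is a policy choice this sketch deliberately leaves open.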
What Are Tags?
AWS allows you to tag almost every resource with metadata. Tags can provide valuable information about a resource.
Good tagging hygiene is essential to getting your AWS costs under control.
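EC2 returns tags as a list of `{"Key": ..., "Value": ...}` pairs (for example on each instance in `describe_instances` output), so a first step toward enforcing a tagging standard is flattening that list and checking for required keys. The required keys `owner` and `TTL` come from the Editor's Notes; the helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

REQUIRED_TAGS = ("owner", "TTL")  # the two mandatory tags described in the notes


def tags_to_dict(tag_list: Optional[List[dict]]) -> Dict[str, str]:
    """Flatten EC2's [{'Key': k, 'Value': v}, ...] format into {k: v}.

    An untagged resource may omit the Tags field entirely, hence the `or []`.
    """
    return {t["Key"]: t["Value"] for t in (tag_list or [])}


def missing_tags(tag_list: Optional[List[dict]],
                 required: Iterable[str] = REQUIRED_TAGS) -> List[str]:
    """Return the required tag keys that are absent or have empty values."""
    tags = tags_to_dict(tag_list)
    return [key for key in required if not tags.get(key)]
```

Tag keys are case-sensitive, so a resource tagged `ttl` still counts as missing `TTL`; agreeing on exact spellings is part of writing a tagging standard.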
What Is AWS Lambda?
AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume; there is no charge when your code is not running.
• Scales easily
• No servers to manage
• Billing by the millisecond
• Pay only for what you use
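In Python, a Lambda function is a module-level handler that receives an `event` and a `context`. The sketch below shows the shape a scheduled cleanup handler might take; it is not the bot's real entry point. `fetch_instances` is a hypothetical hook for the EC2 query, injected so the handler can be exercised locally without AWS, and the returned summary dict is an illustrative choice.

```python
import json
from typing import Callable, List, Optional


def _default_fetch() -> List[dict]:
    """List instances in the function's own region via boto3 (bundled in Lambda's Python runtime)."""
    import boto3  # imported lazily so local tests don't need it
    ec2 = boto3.client("ec2")
    instances: List[dict] = []
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            instances.extend(reservation["Instances"])
    return instances


def lambda_handler(event, context,
                   fetch_instances: Optional[Callable[[], List[dict]]] = None):
    """Entry point: Lambda calls lambda_handler(event, context)."""
    fetch = fetch_instances or _default_fetch
    instances = fetch()
    untagged = [i["InstanceId"] for i in instances if not i.get("Tags")]
    summary = {"total": len(instances), "untagged": untagged}
    print(json.dumps(summary))  # stdout from a Lambda function lands in CloudWatch Logs
    return summary
```

If this code lived in `cleanup.py`, the function's handler setting would be `cleanup.lambda_handler`.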
What Is Terraform?
• Terraform is an open source tool for building, changing, and versioning infrastructure safely and efficiently.
• Use it to provision any type of infrastructure.
• Instead of managing your cloud resources manually, you write Terraform configurations that describe your Infrastructure as Code.
• Terraform works across multiple cloud providers and with your on-premises resources.
• It’s easy to deploy cloud infrastructure and tools using Terraform.
• Terraform Enterprise offers advanced features such as role-based access controls and state management for your production workloads.
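Terraform also accepts configuration written in JSON (files ending in `.tf.json`), which makes it possible to show the moving parts of a scheduled Lambda from Python. The sketch below generates a minimal configuration: a Lambda function, a CloudWatch Events rule that fires on a schedule, a target linking the two, and a permission allowing the rule to invoke the function. The resource names, the `cleanup.zip` package, the `python3.12` runtime, the hourly schedule, and the `lambda_role_arn` variable are all assumptions, and the AWS provider (region, credentials) is assumed to be configured elsewhere; the bot's published Terraform code may be organized differently.

```python
import json


def scheduled_lambda_config(name: str, schedule: str = "rate(1 hour)") -> dict:
    """Return a Terraform JSON-syntax config for a Lambda invoked on a schedule."""
    fn_arn = f"${{aws_lambda_function.{name}.arn}}"  # renders as ${aws_lambda_function.<name>.arn}
    return {
        "variable": {"lambda_role_arn": {"type": "string"}},
        "resource": {
            "aws_lambda_function": {
                name: {
                    "function_name": name,
                    "filename": "cleanup.zip",           # assumed deployment package
                    "handler": "cleanup.lambda_handler",  # module.function
                    "runtime": "python3.12",
                    "role": "${var.lambda_role_arn}",     # IAM role assumed to exist
                    "timeout": 60,
                }
            },
            "aws_cloudwatch_event_rule": {
                name: {"name": f"{name}-schedule", "schedule_expression": schedule}
            },
            "aws_cloudwatch_event_target": {
                name: {"rule": f"${{aws_cloudwatch_event_rule.{name}.name}}", "arn": fn_arn}
            },
            "aws_lambda_permission": {
                name: {
                    "statement_id": "AllowCloudWatchInvoke",
                    "action": "lambda:InvokeFunction",
                    "function_name": f"${{aws_lambda_function.{name}.function_name}}",
                    "principal": "events.amazonaws.com",
                    "source_arn": f"${{aws_cloudwatch_event_rule.{name}.arn}}",
                }
            },
        },
    }


if __name__ == "__main__":
    # Redirect this output to main.tf.json, then run terraform init / plan.
    print(json.dumps(scheduled_lambda_config("cleanup_bot"), indent=2))
```

In day-to-day use you would write the same resources directly in HCL; generating JSON is shown here only to keep every example in one language.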
Deploy a Cleanup Robot
“If you make something, your heart will go into the thing you are making. So a robot is an external self. If a robot is an external self, a robot is your child.”
–Masahiro Mori
Credit: http://baltimorewaterfront.com/healthy-harbor/water-wheel/
Instance Usage Reporting
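A usage report like the one on this slide is mostly grouping and formatting. The sketch below groups instances by their `owner` tag and lists untagged instances separately, as in the "Wall of Shame" report described in the Editor's Notes. The input shape (dicts with `id`, `type`, `region`, and `tags`) is a simplified, hypothetical format rather than raw EC2 output, and delivering the text to Slack or email is left out.

```python
from collections import defaultdict
from typing import Dict, List


def usage_report(instances: List[dict]) -> str:
    """Build a plain-text report: instances per owner, then untagged ones."""
    by_owner: Dict[str, List[dict]] = defaultdict(list)
    untagged: List[dict] = []
    for inst in instances:
        owner = inst.get("tags", {}).get("owner")
        if owner:
            by_owner[owner].append(inst)
        else:
            untagged.append(inst)

    lines = ["Running instances by owner:"]
    for owner in sorted(by_owner):
        items = by_owner[owner]
        lines.append(f"  {owner}: {len(items)}")
        for inst in items:
            lines.append(f"    {inst['id']} {inst['type']} ({inst['region']})")
    if untagged:
        lines.append("Wall of Shame (no owner tag):")
        for inst in untagged:
            lines.append(f"  {inst['id']} {inst['type']} ({inst['region']})")
    return "\n".join(lines)
```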
Termination of Expired Instances
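The Editor's Notes describe the reaper reporting for a while before being switched into ACTIVE mode. A minimal sketch of that switch follows, assuming a boto3 EC2 client: `stop_instances` and `terminate_instances` are real EC2 client methods that take an `InstanceIds` list, while the `reap` function and its `mode` values are hypothetical.

```python
from typing import List


def reap(ec2, expired_ids: List[str], mode: str = "dry-run") -> List[str]:
    """Act on expired instances according to mode.

    mode: 'dry-run' (report only), 'stop', or 'terminate'.
    Returns the IDs that were (or would have been) acted on.
    """
    if not expired_ids:
        return []
    if mode == "dry-run":
        print(f"Would act on: {', '.join(expired_ids)}")
    elif mode == "stop":
        # Stopped instances keep their EBS volumes (and the storage charges).
        ec2.stop_instances(InstanceIds=expired_ids)
    elif mode == "terminate":
        ec2.terminate_instances(InstanceIds=expired_ids)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return list(expired_ids)
```

Defaulting to `dry-run` means a misconfigured deployment reports instead of deleting, which fits the WARNING slide's advice.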
What Kind of Savings Can I Expect?
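The notes express savings as a percentage drop in daily spend. The helper below makes that calculation explicit by comparing average daily cost over a window before and after the bot is enabled. The daily figures are made-up example inputs, not HashiCorp's numbers.

```python
from statistics import mean
from typing import Sequence


def percent_reduction(before: Sequence[float], after: Sequence[float]) -> float:
    """Percentage drop in average daily spend between two periods."""
    b, a = mean(before), mean(after)
    return 100.0 * (b - a) / b


# Hypothetical daily costs (USD) for one week before and one week after.
before = [400, 420, 410, 390, 380, 400, 400]
after = [200, 190, 210, 205, 195, 200, 200]
print(f"{percent_reduction(before, after):.1f}% lower")  # 50.0% lower
```

Averaging over whole weeks smooths out weekday/weekend swings that would distort a single-day comparison.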
WARNING
The app we are about to launch can delete AWS instances.
Do not deploy it to production unless you are 100% sure you know what you are doing.
Live Demo
Next Episode – Save Even More Money
• Actively enforce your tagging standards using Terraform Enterprise and Sentinel
• Require approvals to run larger, more expensive instance types
• Centralize your Terraform state management to reduce waste and orphaned resources
• Track all infrastructure changes with audit logs
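Sentinel policies run inside Terraform Enterprise and are written in the Sentinel language, which this deck does not show. As a rough Python analogue of the first two bullets, the sketch below inspects the JSON form of a Terraform plan (as produced by `terraform show -json <planfile>`, whose `resource_changes` entries carry `type`, `address`, and `change.after`) and flags `aws_instance` resources that lack the `owner`/`TTL` tags or use an instance type outside an allowed list. The allowed list and the function name are hypothetical; this is not a substitute for Sentinel's enforcement inside the run workflow.

```python
from typing import List, Set

ALLOWED_TYPES: Set[str] = {"t2.nano", "t2.micro", "t2.small", "t2.medium"}  # hypothetical
REQUIRED_TAGS = ("owner", "TTL")


def check_plan(plan: dict) -> List[str]:
    """Return human-readable violations for aws_instance changes in a plan."""
    problems: List[str] = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_instance":
            continue
        after = (rc.get("change") or {}).get("after") or {}
        if not after:
            continue  # resource is being destroyed; nothing to check
        addr = rc.get("address", "?")
        tags = after.get("tags") or {}
        for key in REQUIRED_TAGS:
            if not tags.get(key):
                problems.append(f"{addr}: missing tag '{key}'")
        itype = after.get("instance_type")
        if itype not in ALLOWED_TYPES:
            problems.append(f"{addr}: instance type {itype} needs approval")
    return problems
```

Typical use: `terraform plan -out=tfplan`, then `terraform show -json tfplan > plan.json`, then `check_plan(json.load(open("plan.json")))`.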
Useful Links
Terraform AWS Lambda cleanup bot guide and source code: http://bit.ly/2tQq9te
AWS Lambda and Slack tutorial: https://api.slack.com/tutorials/aws-lambda
Terraform Enterprise: https://www.hashicorp.com/products/terraform
Thank you.
hello@hashicorp.com
www.hashicorp.com

Editor's Notes

  • #2 In today’s webinar we’ll show you how to deploy a cleanup bot in your AWS account to save money and reduce waste. This tool is open source and freely available via the HashiCorp guides Git repository. It’s the same code our Solutions Engineering team uses internally to manage EC2 instance lifecycles.
  • #3 I’m a former systems administrator. In a past life I worked in the gaming industry. I’ve got many years of experience caring for and feeding Linux servers.
  • #4 Does this look familiar to anyone? Can you answer the five Ws (and the H) about your environment? Who created all this stuff? Unfortunately, this can sometimes be really hard to track down. Instances can be created without any name tags, making them like orphaned pets wandering the street with no way to contact their owner. What sizes of instances are we running? Amazon prices its EC2 compute instances by size, which corresponds to the amount of CPU, disk and memory you get. One thing you may not be aware of is that, within an instance family, each size is roughly *double* the compute capacity and cost of the size below it. So a micro instance costs twice as much as a nano, a small costs four times as much as a nano, and so on. Is it safe to shut any of this stuff down? This can be a huge problem, particularly in development, QA, and demo environments. If you don’t know whether an instance is running something critical, you may be afraid to shut it down or terminate it. Where are all my instances running? Remember that each AWS region is like a completely separate data center. It can be easy for instances to “hide out” in some far-flung region of the world if you aren’t watching things closely. Why are we leaving the water on? Buying compute capacity nowadays is a lot like paying for electricity or water. As soon as you turn it on, the meter starts running, and if you leave it running you have to pay for every single hour of usage. Perhaps the most important question here is “How are we going to fix this mess and get our costs under control?” Let’s take a closer look at some of the causes of cloud overspending.
  • #5 This is the story of our first cloud project at one of my previous jobs… The developers were working on a tight deadline to deliver the latest version of their application to production. Developers care about writing and testing their code, and fixing bugs. Sysadmins want to be careful and not break things; stable and boring is good in operations. Dev and ops each have these competing priorities, and this led to a lot of frustration and missed deadlines. The dev teams just wanted a place to build and test their code, so the team got permission to move some of their workloads into AWS. It’s not hard to set up an AWS account. All you need is a valid credit card, and you can start paying for your compute, disk and storage by the hour. That’s exactly what our dev team did. And for a while things were great. The developers could provision their own virtual machines, or instances, in the cloud, and the operations folks had less to worry about. But things quickly got out of hand. I like to call this the ‘dev romper room’. Imagine you sent 25 kids into a candy store and told them to take whatever they needed. That’s what our AWS account looked like after a few weeks. People aren’t bad. They mean well. But humans aren’t perfect, and they forget to shut things off. The more instances you run, the more money Amazon makes. They are not going to do a whole lot to help you, other than provide data and an API to work with. The end result was a monster AWS bill. Needless to say, the CFO was not thrilled about it. Maybe you’ve experienced this as well? Let’s see how we can get a handle on our AWS spending and reduce waste.
  • #6 This is a graph from AWS Cost Explorer. We highly recommend you take some time to get familiar with Cost Explorer and the types of graphs it can create for you. This is the ‘stacked graph’, which breaks down the most expensive resources by type. One study I read about estimates that most users overspend on EC2 instances by 30-40%. That can be a staggering amount of money if you work in a larger shop. Now that we have a clearer idea of where our AWS spending is happening, let’s look at a few ways to reduce costs. Since most of the wasted spending is on EC2 instances, we’ll focus on cutting costs there first. The techniques you learn in this webinar also apply if you want to add automated bots to manage other things like NAT gateways or RDS instances.
  • #7 These are the top three things you can do to reduce EC2 spending right away. The first one should be obvious. I have a teenage daughter, and I’m often wandering the house turning off lights that were left on. Ideally I should automate this with a motion sensor or some other method. Do you know why people don’t remember to shut things off? It’s not because they’re bad people. It’s because they aren’t paying the bill. When you see that an instance is only 25 cents an hour, that doesn’t sound so bad. But when you say it costs $180 a month, it seems much more significant. The second thing you can do is choose the optimum instance type for your workloads. If you are running on an m4.large but a t2.medium would suffice, you can cut the cost of running that instance roughly in half. For the third item, purchasing reserved instances, gather usage data first; only after you’ve done your testing should you purchase reserved instances at a discount. Just know that you’ll be locked into the region where you purchased them, so choose wisely.
  • #8 OK, we’ve identified the problem: we have a messy AWS account with a lot of untagged resources running in it. We’re going to assume the worst and build a robot to clean up the mess. Here are our project requirements. It has to be automated. Don’t depend on humans to always do the right thing, because you’ll be disappointed. Make it easy for your users. Everyone who creates resources in the account needs to agree to the standards. Make it easy to exclude instances from your reaper bot; you may have long-running infrastructure that you need to keep around. The TTL is the linchpin of this system. It is an expression of the maximum time you think you’ll need a resource. It trains your users to think carefully about how much they need, and the bot uses this value to know what can be safely deleted. Assume your account doesn’t have any tagging at all. What happens to instances that didn’t get tagged or are improperly tagged? The next requirement is also critical: who created this thing, and how do I reach them? Finally, the bot should notify the team about usage patterns, untagged instances, and the actions it takes.
  • #9 Tags are a way to add your own identifying information to AWS resources. You can show your tags in the AWS console as we’ve done here. How you tag things is really up to you; it’s like an empty dry-erase board. You can tag your instances with anything you want, including emoji! If you haven’t done so already, you should sit down with your team and create a tagging standard for your AWS instances. For our application we have two mandatory tags, owner and TTL. The owner is the email address of whoever is responsible for the instance, and the TTL, or time to live, is the number of hours an instance should be allowed to exist.
  • #10 AWS Lambda is ‘serverless compute’. It allows you to run your code without having to run a Linux or Windows server. You can write AWS Lambda functions in Node.js, Python, Java, C# and Go. Our cleanup bot is written in Python. In the “old days” we would have created a Linux server and some cron jobs to run our application, but now we don’t need to. If all I care about is having an environment to run my Python scripts, AWS Lambda is a perfect fit for our cleanup bot.
  • #11 Terraform is an open source tool for provisioning any kind of cloud or on-prem infrastructure. Traditionally you might think of Terraform being used to stand up your Linux and Windows VMs or cloud instances, but it can do so much more. There are hundreds of things you can create in AWS with Terraform, not just compute instances. Terraform can be used both in your data center and in the cloud. It’s easy to learn and easy to use. There’s an enterprise version that offers advanced features for groups and organizations.
  • #12 This is one of my favorite robots. His name is Mr. Trash Wheel and he lives in the Baltimore harbor. Mr. Trash Wheel has his own Twitter account. Some of you might be old enough to remember Rosie the Robot from The Jetsons cartoon. Cleanup robots are no longer the stuff of science fiction. We now have actual robot vacuums that respond to voice commands. Amazon even just got a patent for robots that clean up after the other robots. Let that one sink in for a moment… Instead of picking up trash, our bot will clean up expired AWS instances and notify users via Slack or email. The cleanup process requires cooperation from humans. Have some fun deploying your cleanup bot.
  • #13 This is an example report sent to Slack by our bot, NEPTR. You can also send these reports via email. The bot provides two functions: increasing awareness and visibility, and actively shutting down and terminating instances that have passed their TTL or defined lifecycle policy.
  • #14 Here’s another example report. You can send these via email as well. This one is showing a handful of expired instances being reaped.
  • #15 I can’t share actual numbers, but this is a real graph from one of our accounts here at HashiCorp, and you can see the percentage drop in our daily spend. It really depends on your workloads and how much you have running, but a 30-40% reduction in daily cost is not out of the ordinary. We saw over a 50% drop in our daily spending once we implemented this. The “Start Nagging” marker shows where we began to politely ask users to shut off what they weren’t using. You can see a gradual decrease in daily cost as we began to shut down things that were not in use. The Wall of Shame is a daily report sent to the team, listing instances that have no tags. We can see a continued decline as users tried to get off the Wall of Shame. The final arrow shows the reaper being put into ACTIVE mode, which means it can shut down and/or terminate instances according to your policies. You can see a much steeper drop-off as a batch of instances were put to sleep on that day. After you implement this bot, you may see a steadier pattern as your daily costs stabilize at a new, lower level. Ready for a live demo? Let’s deploy some code!
  • #16 We highly recommend using Terraform Enterprise if you’re going to run this in production. Terraform Enterprise includes safety checks and approvals that reduce risk and create an audit trail with detailed logs.
  • #18 This has been the first of a two-part blog post and webinar. In the second part we’ll show you how Terraform Enterprise can help you save even more money while efficiently managing your cloud infrastructure.