How LogicMonitor manages AWS resources with Terraform to provide a reliable, repeatable way to both grow our infrastructure organically and deliver disaster recovery solutions.
What is a Pod?
Automating Disaster Recovery
All of the components required to provide LogicMonitor for customers
Tomcat
Kafka
TSDB
MySQL
Relay
Global Resources:
APIs
HAProxy
Redis
S3
SQS
ELBs
Sitemonitor
Proxy
SMTP
Render
ECSSG
DNS
… what’s next?
ElasticSearch
Rserve
IAM
Horizontally scalable Cell Architecture
• Runbook (Cookbook)
• CLI or web interface
• Co-workers .bash_history
• Crossing your fingers?
The Old Way
• Infrastructure as code (self documenting, repeatable)
• Provision and de-provision (important!)
• Scalable (change two parameters to create a new Pod)
Terraform
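The "change two parameters" claim above can be sketched in HCL. This is a hypothetical illustration; the module and variable names are not LogicMonitor's actual code:

```hcl
# Hypothetical sketch: creating a new Pod by changing two parameters.
module "pod_us_west_2" {
  source = "./modules/pod"

  pod_name = "pod7"        # parameter 1: unique Pod identifier
  region   = "us-west-2"   # parameter 2: target AWS region

  # Everything else (instance counts, subnets, AMI lookups) is
  # derived inside the module from these two inputs.
}
```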
Hello. I’m Randall Thomson, Sr. TechOps Engineer at LogicMonitor. Our TechOps team manages the infrastructure that provides the LogicMonitor service for our customers. We straddle the line between SRE and DevOps, whatever you want to call it nowadays. We are always juggling our time between reactive and proactive tasks. This talk is about what our team has done to provide automation in disaster recovery situations using Terraform and AWS.
I tend to jump right into the nitty gritty, so I want to spend a brief couple of slides going over the two main subjects of this talk: Terraform & Pods (I will keep referring to these two things).
Ask audience:
Who has heard of or has experience with Terraform?
Who has, or still does, provision AWS resources via the Web Portal? CLI? Other orchestration tools?
This is Elon Musk’s Disaster Recovery plan for Earth. Not what I will be talking about today but definitely something fun to Google afterwards.
Terraform - an open source tool by HashiCorp (makers of Vagrant, Packer, Consul, Vault) - will quote from the website:
“Terraform enables you to safely and predictably create, change, and improve production infrastructure. It is an open source tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.”
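To make the "declarative configuration files" idea concrete, here is a minimal, hypothetical example; the AMI ID and names are placeholders, not anything from our actual projects:

```hcl
# You describe the resource you want; Terraform reconciles
# real infrastructure to match this description.
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "tomcat" {
  ami           = "ami-0123456789abcdef0"  # placeholder AMI ID
  instance_type = "t3.medium"

  tags = {
    Name = "pod1-tomcat"
    Role = "application"
  }
}
```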
Context part #2 - A Pod. LogicMonitor uses a Cell Architecture design we internally refer to as “Pods”. These, in addition to a handful of global resources, are the infrastructure that powers the LogicMonitor service for our customers. Most of our Pods are a hybrid cloud model, where some of the resources are in our own datacenters and the rest are in AWS. The list on screen is only a subset (always changing), but as you can see there are a lot of resources that go into building a Pod. Lots of nuts & bolts.
So this leads us to one of our challenges.
How do you provide a reliable way to scale and keep your disaster recovery plan up-to-date?
15m (Resolution) Open with the Old Way of creating infrastructure (CLI, web interface)
In the past, if we wanted to replicate how an existing server was built, we would have to look up the documentation (if any) and then assess whether any manual changes had been made (cross your fingers, or read through a co-worker’s .bash_history). Black magic.
This led to inconsistencies for environments that should ideally be exactly the same.
Cue Terraform.
The terraform code serves both as documentation of how infrastructure is built and a description of existing infrastructure. With Terraform you can both provision new infrastructure to be the same as old, as well as keep your older infrastructure up-to-date as you make changes along the way.
Terraform is able to provision all of the resources that make up our Pods except our bare-metal servers. Our DR plan utilizes a 100% AWS cloud Pod design with no datacenter dependencies.
Scalable
Worthwhile to maintain as it serves as the single source of truth.
Documentation is always up-to-date.
Turned processes we used to fear into near thoughtless tasks.
- HCL, Modules, Projects, and Directory Organization. Private vs public facing resources. Data Providers.
Terraform projects and modules can, or rather should, be stored in a code repository (but not your state files), even in a single-person shop. This gives you all the normal benefits of a software project, but for your infrastructure: revision history, proper change control.
We use modules (reusable resource provisioners) as templates for our various application servers. We define projects to represent our various Pods (and global resources). Each AWS environment has a distinct Terraform code repository. Terraform can operate across multiple AWS environments, but this gets complicated quickly. Suggestion: make use of data providers so that you are not hard-coding values in your code. For example, looking up network ranges or AMI numbers.
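The data-provider suggestion can be sketched like this; the AMI naming convention and tag values are illustrative, not our real ones:

```hcl
# Look values up at plan time instead of hard-coding them.

# Find the newest AMI matching a naming convention.
data "aws_ami" "app_base" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["lm-base-*"]
  }
}

# Look up an existing VPC by tag rather than embedding its ID or CIDR.
data "aws_vpc" "pod" {
  tags = {
    Pod = "pod1"
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.app_base.id  # resolved at plan time
  instance_type = "t3.medium"
}
```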
5m - The ability to preview changes is useful both when creating new resources and especially important when modifying old resources. It’s like a diff output showing additions, subtractions and changes. Somewhat colorized. You can (and will) configure various resources to ignore certain types of changes over time for cases when you don’t need your older resources modified. For example, AMI numbers. You may change the AMI over time but you don’t need to re-provision older servers as Puppet keeps them up-to-date.
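The AMI case above maps to Terraform's `lifecycle` block. A hedged sketch, with an illustrative variable name:

```hcl
# Ignore AMI drift on existing instances: when the AMI variable moves
# forward, Terraform won't propose replacing older servers, since
# Puppet keeps them up-to-date after boot.
resource "aws_instance" "app" {
  ami           = var.current_ami_id  # illustrative variable name
  instance_type = "t3.medium"

  lifecycle {
    ignore_changes = [ami]
  }
}
```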
The Complication. I want to make a brief sidenote on AMIs and the spectrum of generic vs. ready-to-run. We have about a dozen different types of application servers. For us it made sense to build an AMI that gets us about 95% of what we need and let Puppet do the final tweaks. For some it may be best to have a dozen different AMIs ready to run. The time savings can be dramatic when your instances don’t need a lot of post-configuration. It's another example of where you have to put in a lot more work up front to save time later. There are a variety of tools for building AMIs. We happen to use Packer, not because of any Terraform integration but simply because it does its one job very well. Also, make sure you copy your AMIs to any region where you may need to perform DR tasks.
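As a rough illustration of that approach, here is a sketch in Packer's HCL2 syntax; the AMI IDs, names, and provisioning step are placeholders:

```hcl
source "amazon-ebs" "base" {
  region        = "us-east-1"
  instance_type = "t3.small"
  source_ami    = "ami-0123456789abcdef0"  # placeholder base image
  ssh_username  = "ec2-user"
  ami_name      = "lm-base-{{timestamp}}"
  ami_regions   = ["us-west-2"]            # copy to the DR region at build time
}

build {
  sources = ["source.amazon-ebs.base"]

  provisioner "shell" {
    inline = ["sudo yum -y update"]  # bake in the ~95%; Puppet does the rest
  }
}
```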
At this point you may be wondering what all this has to do with Disaster Recovery. So here’s where we are today. We agreed as a group that any resource we provision in AWS must be done via terraform. All of our pods are described in terraform projects. As it so happened, in a serendipitous way, our Disaster Recovery plan was born. We no longer needed one way to provision our production infrastructure and a different method for our DR plan. With Terraform it’s basically the same in either case.
10m - The day has come. Your datacenter lost power. It’s 5am and you’ve been up half the night with your toddler. How much thinking do you want to have to do? How much thinking will you even be capable of? Likely very little.
terraform plan; terraform apply. Copy the project file and repeat. (Hope your VPN works, and that you have AMIs in the target regions.)
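Under those assumptions (VPN up, AMIs copied), the drill might look like the following; the directory layout and paths are hypothetical:

```shell
# Illustrative DR run, assuming one project directory per Pod and
# working AWS credentials. Not our actual repository layout.
cp -r pods/pod1-dc pods/pod1-aws-dr   # copy the project, adjust two parameters
cd pods/pod1-aws-dr
terraform init                        # fetch providers/modules, set up state
terraform plan -out=dr.tfplan         # preview exactly what will be created
terraform apply dr.tfplan             # build the replacement Pod in AWS
```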
10m - We’re currently making use of Terraform to manage our QA environments as well. There is always room for improvement. We are looking at ways to automate application deployment in DR situations. One example we’re testing is using IAM roles combined with EC2 user-data scripts to fetch our WAR files directly from S3. Another example would be having a CI/CD tool (such as Bamboo) run the Terraform commands. Then even your manager could do it.
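The user-data idea we're testing could look something like this; the bucket, paths, and instance profile name are hypothetical:

```hcl
# An instance profile (defined elsewhere) grants S3 read access,
# and a boot script pulls the WAR directly from a bucket.
resource "aws_instance" "app" {
  ami                  = "ami-0123456789abcdef0"  # placeholder
  instance_type        = "t3.medium"
  iam_instance_profile = "app-s3-read"            # illustrative profile name

  user_data = <<-EOF
    #!/bin/bash
    aws s3 cp s3://example-artifacts/releases/app.war /opt/tomcat/webapps/app.war
    systemctl restart tomcat
  EOF
}
```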