A Year in Production with the Hashistack*
Well… most of it anyway
Introductions
@redmind
Jason Harley
jharley@streetcontxt.com
@redmind
https://www.linkedin.com/in/jharley/
https://github.com/jharley
Introductions:
Street Contxt
Agenda
• Initial state
• Packer
• Containerization and Terraform
• Discovery and Visibility and Consul
• Iterations and Improvements
• Reflection
Street Contxt: Jan ’17
• AWS as VMware
 Five separate AWS accounts, running manually created resources
 A brief exploration into CloudFormation
• Ubuntu 14.04 based environment
 Some upgraded from 12.04 by hand
• Lovingly, hand-tended “pets”
• Distributed Monolith
 Play<>WildFly<>PostgreSQL
 Apache Solr
 AWS Elastic MapReduce (EMR)
 A few containers running back-office stuff on “service” boxes
• Desire to move to Immutable Infrastructure
Packer and Ansible
• Ansible was already in use at Street Contxt
 Reasonably complicated, single-purpose (use?) playbooks using a static inventory to manage machines
 Developing a practice of writing reusable, immutable roles was needed
• Started carving out a “base” Ansible role with help from test-kitchen and InSpec
Mission: critical path containers?!
• Data Science team was developing two TensorFlow-based services, which were nicely packaged as Docker containers and exposed an HTTP interface
• We needed a way to deploy, manage, and route traffic to these services
 … on the cheap
• ECS looked like a decent candidate for a Proof of Concept (PoC)
 Rolling upgrades weren’t going to be an issue, really
 The price was right
 You only pay for EC2, *LB, and network traffic
Introducing Terraform (March ’17)
• Starting from the current state, not from scratch
 We had just closed out a decent year, and we were at our highest usage as a company
• Terraform 0.8.8 had just been released
• You don’t need to import everything(!)
 This seems to be the biggest initial failing point for Terraform adoption
 You might not need to import anything…
• For each of our environments, we set up static variables for the critical ARNs/IDs of VPCs, subnets, Availability Zones, and Route53 zones
$ cat example.tfvars
# Static identifiers for the pre-existing, manually created resources
env = "exp"
vpc_id = "vpc-82e2471f"
availability_zones = ["us-east-1c", "us-east-1f"]
subnet_ids = {
  public = ["subnet-c9b969a7", "subnet-1acb5a77"]
  private = ["subnet-96b969f8", "subnet-77a5bca1"]
}
internal_domain_name = "exp.scx-internal.net"
internal_domain_id = "Y2KDOD09BQCVUN"
external_domain_name = "streetcontxt.com"
external_domain_id = "A6PGBR0HLIEJNB"
$ cat initialize.tf
provider "aws" {
  region = "us-east-1"
}
# Declarations for the values supplied in example.tfvars
variable "env" {
  type = "string"
}
variable "vpc_id" {
  type = "string"
}
variable "subnet_ids" {
  type = "map"
  default = {
    public = []
    private = []
  }
}
[…]
Introducing Terraform (March ’17)
• We decided on an environment-based directory structure
 Heavily influenced by a Charity Majors blog post about separating your Terraform into per-environment state
 Workspaces (née Environments) didn’t yet exist
 Released in Terraform 0.9.0
• We decided to store and share state in versioned S3 buckets, with an
encryption policy configured
• To save us from ourselves… we wrote a Makefile
 Commands were starting to look potentially complicated
 Had to have the right environment variable set
 Had to pass the correct vars file
 Had to make sure you were using the right version of Terraform
 We make use of tfenv
 Notably, there is no make destroy
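A minimal sketch of what per-environment state in an encrypted S3 bucket can look like. It assumes a Terraform release with backend blocks (0.9 or later); with 0.8.x the same settings were passed to the terraform remote config command instead. The bucket and key names are hypothetical placeholders, not values from the talk, and versioning is a property of the bucket itself rather than of this block.

```hcl
# Hypothetical backend configuration for the "exp" environment.
# Bucket and key are placeholders; the bucket is assumed to have
# versioning enabled out of band.
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"
    key     = "exp/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true # server-side encryption of the state object
  }
}
```

Giving each environment directory its own key keeps one environment's plan from ever touching another environment's state.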
Introducing Terraform (March ’17)
$ make init
[…]
$ make plan
[…]
$ make apply
Introducing Terraform (March ’17)
• Set up environments
• Built a module to create an ECS cluster with all related policies and resources
• AMI created with Packer using an Ansible role and built off the base image
• Didn’t terraform import anything…
Terraform and CloudFormation(?!)
• We hand-bombed the pair of TensorFlow services into existence
• Quickly realized we had a “fleet management” issue
• Terraform doesn’t do rolling updates
 CloudFormation does
• Discovered an example from AWSLabs using Lambda, SNS, and CloudFormation
• Simple enough to refactor the module to set up a CFN stack to manage the Auto Scaling group
Source: https://github.com/awslabs/ecs-cid-sample
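The idea of letting CloudFormation own the Auto Scaling group can be sketched as follows. This is an illustration under stated assumptions, not the module from the talk: the variable names and sizes are hypothetical, it is written in current Terraform syntax (0.12 or later) rather than the 0.8-era syntax shown on the slides, and it leaves out the Lambda and SNS draining pieces of the AWSLabs sample.

```hcl
# Hypothetical inputs; none of these names come from the talk.
variable "cluster_name" {
  type = string
}

variable "launch_configuration_name" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

# Terraform manages the stack; CloudFormation manages the group and
# performs the rolling replacement of instances.
resource "aws_cloudformation_stack" "ecs_asg" {
  name = "${var.cluster_name}-asg"

  template_body = <<-EOT
    Resources:
      AutoScalingGroup:
        Type: AWS::AutoScaling::AutoScalingGroup
        Properties:
          LaunchConfigurationName: "${var.launch_configuration_name}"
          VPCZoneIdentifier: ${jsonencode(var.subnet_ids)}
          MinSize: "2"
          MaxSize: "4"
        UpdatePolicy:
          AutoScalingRollingUpdate:
            MinInstancesInService: 1
            MaxBatchSize: 1
            PauseTime: PT5M
  EOT
}
```

When a new AMI produces a new launch configuration name, the template body changes, the stack updates, and the UpdatePolicy replaces instances one batch at a time.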
Discovery and Visibility and Consul
• Visibility into the health and location of services
 Especially these new TensorFlow services in ECS
 There was also talk of wanting an easy KV store for a few projects
• Wrote a new Ansible role with the help of test-kitchen and InSpec
• Built an AMI with Packer and Ansible
• Wrote a Terraform consul_cluster module
 Three-node Auto Scaling group
 Solved Consul bootstrapping via userdata and a Route53 record
 Called it a “soft lock”
• Ansible role made use of EC2 tag-based discovery and configured dnsmasq to redirect “*.consul” lookups to the Consul agent
• Successfully launched a cluster with a make apply(!)
 … and was quickly reminded we had zero Consul clients
Discovery and Visibility and Consul
 Our Ansible role was written to support clients and servers
 Quick and dirty script to add the Consul client security group to EC2 instances
 Rolled out the Consul agents with ansible-playbook and watched everyone report in
Discovery and Visibility and Consul
• Now… we needed Dockerized services to register themselves as Consul services
• Came across a great article from Zendesk Engineering on using Registrator from Gliderlabs
• Registrator automatically registers and deregisters services for any Docker container by inspecting containers as they come online, driven by container environment variables such as:
 SERVICE_NAME
 SERVICE_CHECK_HTTPS
 SERVICE_CHECK_INTERVAL
• Back into test-kitchen with our ECS role
 Added a registrator systemd unit that started with the Docker unit
Discovery and Visibility and Consul
• Packer brought us a new AMI, and with a make plan and make apply cycle our registrator-enabled container hosts were in the wild
• Quickly added some “SERVICE_” environment variables to our ECS Task definitions, and updated the ECS Service to see registration of services into Consul
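As an illustration of how “SERVICE_” variables can be attached to a task definition, here is a minimal sketch in current Terraform syntax (0.12 or later). The family, image, port, health-check path, and interval are hypothetical values, not the ones used at Street Contxt; the three variable names are the Registrator settings listed earlier in the deck.

```hcl
# Hypothetical task definition; names, image, and port are placeholders.
resource "aws_ecs_task_definition" "example" {
  family = "example-service"

  container_definitions = jsonencode([
    {
      name      = "example-service"
      image     = "example/example-service:latest"
      memory    = 256
      essential = true

      # hostPort 0 asks ECS for a dynamic host port; Registrator
      # registers whichever port Docker actually publishes.
      portMappings = [
        { containerPort = 8080, hostPort = 0 }
      ]

      # Read by Registrator when the container starts.
      environment = [
        { name = "SERVICE_NAME", value = "example-service" },
        { name = "SERVICE_CHECK_HTTPS", value = "/health" },
        { name = "SERVICE_CHECK_INTERVAL", value = "15s" }
      ]
    }
  ])
}
```

Because the registration metadata travels with the task definition, a new service shows up in Consul without any change to the container hosts.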
Consul in the critical path: Elasticsearch
• Up until now, Consul was telling us things, and while the data was useful, the conversation was fairly one-sided
• We had a single-instance Apache Solr service that needed to become more critical, and we decided that SolrCloud wasn't for us
• With our past success with Consul and ECS, we dove back into test-kitchen with a new Ansible role
• With a role we trusted, we could build an AMI, and then a Terraform module
• Instead of using a load balancer (which many folks seem to use with ES), Consul service discovery via DNS became the norm
• Elasticsearch clients began accessing Elasticsearch as “elasticsearch.service.consul”
• On May 23, 2017, Consul became part of the critical customer path for all searches
Six months in…
Ansible role → Packer image → Terraform module
Six months in…
• In that time, more containerized services have been written and are ready to head out the door
• Two new modules to round out our growing library
 ecs_task
 ecs_service
• Quickly went from the initial 2 TensorFlow services to 19 services and 4 batch-style tasks today
• More interestingly: these new services are being written using AWS services
 S3, SQS, Kinesis, KMS, Lambda
• We’d shaken off the inertia of the distributed monolith!
• We converted our Ansible inventory to a dynamic inventory driven by Terraform-managed EC2 tags in late July
The latter half
• Started to bring parts of the legacy systems into Terraform and Consul
 Backend admin tool containerized and migrated to ECS in late June
 Elastic MapReduce (EMR) taken under Terraform control in August
 Our WildFly cluster was turned into a Terraform module in November
 We actually imported things here
 Our frontend UI moved to ECS in early January ’18
Reflection: Imports?
• We still don’t have everything managed by Terraform
• Our “legacy resource” variables for VPC and subnets are still variables
 No great urgency or business need to deal with importing them
 Data sources make this a non-issue
 We’ll likely go “full Terraform” by the end of 2018
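A short sketch of how data sources can stand in for imported resources, written in current Terraform syntax (0.12 or later). The tag value is a hypothetical placeholder, and treating the internal zone as a private hosted zone is an assumption; the zone name is the one from the earlier example.tfvars.

```hcl
# Look up the existing VPC by tag instead of passing its ID as a variable.
data "aws_vpc" "env" {
  tags = {
    Name = "exp" # hypothetical tag value
  }
}

# Look up the existing hosted zone by name (assumed to be private).
data "aws_route53_zone" "internal" {
  name         = "exp.scx-internal.net."
  private_zone = true
}

# The looked-up IDs can be referenced like any other attribute.
output "vpc_id" {
  value = data.aws_vpc.env.id
}

output "internal_domain_id" {
  value = data.aws_route53_zone.internal.zone_id
}
```

The resources stay outside Terraform's management, so nothing here can modify or destroy them; Terraform only reads their attributes at plan time.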
Reflection: Outages? Uh-ohs?
• We’ve been really fortunate
 We’ve been diligent about running plans and paying attention to the output
• We did lose search one day…
 Consul agents didn’t start up on the Elasticsearch instances after a maintenance script ran
 Not Consul’s fault: operator error
Reflection: Do overs?
• Wish I’d known about Molecule sooner
 We’ve yet to move a bunch of test-kitchen projects to it
• Our ecs_task module’s definition of the container’s environment is brittle
 We have plans in the works to move to envconsul as an entrypoint to address this issue
• We love our environment model (it makes us feel safe :D), but a single state file per environment is starting to get slow
 Plan to break up this state by the fall
• We wish we already had dynamic secrets...
 Currently doing some hacky magic with encrypted S3 objects and KMS to deal with getting secrets into containers
Questions?
Jason Harley
jharley@streetcontxt.com
@redmind
https://www.linkedin.com/in/jharley/
https://github.com/jharley
Addendum: external links
• Building Immutable Machine Images with Packer and Ansible
 https://www.slideshare.net/JasonHarley3/building-immutable-machine-images-with-packer-and-ansible/
• charity.wtf: TERRAFORM, VPC, AND WHY YOU WANT A TFSTATE FILE PER ENV
 https://charity.wtf/2016/03/30/terraform-vpc-and-why-you-want-a-tfstate-file-per-env/
• tfenv: Terraform version manager inspired by rbenv
 https://github.com/kamatama41/tfenv
• AWS Samples: ECS Container draining
 https://github.com/awslabs/ecs-cid-sample
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

A year in Production with the Hashistack

  • 1. A Year in Production with the Hashistack* Well… most of it anyway
  • 4. Agenda • Initial state • Packer • Containerization and Terraform • Discovery and Visibility and Consul • Iterations and Improvements • Reflection
  • 5. Street Contxt: Jan ’17 • AWS as VMware ◦ Five separate AWS accounts, running manually created resources ◦ A brief exploration into CloudFormation • Ubuntu 14.04-based environment ◦ Some upgraded from 12.04 by hand • Lovingly hand-tended “pets” • Distributed Monolith ◦ Play<>Wildfly<>PostgreSQL ◦ Apache Solr ◦ AWS Elastic MapReduce (EMR) ◦ A few containers running back-office stuff on “service” boxes • Desire to move to Immutable Infrastructure
  • 6. Packer and Ansible • Ansible was already in use at Street Contxt ◦ Reasonably complicated, single-purpose (use?) playbooks using a static inventory to manage machines ◦ We needed to develop a practice of writing reusable, immutable roles • Started carving out a “base” Ansible role with help from test-kitchen and InSpec
  • 7. Mission: critical path containers?! • The Data Science team was developing two TensorFlow-based services, which were nicely packaged as Docker containers and exposed an HTTP interface • We needed a way to deploy, manage and route traffic to these services ◦ … on the cheap • ECS looked like a decent candidate for a Proof of Concept (PoC) ◦ Rolling upgrades weren’t going to be an issue, really ◦ The price was right ◦ You only pay for EC2, *LB, and network traffic
  • 8. Introducing Terraform (March ’17) • Starting from scratch current ◦ We had just closed out a decent year… we’re at our highest usage as a company • Terraform 0.8.8 had just been released • You don’t need to import everything(!) ◦ This seems to be the biggest initial failing point for Terraform adoption ◦ You might not need to import anything… • For each of our environments, we set up static variables for the critical ARNs/IDs of VPCs, subnets, Availability Zones, and Route53 zones
  • 9.
      $ cat example.tfvars
      env = "exp"
      vpc_id = "vpc-82e2471f"
      availability_zones = ["us-east-1c", "us-east-1f"]
      subnet_ids = {
        public = ["subnet-c9b969a7", "subnet-1acb5a77"]
        private = ["subnet-96b969f8", "subnet-77a5bca1"]
      }
      internal_domain_name = "exp.scx-internal.net"
      internal_domain_id = "Y2KDOD09BQCVUN"
      external_domain_name = "streetcontxt.com"
      external_domain_id = "A6PGBR0HLIEJNB"

      $ cat initialize.tf
      provider "aws" {
        region = "us-east-1"
      }
      variable "env" {
        type = "string"
      }
      variable "vpc_id" {
        type = "string"
      }
      variable "subnet_ids" {
        type = "map"
        default = {
          public = []
          private = []
        }
      }
      […]
  • 10. Introducing Terraform (March ’17) • We decided on an environment-based directory structure ◦ Heavily influenced by a Charity Majors blog post about separating your Terraform into per-environment state ◦ Workspaces (née Environments) didn’t yet exist ◦ Released in Terraform 0.9.0 • We decided to store and share state in versioned S3 buckets, with an encryption policy configured • To save us from ourselves… we wrote a Makefile ◦ Commands were starting to look potentially complicated ◦ Had to have the right environment variable set ◦ Had to pass the correct vars file ◦ Had to make sure you were using the right version of Terraform ◦ We make use of tfenv ◦ Notably, there is no make destroy
  • 11. Introducing Terraform (March ’17) $ make init […] $ make plan […] $ make apply
  • 12. Introducing Terraform (March ’17) • Set up environments • Built a module to create an ECS cluster with all related policies and resources • AMI created with Packer using an Ansible role and built off the base image • Didn’t terraform import anything…
  • 13. Terraform and CloudFormation(?!) • We hand-bombed the pair of TensorFlow services into existence • Quickly realized we had a “fleet management” issue • Terraform doesn’t do rolling updates ◦ CloudFormation does • Discovered an example from AWSLabs using Lambda, SNS and CloudFormation • Simple enough to refactor the module to set up a CFN stack to manage the Auto Scaling group Source: https://github.com/awslabs/ecs-cid-sample
  • 14. Discovery and Visibility and Consul • Visibility into the health and location of services ◦ Especially these new TensorFlow services in ECS ◦ Talk of wanting an easy KV store for a few projects • Wrote a new Ansible role with the help of test-kitchen and InSpec • Built an AMI with Packer and Ansible • Wrote a Terraform consul_cluster module ◦ Three-node autoscaling group ◦ Solved Consul bootstrapping via userdata and a Route53 record ◦ Called it a “soft lock” • Ansible role made use of EC2 tag-based discovery and configured dnsmasq to redirect “*.consul” lookups to the Consul agent • Successfully launched a cluster with a make apply(!) ◦ … and was quickly reminded we had zero Consul clients
  • 15. Discovery and Visibility and Consul ◦ Our Ansible role was written to support clients and servers ◦ Quick and dirty script to add the Consul client security group to EC2 instances ◦ Rolled out the Consul agents with ansible-playbook and watched everyone report in
  • 16. Discovery and Visibility and Consul • Now… we needed Dockerized services to register themselves as Consul services • Came across a great article from Zendesk Engineering on using Registrator from Glider Labs • Registrator automatically registers and deregisters services for any Docker container by inspecting containers as they come online ◦ SERVICE_NAME ◦ SERVICE_CHECK_HTTPS ◦ SERVICE_CHECK_INTERVAL • Back into test-kitchen with our ECS role ◦ Added a registrator systemd unit that started with the Docker unit
  • 17. Discovery and Visibility and Consul • Packer brought us a new AMI, and with a make plan and make apply cycle our Registrator-enabled container hosts were in the wild • Quickly added some “SERVICE_” environment variables to our ECS Task definitions, and updated the ECS Service to see registration of services into Consul
  • 18. Consul in the critical path: ElasticSearch • Up until now, Consul was telling us things… and while the data was useful, the conversation was fairly one-sided • We had a single-instance Apache Solr service that needed to become more critical… and we decided that SolrCloud wasn’t for us • With our past success with Consul and ECS we dove back into test-kitchen with a new Ansible role • With a role we trusted, we could build an AMI, and then a Terraform module • Instead of using a load balancer (which many folks seem to use with ES), Consul service discovery via DNS became the norm
  • 19. • ElasticSearch clients began accessing ElasticSearch as “elasticsearch.service.consul” • May 23, 2017: Consul became part of the critical customer path for all searches
  • 21. Six months in… • In that time, more containerized services have been written and are ready to head out the door • Two new modules to round out our growing library ◦ ecs_task ◦ ecs_service • Quickly went from the initial 2 TensorFlow services • 19 services and 4 batch-style tasks today • More interestingly: these new services are being written using AWS services • S3, SQS, Kinesis, KMS, Lambda • We’d shaken off the inertia of the distributed monolith! • We converted our Ansible inventory to a dynamic inventory driven by Terraform-managed EC2 Tags in late July
  • 22. The latter half • Started to bring parts of the legacy systems into Terraform and Consul ◦ Backend admin tool containerized and migrated to ECS in late June ◦ Elastic MapReduce (EMR) taken under Terraform control in August ◦ Our Wildfly cluster was turned into a Terraform module in November ◦ We actually imported things here ◦ Our frontend UI moved to ECS in early January ’18
  • 23. Reflection: Imports? • We still don’t have everything managed by Terraform • Our “legacy resource” variables for VPC and subnets are still variables ◦ No great urgency or business need to deal with importing them ◦ Data sources make this a non-issue ◦ We’ll likely go “full Terraform” by the end of 2018
  • 24. Reflection: Outages? Uh-ohs? • We’ve been really fortunate ◦ Diligent about running plans and paying attention to the output • We did lose search one day… ◦ Consul agents didn’t start up on the ElasticSearch instances after a maintenance script ran ◦ Not Consul’s fault: operator error
  • 25. Reflection: Do overs? • Wish I’d known about Molecule sooner ◦ We’ve yet to move a bunch of test-kitchen projects to it • Our ecs_task module’s definition of the container’s environment is brittle ◦ We have plans in the works to move to envconsul as an entrypoint to address this issue • We love our environment model (it makes us feel safe :D)… but a single statefile per environment is starting to get slow ◦ Plan to break up this state by the fall • We wish we already had dynamic secrets… ◦ Currently doing some hacky magic with encrypted S3 objects and KMS to deal with getting secrets into containers
  • 27. Addendum: external links • Building Immutable Machine Images with Packer and Ansible ◦ https://www.slideshare.net/JasonHarley3/building-immutable-machine-images-with-packer-and-ansible/ • charity.wtf: TERRAFORM, VPC, AND WHY YOU WANT A TFSTATE FILE PER ENV ◦ https://charity.wtf/2016/03/30/terraform-vpc-and-why-you-want-a-tfstate-file-per-env/ • tfenv: Terraform version manager inspired by rbenv ◦ https://github.com/kamatama41/tfenv • AWS Samples: ECS Container draining ◦ https://github.com/awslabs/ecs-cid-sample
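
Slide 10 describes storing and sharing state in versioned S3 buckets with encryption configured. A minimal sketch of what such a backend block might look like, assuming Terraform 0.9 or later (the `backend` block did not exist in 0.8.8, which was current when the work started). The bucket name and key are hypothetical and not taken from the talk:

```hcl
# Sketch only: bucket and key are made-up names for illustration.
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"  # hypothetical bucket name
    key     = "exp/terraform.tfstate"    # one state file per environment
    region  = "us-east-1"
    encrypt = true                       # server-side encryption of the state object
  }
}
```

Versioning is a property of the bucket itself, so it would be enabled on the bucket rather than in this block.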

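Slide 23 notes that the hand-maintained VPC and subnet variables could be replaced by lookups instead of imports. A rough sketch of that idea using AWS provider data sources, assuming Terraform 0.12 or later syntax. The `Name` tag value is an assumption for illustration; the zone name is the internal domain shown on slide 9:

```hcl
# Sketch only: look up existing, unmanaged resources instead of importing them.
data "aws_vpc" "env" {
  tags = {
    Name = "exp"  # hypothetical tag; the talk does not say how the VPC is tagged
  }
}

data "aws_route53_zone" "internal" {
  name = "exp.scx-internal.net."
}

# Values that previously had to be hard-coded in a tfvars file.
output "vpc_id" {
  value = data.aws_vpc.env.id
}

output "internal_domain_id" {
  value = data.aws_route53_zone.internal.zone_id
}
```

With lookups like these, the IDs never need to be copied by hand, and the underlying resources stay outside Terraform's state.
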
Editor's Notes

  1. Street Contxt is a global knowledge exchange for institutional finance, with their SaaS platform delivering smart, actionable insights to customers all over the globe.
  2. Building Immutable Machine Images with Packer and Ansible: https://www.slideshare.net/JasonHarley3/building-immutable-machine-images-with-packer-and-ansible/
  3. charity.wtf: TERRAFORM, VPC, AND WHY YOU WANT A TFSTATE FILE PER ENV - https://charity.wtf/2016/03/30/terraform-vpc-and-why-you-want-a-tfstate-file-per-env/ tfenv: https://github.com/kamatama41/tfenv
  4. Source: https://github.com/awslabs/ecs-cid-sample
  5. Making Docker and Consul Get Along: https://medium.com/zendesk-engineering/making-docker-and-consul-get-along-5fceda1d52b9 Registrator: http://gliderlabs.github.io/registrator
  6. March 23, 2017: Consul 0.7.5 launched into production with ECS
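
Slides 16 and 17 (and note 5) describe adding Registrator's “SERVICE_” environment variables to ECS task definitions so containers register in Consul. A hedged sketch of how that might look in Terraform 0.12 or later; the service name, image, port, health check path, and interval are all hypothetical, and this is not the talk's actual ecs_task module:

```hcl
# Sketch only: every name and value below is illustrative.
resource "aws_ecs_task_definition" "example" {
  family = "example-service"

  container_definitions = jsonencode([
    {
      name      = "example-service"
      image     = "example/example-service:latest"
      memory    = 256
      essential = true

      # hostPort 0 asks ECS for a dynamic host port.
      portMappings = [
        { containerPort = 8080, hostPort = 0 }
      ]

      # Registrator reads these when the container starts.
      environment = [
        { name = "SERVICE_NAME", value = "example-service" },
        { name = "SERVICE_CHECK_HTTPS", value = "/health" },
        { name = "SERVICE_CHECK_INTERVAL", value = "15s" }
      ]
    }
  ])
}
```

Once a task from this definition starts on a Registrator-enabled host, the service should become resolvable through Consul DNS, following the same pattern as “elasticsearch.service.consul” on slide 19.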