Configuration Management Hell
2019.01.30 ver 0.1.0-alpha-2
Michał Sochoń
2
CodiLime at a glance
Ranked among TOP 50 fastest growing companies in EMEA by The Financial Times
● 8+ years in business
● 3 locations (Warsaw and Gdansk in Poland & Palo Alto in the US)
● 170+ people on board, including 90+ software engineers, 30+ DevOps engineers
● Working with market leaders including: Juniper, NTT, Nutanix, GigaSpaces, Cloudify
Areas of expertise
● SDN, NFV & Orchestrators
● Cloud native, Serverless & Multicloud
● Edge Computing
Services
● Software engineering
● DevOps
● UX/UI
● R&D
3
In general
● Provisioning of resources
● Configuration
● Data management
4
Config Paradigms
● Imperative
○ Non-idempotent
○ Defines exactly how to perform each step
● Declarative
○ Idempotent
○ Defines desired state
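
A minimal sketch of the difference, assuming a config held as a list of lines. The function names and the sample setting are made up for illustration. The first function is an imperative step written without idempotence in mind, the second only describes the desired state:

```python
def append_line(lines, line):
    """Imperative step: always appends, so every run changes the result."""
    return lines + [line]


def ensure_line(lines, line):
    """Declarative step: states that the line must be present, adds it only if missing."""
    return lines if line in lines else lines + [line]


config = ["name=etcd-0"]               # hypothetical starting config
setting = "heartbeat-interval=100"     # hypothetical setting

# Running the imperative step twice leaves a duplicate entry.
once = append_line(config, setting)
twice = append_line(once, setting)

# Running the declarative step twice gives the same result as running it once.
ensured_once = ensure_line(config, setting)
ensured_twice = ensure_line(ensured_once, setting)
```

Re-running the second variant on every boot or on a schedule is safe, which is what makes the declarative style attractive for scheduled and event-based setups.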
5
Parameters
● Vendor specific
● App config
● System Tunables
● Secrets / Security
● Dependent services
6
Ways of provisioning
● Out of band - create pre-baked images
● Inline - live on remote systems
● Inline - live on local systems
● Inline - in runtime
7
Most common setups
● Execution on demand
● Scheduled
● Event based
8
Example
● Prepare ETCD service in GCP
● Limited tunable parameters
○ Version
○ Cluster size / failure domains
○ Performance: instance type, disk size
● Auto healing
○ Periodic backup
○ Auto disaster recovery
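
The cluster size parameter matters because etcd needs a majority of members (a quorum) to stay writable. A small sketch of that arithmetic, with function names chosen here for illustration:

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many members can be lost while a majority still remains."""
    return cluster_size - quorum(cluster_size)


if __name__ == "__main__":
    for size in (1, 3, 4, 5):
        print(f"size={size} quorum={quorum(size)} tolerates={tolerated_failures(size)}")
```

An even-sized cluster tolerates no more failures than the next smaller odd size, which is one reason to keep this tunable restricted to odd values.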
9
Limitations and Challenges
● What parts are most likely to be changed?
○ The fewer there are, the easier it is to build
● How often are we willing to change it?
○ How do we handle data migration and availability?
○ Cloud providers give us 2 states of service availability instead of 3
10
High level flow
● Git repo
● Create image with app, extra packages
● Create required resources
● Create instance group
● …
● Profit!
11
Development flow
● Single instance
● A set of instances
● Set of instances from image
● Auto scaled set of instances from image
○ Integrate health checks
○ Integrate backup
○ Integrate recovery
12
But where is ‘the’ hell?
13
It’s because...
● You cannot bake in the whole config; it must be adjusted per instance
● Cluster state itself - depending on the state, you must configure differently
● Etcdctl - depending on the API version you use to talk to ETCD, it returns different info
● … and depending on the output formatting, it returns different data
● Nodes which are gone are not marked as dead
○ Need to periodically check manually if a node is dead/alive
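
The first two points can be sketched as a function that renders per-instance settings. The environment variable names are etcd's standard `ETCD_*` equivalents of its command-line flags. The function name, the instance names and the addresses are assumptions for illustration only:

```python
def render_etcd_env(name, peers, cluster_alive):
    """Build per-instance etcd settings.

    name          -- this instance's member name
    peers         -- dict of member name -> peer URL, including this instance
    cluster_alive -- True when joining a running cluster, False when bootstrapping
    """
    if name not in peers:
        raise ValueError(f"{name} is not in the expected peer list")
    # Sorted so that every instance renders the same string for the same peers.
    initial_cluster = ",".join(f"{n}={url}" for n, url in sorted(peers.items()))
    return {
        "ETCD_NAME": name,
        "ETCD_INITIAL_ADVERTISE_PEER_URLS": peers[name],
        "ETCD_INITIAL_CLUSTER": initial_cluster,
        # The same image needs a different value depending on cluster state.
        "ETCD_INITIAL_CLUSTER_STATE": "existing" if cluster_alive else "new",
    }


# Hypothetical two-node example.
example_peers = {
    "etcd-0": "http://10.0.0.2:2380",
    "etcd-1": "http://10.0.0.3:2380",
}
example_env = render_etcd_env("etcd-1", example_peers, cluster_alive=True)
```

None of these values can be known when the image is built, which is why they end up in a script that runs at boot.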
15
More than one config
● Initial provision - Ansible
● Remember about cloud-init
● Remember about instance tags/labels/meta-data
● Add script to join instance to cluster based on its state
16
On instance launch
● Executes cloud-init
○ provision disks
○ Exec thin config script
● Thin config script talks to the cloud API to find which instances to connect to
○ We assume we use a machine account
○ We assume the instance has certain metadata keys
○ The thin config script is baked into the image
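
Reading one of those metadata keys on GCP could look roughly like this. The metadata server URL and the `Metadata-Flavor` header are the standard GCE ones. The key name in the usage note and the helper names are assumptions, and this is a sketch rather than the script used in the talk:

```python
import urllib.request

# Standard GCE metadata endpoint for custom instance attributes.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/"
)


def metadata_request(key):
    """Build the request for a single custom metadata key."""
    # The metadata server rejects requests without this header.
    return urllib.request.Request(
        METADATA_URL + key, headers={"Metadata-Flavor": "Google"}
    )


def read_metadata(key, opener=urllib.request.urlopen, timeout=2.0):
    """Return the value of a metadata key as text.

    `opener` is injectable so the function can be exercised off-cloud.
    """
    with opener(metadata_request(key), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

On an instance, `read_metadata("etcd-cluster-name")` would return the value of a key with that (hypothetical) name; outside GCP the call fails because the metadata host does not resolve.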
17
Thin config script
● Talk to cloud API, find instances
● Try to query instances for service state
○ If cluster is alive, join cluster *
○ If no response from cluster
■ Check whether a cluster state backup already exists in the bucket
■ Restore the state on the instance
■ Prepare config adjusted to the expected instances in the cluster
■ Launch service
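
The decision part of that flow can be sketched as a pure function. The names are illustrative, and the "bootstrap a fresh cluster" branch for the case with no backup is an assumption based on the later slides:

```python
from enum import Enum


class Action(Enum):
    JOIN_EXISTING = "join the running cluster"
    RESTORE_FROM_BACKUP = "restore state from the bucket, then launch"
    BOOTSTRAP_NEW = "bootstrap a fresh cluster"


def decide(peer_responses, backup_exists):
    """Pick the next step for the thin config script.

    peer_responses -- one boolean per discovered instance, True if its
                      service answered the state query
    backup_exists  -- True if a cluster state backup was found in the bucket
    """
    if any(peer_responses):
        return Action.JOIN_EXISTING
    if backup_exists:
        return Action.RESTORE_FROM_BACKUP
    return Action.BOOTSTRAP_NEW
```

Keeping the decision separate from the cloud API and etcdctl calls makes it testable without a cluster, which helps given how many states the script has to handle.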
18
The asterisk!
● If cluster is alive, join cluster *
● Cluster member list returns nodes, but does not show their state
○ Need to check if they are dead/alive
○ Prepare config which is in sync with cluster state
○ Join cluster…
● Race conditions when more than 2 instances are joining
○ Mitigate by using a random sleep ;)
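
Both workarounds fit in a few lines. In this sketch `is_alive` stands for whatever per-node check is used (for example a request to each member's health endpoint), and all names are hypothetical:

```python
import random


def live_members(members, is_alive):
    """Keep only the members that pass the liveness check.

    The member list alone does not say which nodes are dead,
    so each one has to be probed separately.
    """
    return [member for member in members if is_alive(member)]


def join_delay(max_seconds, rng=random):
    """Random delay before joining, so that instances booting together
    are less likely to modify cluster membership at the same moment."""
    return rng.uniform(0.0, max_seconds)
```

The random delay only lowers the chance of a collision. It does not remove the race, which is what motivates the redesign on the next slide.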
20
Another try
● Make thin config script simpler
○ Just wait till current node is expected by the cluster
○ The code to bootstrap fresh cluster is left as is
○ Remove any node management
● Add script on etcd leader to run cluster node management
○ There can be only one leader in the cluster
○ The code is already there
○ Etcd disallows adding/removing nodes if that would render the cluster inoperable
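
A sketch of what the leader-side script has to compute: the difference between the instances the cloud API reports and the members etcd knows about. The function name, the tuple format and the remove-before-add ordering are assumptions for illustration:

```python
def plan_membership(is_leader, expected, members):
    """Return an ordered list of ("remove" | "add", name) steps.

    is_leader -- only the leader manages membership; others do nothing
    expected  -- set of instance names reported by the cloud API
    members   -- set of names currently in the etcd member list
    """
    if not is_leader:
        return []
    # Members whose instances no longer exist are removed first,
    # then the new instances are added.
    removals = [("remove", name) for name in sorted(members - expected)]
    additions = [("add", name) for name in sorted(expected - members)]
    return removals + additions
```

Because only one node runs this at a time, the steps can be applied one by one, and etcd itself rejects any step that would break the cluster.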
21
What if cluster has...
● No leader - no need to add/remove nodes
○ This usually leads to unhealthy instances
○ In edge cases this will trigger cluster destruction, recreation and a fresh restore
● Single leader - simpler cluster management
○ No more race conditions on start
● Multiple leaders
○ We can avoid that by limiting number of instances
22
More than one config, again
● Initial provision - Ansible
● Remember about cloud-init
● Add scripts
○ to join instance to cluster based on its state
○ to manage cluster nodes if leader
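
In its simplified form, the first script only has to wait until the leader has added the current node. A minimal sketch, where `list_member_names` stands for whatever call returns the current member names (for example a wrapper around etcdctl) and every name is hypothetical:

```python
import time


def wait_until_expected(name, list_member_names, interval=5.0, attempts=60,
                        sleep=time.sleep):
    """Poll the member list until `name` shows up.

    Returns True once the node is expected by the cluster,
    False if it did not appear within the given number of attempts.
    """
    for _ in range(attempts):
        if name in list_member_names():
            return True
        sleep(interval)
    return False
```

Returning False rather than waiting forever lets the caller exit with an error, so the instance shows up as unhealthy and can be replaced.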
23
So now we have...
● Ansible - imperative or declarative, used to make the image
● Cloud-init - declarative but allows imperatives
● Scripts - imperative
○ Shell
○ gcloud/awscli
○ pex + envtpl
○ Etcdctl
● Terraform - declarative
24
Adding up tools
● Vagrant + Ansible
● Serverspec / Inspec / TestInfra
● Test Kitchen
○ Merges those above, but still lacks a bit in full cluster tests
25
Worth a look
● github.com/MonsantoCo/etcd-aws-cluster/ (shell)
● github.com/ocadotechnology/etcd-dynamic-cluster (python)
● etcd rpc proto
● github.com/coreos/etcd-operator
Ummm wait, that's for containers…
26
Summing up
● Depending on the stage, we can choose a different solution
● Passing parameters from one stage to another
● Sometimes certain solutions are forced
● Sometimes you must make your own tools
Thank you
Krancowa 5
02-493, Warsaw
Poland
+48 22 389 51 00
contact@codilime.com

CodiLime Tech Talk - Michał Sochoń: Configuration management hell

  • 1. Configuration Management Hell 2019.01.30 ver 0.1.0-alpha-2 Michał Sochoń
  • 2. CodiLime at a glance Ranked among TOP 50 fastest growing companies in EMEA by The Financial Times ● 8+ years in business ● 3 locations (Warsaw and Gdansk in Poland & Palo Alto in the US) ● 170+ people on board, including 90+ software engineers, 30+ DevOps engineers ● Working with market leaders including: Juniper, NTT, Nutanix, GigaSpaces, Cloudify Areas of expertise: ● SDN, NFV & Orchestrators ● Cloud native, Serverless & Multicloud ● Edge Computing Services: ● Software engineering ● DevOps ● UX/UI ● R&D
  • 3. In general ● Provisioning of resources ● Configuration ● Data management
  • 4. Config Paradigms ● Imperative ○ Non-idempotent ○ Defines exactly how to perform the steps ● Declarative ○ Idempotent ○ Defines the desired state
  • 5. Parameters ● Vendor specific ● App config ● System tunables ● Secrets / Security ● Dependent services
  • 6. Ways of provisioning ● Out of band - create pre-baked images ● Inline - live on remote systems ● Inline - live on local systems ● Inline - at runtime
  • 7. Most common setups ● Execution on demand ● Scheduled ● Event based
  • 8. Example ● Prepare ETCD service in GCP ● Limited tunable parameters ○ Version ○ Cluster size / failure domains ○ Performance: instance type, disk size ● Auto healing ○ Periodic backup ○ Auto disaster recovery
  • 9. Limitations and Challenges ● Which parts are most likely to change? ○ The fewer there are, the easier it is to build ● How often are we willing to change it? ○ How do we handle data migration/availability? ○ Cloud providers give us 2 states of service availability instead of 3
  • 10. High level flow ● Git repo ● Create image with app and extra packages ● Create required resources ● Create instance group ● … ● Profit!
  • 11. Development flow ● Single instance ● A set of instances ● A set of instances from an image ● Auto-scaled set of instances from an image ○ Integrate health checks ○ Integrate backup ○ Integrate recovery
  • 12. But where is ‘the’ hell?
  • 13. It’s because... ● You cannot bake in the whole config; it must be adjusted per instance ● Cluster state itself - depending on the state, you must configure differently ● Etcdctl - depending on the API version you use to talk to ETCD, it returns different info ● … and depending on formatting it returns different data ● Nodes which are gone are not marked as dead ○ Need to periodically check manually whether a node is dead/alive
  • 15. More than one config ● Initial provision - Ansible ● Remember about cloud-init ● Remember about instance tags/labels/metadata ● Add a script to join the instance to the cluster based on its state
  • 16. On instance launch ● Executes cloud-init ○ Provision disks ○ Exec thin config script ● Thin config script talks to the cloud API to find which instances to connect to ○ We assume we use a machine account ○ We assume the instance has certain metadata keys ○ Thin config script is baked into the image
  • 17. Thin config script ● Talk to cloud API, find instances ● Try to query instances for service state ○ If cluster is alive, join cluster * ○ If no response from cluster ■ Check whether a cluster state backup already exists in the bucket ■ Restore state on instance ■ Prepare config adjusted to the instances expected in the cluster ■ Launch service
  • 18. The asterisk! ● If cluster is alive, join cluster * ● Cluster member list returns nodes, but does not show their state ○ Need to check if they are dead/alive ○ Prepare config which is in sync with cluster state ○ Join cluster… ● Race conditions when more than 2 instances are joining ■ Mitigate by using a random sleep ;)
  • 20. Another try ● Make thin config script simpler ○ Just wait until the current node is expected by the cluster ○ The code to bootstrap a fresh cluster is left as is ○ Remove any node management ● Add a script on the etcd leader to run cluster node management ○ There can be only one leader in the cluster ○ The code is already there ○ Etcd disallows adding/removing nodes if that would render the cluster inoperable
  • 21. What if cluster has... ● No leader - no need to add/remove nodes ○ This usually leads to unhealthy instances ○ In edge cases this will trigger cluster destruction, recreation and a fresh restore ● Single leader - simpler cluster management ○ No more race conditions on start ● Multiple leaders ○ We can avoid that by limiting the number of instances
  • 22. More than one config, again ● Initial provision - Ansible ● Remember about cloud-init ● Add scripts ○ to join the instance to the cluster based on its state ○ to manage cluster nodes if leader
  • 23. So now we have... ● Ansible - imperative or declarative, to make the image ● Cloud-init - declarative but allows imperatives ● Scripts - imperative ○ Shell ○ gcloud/awscli ○ pex + envtpl ○ Etcdctl ● Terraform - declarative
  • 24. Adding up tools ● Vagrant + Ansible ● Serverspec / InSpec / Testinfra ● Test Kitchen ○ Merges those above, but still falls a bit short on full cluster tests
  • 25. Worth seeing ● github.com/MonsantoCo/etcd-aws-cluster/ (shell) ● github.com/ocadotechnology/etcd-dynamic-cluster (python) ● etcd rpc proto ● github.com/coreos/etcd-operator - ummm wait, that's for containers…
  • 26. Summing up ● Depending on the stage, we can choose a different solution ● Passing parameters from one stage to another ● Sometimes certain solutions are forced ● Sometimes you must make your own tools
  • 28. Thank you ● Krancowa 5, 02-493 Warsaw, Poland ● +48 22 389 51 00 ● contact@codilime.com
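
The startup decision described on the "Thin config script" and "The asterisk!" slides can be sketched roughly as below. This is only an illustration, not the script from the talk: the names `Action`, `decide_startup_action`, `live_peers` and `join_delay_seconds` are hypothetical, the health-probe results and the backup check are assumed to be gathered elsewhere (cloud API, bucket listing), and the "bootstrap a fresh cluster" fallback when no backup exists is an assumption based on the "Another try" slide.

```python
import random
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes of the thin config script's decision."""
    JOIN_CLUSTER = "join"
    RESTORE_FROM_BACKUP = "restore"
    BOOTSTRAP_FRESH = "bootstrap"  # assumed fallback, not spelled out on the slide


def decide_startup_action(peer_alive, backup_exists):
    """Pick what a freshly launched instance should do.

    peer_alive: dict of peer name -> True if the peer answered a health probe.
    backup_exists: True if a cluster state backup was found in the bucket.
    """
    if any(peer_alive.values()):
        # At least one member responded, so the cluster is considered alive.
        return Action.JOIN_CLUSTER
    if backup_exists:
        # Nobody answered, but earlier state can be restored.
        return Action.RESTORE_FROM_BACKUP
    return Action.BOOTSTRAP_FRESH


def live_peers(peer_alive):
    """The member list does not carry liveness, so keep only probed-alive peers."""
    return sorted(name for name, alive in peer_alive.items() if alive)


def join_delay_seconds(max_delay=30.0, rng=random):
    """Random sleep length used to reduce join races between instances."""
    return rng.uniform(0.0, max_delay)
```

A caller would sleep for `join_delay_seconds()` before attempting a join and build its config only from `live_peers(...)`. The random delay only makes collisions less likely; it does not remove the race, which is why the later slides move node management to the leader.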

Editor's Notes

  1. Execution on demand: Ansible, Fabric. Scheduled: cron or a Puppet master. Event based: usually found in modern active client-server tools such as SaltStack.