LESSONS FROM A DEVOPS TRANSFORMATION ON AWS
Who Am I?
❖ Former appserver developer
❖ Started with Java, some Python, some Go
❖ Working with applications and operations on AWS since 2007
➢ (Dev)Ops fascination from around the same time
➢ Led the engineering team for a SaaS product
➢ Had the good fortune to work with some extremely smart people
❖ Interests lie in distributed systems and scalability
❖ DevOps/Cloud practice lead @ImagineaTech
❖ DevOps editor at InfoQ.com
❖ Elsewhere
➢ https://www.linkedin.com/in/hrishikeshbarua
➢ https://twitter.com/talonx
The Product in Question
❖ Marketing platform for brands to run customer engagement
and loyalty campaigns
❖ SaaS model
Technology & Infrastructure
❖ Hosted on Amazon Web Services, initially in one region, later spread over
multiple regions
❖ EC2, S3, EBS, CloudFront
❖ External DNS (and later CDN)
❖ Mostly Java/JavaScript/MySQL/Kafka/Redis
❖ Integration with multiple third-party APIs and services
❖ Puppet/Vagrant/Jenkins/Graphite/collectd/Nagios
To Set Some Context
❖ Roughly covers the period 2010 - 2014, so some things might sound
quaint today
❖ DevOps transformation took place over a period of years
➢ Started with small-scale AWS infra, legacy tools, monolithic app architecture.
➢ Ended with multi-region one-click deployment, a combination of monolith + service-oriented
architecture, OSS + custom-built ops tools.
➢ The following slides are a summary of some key learnings on AWS Ops.
We’ll focus on a few interesting areas
Monitoring
❖ Monitoring-as-a-Service or Self-hosted?
➢ You might need both, if you have a complex/legacy + modern app or want more flexibility.
➢ Monitor the self-hosted monitor using the external one.
➢ Self-hosted monitoring tools and dashboards should have backups. If the AWS AZ in which
you host your monitoring system goes down, you’ll be semi-blind.
❖ Choose the right tools
➢ Get rid of the dinosaur. Convincing your traditional IT folks about jettisoning Nagios might
be the toughest part.
➢ Relational view is important. A single service might be dependent on others (e.g. a REST API
dependent on DNS, LB, backend nodes, database, caching layer) - it’s important to be able
to see this relationship in your dashboard.
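One way to picture the relational view is as a dependency map that a dashboard can walk when a service degrades. The sketch below is only an illustration: the service names and the `DEPENDS_ON` map are hypothetical, and a real monitoring tool would supply its own model.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: service -> services it depends on directly.
DEPENDS_ON: Dict[str, List[str]] = {
    "rest-api": ["dns", "load-balancer", "backend", "cache"],
    "backend": ["database", "cache"],
    "load-balancer": ["dns"],
}


def failing_dependencies(service: str, down: Set[str]) -> List[str]:
    """Return the failing services that `service` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(DEPENDS_ON.get(service, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; also guards against cycles
        seen.add(dep)
        stack.extend(DEPENDS_ON.get(dep, []))
    return sorted(seen & down)
```

With this map, a database outage shows up under the REST API even though the API does not talk to the database directly.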
Monitoring
❖ Watch out for AWS specific quirks
➢ Steal time (CPU time the hypervisor gives to other guests)? Alerting software needs to
take this into account.
❖ There’s no such thing as too much monitoring
➢ Monitor the AWS status RSS feed; it can serve as an indicator of potential problems. Caveats:
■ AWS problems are sometimes localized.
■ This can at best serve as an early warning system.
➢ Collect and plot everything
■ Deployment points (Thanks, Etsy)
■ Graphite is a swallow-all, easy to use system
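As a rough illustration of the two points above, the sketch below computes steal time from two samples of the aggregate `cpu` line in Linux's `/proc/stat` and formats metrics as Graphite plaintext lines (`<path> <value> <timestamp>`). The metric paths are hypothetical, and the parser assumes a kernel that reports at least eight counters on the `cpu` line.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into its first eight counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Order: user, nice, system, idle, iowait, irq, softirq, steal
    return [int(value) for value in fields[1:9]]


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one metric as a line of Graphite's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"
```

A deployment marker can be recorded the same way, for example `graphite_line("events.deploy.webapp", 1, ts)`, and plotted alongside other graphs. Lines like these are typically written to Carbon's plaintext listener over TCP.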
Monitoring
❖ Automate
➢ The provisioning process for a server (or a service) should take care of including it in your
monitoring system.
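A minimal sketch of this idea, assuming a hypothetical `MonitoringRegistry` that stands in for whatever API or generated config the real monitoring system uses. The role names and checks are also made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical mapping from server role to the checks it should receive.
CHECKS_BY_ROLE: Dict[str, List[str]] = {
    "web": ["http", "disk", "cpu"],
    "db": ["mysql", "replication_lag", "disk"],
}


@dataclass
class MonitoringRegistry:
    """Stand-in for a real monitoring system's API or generated configuration."""
    hosts: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, hostname: str, checks: List[str]) -> None:
        self.hosts[hostname] = list(checks)


def provision(hostname: str, role: str, registry: MonitoringRegistry) -> None:
    """Provision a server; adding it to monitoring is part of the same step."""
    if role not in CHECKS_BY_ROLE:
        # Refuse to create a server that nobody would be watching.
        raise ValueError(f"no monitoring checks defined for role {role!r}")
    # ... launch the instance and apply configuration management here ...
    registry.register(hostname, CHECKS_BY_ROLE[role])
```

The design choice worth noting is that an unknown role fails the provisioning call, so an unmonitored server cannot be created by accident.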
Backups and Disaster Recovery
❖ Specifics usually depend on the app architecture and the level of
automation
❖ Instances
➢ Base AMI + Configuration Management? (Puppet/Chef/Ansible)
➢ Golden images + Immutable Servers?
➢ All of the above?
Backups & Disaster Recovery
❖ Databases
➢ Self-hosted vs RDS
■ RDS limitations
➢ Replication, EBS snapshots
➢ Data consistency
■ Freeze/unfreeze
■ Database specific quirks for snapshotting
■ Snapshotting the read-only slave? Ensure that the lag time is low (and monitored)
■ Cross region backups (but is your app cross-region ready? If not, why bother?)
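The freeze/unfreeze and lag points above can be sketched as follows. This is an illustration under assumptions, not the setup described in the talk: `run` is any callable that executes a command (for example `subprocess.check_call`), `take_snapshot` is a hypothetical callable that triggers the EBS snapshot and returns its ID, and `lag_seconds` is assumed to come from the replica's reported lag (`None` when replication is not running).

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Optional

Runner = Callable[[List[str]], object]


@contextmanager
def frozen(mount_point: str, run: Runner) -> Iterator[None]:
    """Freeze a filesystem for the duration of the block and always unfreeze it."""
    run(["fsfreeze", "--freeze", mount_point])
    try:
        yield
    finally:
        run(["fsfreeze", "--unfreeze", mount_point])


def snapshot_replica(
    lag_seconds: Optional[int],
    max_lag: int,
    mount_point: str,
    run: Runner,
    take_snapshot: Callable[[], str],
) -> str:
    """Snapshot a read-only replica only if its replication lag is acceptable."""
    if lag_seconds is None or lag_seconds > max_lag:
        raise RuntimeError(f"replica lag {lag_seconds!r} exceeds limit of {max_lag}s")
    with frozen(mount_point, run):
        return take_snapshot()
```

The `finally` clause matters here: a filesystem left frozen after a failed snapshot would stall the database.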
Security
❖ Go with VPC (older AWS accounts have both Classic and VPC)
❖ Amazon provides the first level of defence
➢ Strong network component for DDoS, rest depends on you
➢ Plan security groups from the beginning
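One way to plan security groups up front is to keep the intended rules as data and check them before anything is created. The sketch below is hypothetical throughout: the group names, ports, and the `PUBLIC_PORTS` policy are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    port: int
    source: str  # a CIDR block or the name of another security group


# Hypothetical plan: group name -> inbound rules.
PLAN: Dict[str, List[Rule]] = {
    "web": [Rule(443, "0.0.0.0/0"), Rule(22, "bastion")],
    "db": [Rule(3306, "web"), Rule(22, "bastion")],
    "bastion": [Rule(22, "203.0.113.0/24")],
}

# Assumed policy: only these ports may be open to the whole internet.
PUBLIC_PORTS = {80, 443}


def world_open_violations(plan: Dict[str, List[Rule]]) -> List[str]:
    """List 'group:port' entries that are open to 0.0.0.0/0 against the policy."""
    return sorted(
        f"{group}:{rule.port}"
        for group, rules in plan.items()
        for rule in rules
        if rule.source == "0.0.0.0/0" and rule.port not in PUBLIC_PORTS
    )
```

Running such a check in CI keeps the plan reviewable as the number of groups grows.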
Security
❖ ssh keys
➢ Adopt a tool to manage per-user ssh keys
➢ EC2 metadata for an instance will continue to show the name of the key pair it was
launched with. If that key has been revoked, its public key may no longer exist on the
instance, but the metadata will still show it, because AWS has no way of knowing that
you changed the authorized_keys file.
➢ You can upload your own keys to the AWS console and they will be available for use when
launching EC2 instances. At the time, imported keys had to be RSA keys of 1024, 2048 or
4096 bits.
Security
❖ ssh keys
➢ Are AWS key-pairs confined to a single region? This is true only if you consider the default
state of affairs. You can get around it.
■ For keys that you generate, you can import them to all the regions you want using
the AWS console or the CLI tools.
■ For keys that AWS generates, you can take the public key from an EC2 instance
launched with that key, and import that in a similar manner to all the regions you
want.
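Importing the same public key into several regions can be scripted. The sketch below assumes a `client_for` callable that returns an EC2 client for a region; with boto3 this could be `lambda region: boto3.client("ec2", region_name=region)`, whose `import_key_pair` call takes `KeyName` and `PublicKeyMaterial`. The function name and the fake-friendly injection are choices made for this example, not something from the talk.

```python
from typing import Any, Callable, Iterable, List


def import_key_everywhere(
    key_name: str,
    public_key: bytes,
    regions: Iterable[str],
    client_for: Callable[[str], Any],
) -> List[str]:
    """Import one public key under the same name into each given region."""
    imported: List[str] = []
    for region in regions:
        client = client_for(region)
        # Only the public key is uploaded; the private key never leaves your machine.
        client.import_key_pair(KeyName=key_name, PublicKeyMaterial=public_key)
        imported.append(region)
    return imported
```

For a key that AWS generated, the public half would first be read from an instance launched with it, as described above, and then passed in as `public_key`.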
Automation
❖ CI
➢ Easy to set up, no excuses. Once set up, have an owner for incremental improvements
➢ Don’t let Broken Windows remain broken
➢ The move to CD may not be so easy - needs buy-in from all quarters
❖ Configuration Management
➢ Again, hard to do if not done from the beginning
➢ Choose one (Ansible/Puppet/Chef) and master it
People & Architecture
❖ Have an owner for system architecture
➢ All architecture decisions, however small, matter
➢ And most such decisions need to be taken “urgently”
❖ Buy-in from management
➢ Demonstrate value to the product/business. Visibility is paramount. Don’t expect to be
understood all the time.
➢ “Make more awesome” - Jesse Robbins
People & Architecture
❖ Adopt uniform abstractions
➢ E.g. don’t adopt two different queueing systems for two different purposes if one can
handle both (“cool stuff syndrome”).
❖ Cross-region failover is hard if not designed early
➢ Specifics will depend on your product
Thank You
