LESSONS FROM A DEVOPS TRANSFORMATION ON AWS
Who Am I?
❖ Former appserver developer
❖ Started with Java, some Python, some Go
❖ Working with applications and operations on AWS since 2007
➢ (Dev)Ops fascination from around the same time
➢ Led the engineering team for a SaaS product
➢ Had the good fortune to work with some extremely smart people
❖ Interests lie in distributed systems and scalability
❖ DevOps/Cloud practice lead @ImagineaTech
❖ DevOps editor at InfoQ.com
❖ Elsewhere
➢ https://www.linkedin.com/in/hrishikeshbarua
➢ https://twitter.com/talonx
The Product in Question
❖ Marketing platform for brands to run customer engagement
and loyalty campaigns
❖ SaaS model
Technology & Infrastructure
❖ Hosted on Amazon Web Services, initially in one region, later spread over multiple regions
❖ EC2, S3, EBS, CloudFront
❖ External DNS (and later CDN)
❖ Mostly Java/JavaScript/MySQL/Kafka/Redis
❖ Integration with multiple third-party APIs and services
❖ Puppet/Vagrant/Jenkins/Graphite/collectd/Nagios
To Set Some Context
❖ Roughly covers the period 2010 - 2014, so some things might sound
quaint today
❖ DevOps transformation took place over a period of years
➢ Started with small scale AWS infra, legacy tools, monolithic app architecture.
➢ Ended with multi-region one-click deployment, combination of mono + service oriented
architecture, OSS + custom built ops tools.
➢ The following slides are a summary of some key learnings on AWS Ops.
We’ll focus on a few interesting areas
Monitoring
❖ Monitoring-as-a-Service or Self-hosted?
➢ You might need both, if you have a complex/legacy + modern app or want more flexibility.
➢ Monitor the self-hosted monitor using the external one.
➢ Self-hosted monitoring tools and dashboards should have backups. If the AWS AZ in which
you host your monitoring system goes down, you’ll be semi-blind.
❖ Choose the right tools
➢ Get rid of the dinosaur. Convincing your traditional IT folks about jettisoning Nagios might
be the toughest part.
➢ Relational view is important. A single service might be dependent on others (e.g. a REST API
dependent on DNS, LB, backend nodes, database, caching layer) - it’s important to be able
to see this relationship in your dashboard.
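
One way to picture the relational view is as a small dependency graph that a dashboard could walk. The sketch below is illustrative only: the service names and the `DEPENDS_ON` map are hypothetical, loosely following the REST API example above, and are not taken from the product described in this talk.

```python
from collections import deque

# Hypothetical dependency map: each entry lists what a component depends on directly.
DEPENDS_ON = {
    "rest-api": ["dns", "load-balancer"],
    "load-balancer": ["backend-nodes"],
    "backend-nodes": ["database", "cache"],
    "dns": [],
    "database": [],
    "cache": [],
}


def transitive_dependencies(service, depends_on):
    """Return every component the service relies on, directly or indirectly."""
    seen = set()
    queue = deque(depends_on.get(service, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited; also guards against cycles
        seen.add(dep)
        queue.extend(depends_on.get(dep, []))
    return seen


def affected_services(failed, depends_on):
    """Return, sorted by name, the components that depend on the failed one at any depth."""
    return sorted(
        svc for svc in depends_on
        if failed in transitive_dependencies(svc, depends_on)
    )
```

Calling `affected_services("database", DEPENDS_ON)` lists everything upstream of the database, which is the relationship a dashboard would need to show next to a database alert.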
Monitoring
❖ Watch out for AWS specific quirks
➢ Steal time (CPU cycles the hypervisor hands to other tenants) shows up as a separate CPU state. Alerting software needs to take it into account.
❖ There’s no such thing as too much monitoring
➢ Monitor the AWS RSS feed - can serve as an indicator of potential problems. Caveats
■ AWS Problems are sometimes localized.
■ This can at best serve as an early warning system.
➢ Collect and plot everything
■ Deployment points (Thanks, Etsy)
■ Graphite is a swallow-all, easy to use system
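
As a rough illustration of the steal time point, the sketch below computes the steal percentage between two samples of the aggregate `cpu` line in Linux's `/proc/stat`. The function names are made up for this example, and what counts as "too much" steal is left to the reader; the talk does not prescribe a threshold.

```python
# Field order on the aggregate "cpu" line of /proc/stat:
# user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice
STEAL_INDEX = 7


def parse_cpu_times(proc_stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of CPU time reported as steal between two samples."""
    if len(before) <= STEAL_INDEX or len(after) <= STEAL_INDEX:
        raise ValueError("samples do not include a steal field")
    deltas = [b - a for a, b in zip(before, after)]
    # Only the first eight fields are summed: guest time is already
    # counted inside user/nice, so adding it again would double count.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[STEAL_INDEX] / total
```

In practice the two samples would come from reading `/proc/stat` a few seconds apart. An alert on high CPU usage could then be suppressed or annotated when most of the "busy" time is actually steal.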
Monitoring
❖ Automate
➢ The provisioning process for a server (or a service) should take care of including it in your
monitoring system.
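
A minimal sketch of what "provisioning includes monitoring" could look like, assuming Nagios (which this stack used) and a provisioning step that knows the new host's name, address and role. The function name, the `generic-host` template and the host group names are assumptions for illustration, not the setup described in the talk.

```python
def nagios_host_definition(host_name, address, hostgroups, template="generic-host"):
    """Render a Nagios host object for a freshly provisioned server.

    The 'template' value must match a host template that already exists
    in the Nagios configuration; 'generic-host' is only an assumed default.
    """
    lines = [
        "define host {",
        f"    use         {template}",
        f"    host_name   {host_name}",
        f"    address     {address}",
        f"    hostgroups  {','.join(hostgroups)}",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

A provisioning run (for example a Puppet or similar hook) could write this output into the Nagios configuration directory and reload Nagios, so that no server goes live without checks attached.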
Backups and Disaster Recovery
❖ Specifics usually depend on the app architecture and the level of
automation
❖ Instances
➢ Base AMI + Configuration Management? (Puppet/Chef/Ansible)
➢ Golden images + Immutable Servers?
➢ All of the above?
Backups & Disaster Recovery
❖ Databases
➢ Self-hosted vs RDS
■ RDS limitations
➢ Replication, EBS snapshots
➢ Data consistency
■ Freeze/unfreeze
■ Database specific quirks for snapshotting
■ Snapshotting the read-only slave? Ensure that the lag time is low (and monitored)
■ Cross region backups (but is your app cross-region ready? If not, why bother?)
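
The freeze/unfreeze and replica-lag points can be sketched as follows. This is a sketch under assumptions, not the procedure used in the talk: it shells out to `fsfreeze` and the AWS CLI, the lag value is expected to come from the caller's own monitoring, and the 30-second limit is an arbitrary placeholder. Database-specific steps (such as flushing or locking tables) are deliberately left out.

```python
import subprocess
from contextlib import contextmanager


def _run(argv):
    """Default command runner; raises if the command exits non-zero."""
    subprocess.run(argv, check=True)


@contextmanager
def frozen_filesystem(mount_point, run=_run):
    """Freeze writes to a filesystem and always thaw it again, even on errors."""
    run(["fsfreeze", "--freeze", mount_point])
    try:
        yield
    finally:
        run(["fsfreeze", "--unfreeze", mount_point])


def snapshot_replica(volume_id, mount_point, lag_seconds,
                     max_lag_seconds=30, run=_run):
    """Snapshot a replica's EBS volume, refusing to do so if the replica lags."""
    if lag_seconds is None or lag_seconds > max_lag_seconds:
        raise RuntimeError(f"replica lag {lag_seconds!r}s exceeds {max_lag_seconds}s")
    with frozen_filesystem(mount_point, run):
        # An EBS snapshot is point-in-time as of when it is initiated, so the
        # filesystem can be thawed once this call returns.
        run([
            "aws", "ec2", "create-snapshot",
            "--volume-id", volume_id,
            "--description", f"replica backup of {mount_point}",
        ])
```

The `run` parameter exists so the sequence can be dry-run or tested without touching a real volume, for example by passing `print` or a list's `append` method.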
Security
❖ Go with VPC (older AWS accounts have both Classic and VPC)
❖ Amazon provides the first level of defence
➢ Strong network component for DDoS, rest depends on you
➢ Plan security groups from the beginning
Security
❖ ssh keys
➢ Adopt a tool to manage per-user ssh keys
➢ EC2 metadata for instance(s) will continue to show the original keypair name it was
created with. The original public key may not even exist on the instance anymore if
revoked, but the metadata will show it. This is because AWS has no way of knowing that
you changed the authorized_keys file.
➢ You can upload your own public keys to the AWS console and they will be available for use while launching EC2 instances. At the time, imported keys had to be RSA keys of 1024, 2048 or 4096 bits.
Security
❖ ssh keys
➢ Are AWS key-pairs confined to a single region? This is true only if you consider the default
state of affairs. You can get around it.
■ For keys that you generate, you can import them to all the regions you want using
the AWS console or the CLI tools.
■ For keys that AWS generates, you can take the public key from an EC2 instance
launched with that key, and import that in a similar manner to all the regions you
want.
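
For the CLI route, the import can be scripted per region. The sketch below only builds the `aws ec2 import-key-pair` commands and leaves running them to the caller; the key name, file path and region list are placeholders. The `fileb://` prefix follows AWS CLI v2 usage, and older CLI versions may expect `file://` instead, so treat that detail as an assumption to verify.

```python
def import_key_commands(key_name, public_key_path, regions):
    """Build one 'aws ec2 import-key-pair' command per target region."""
    return [
        [
            "aws", "ec2", "import-key-pair",
            "--region", region,
            "--key-name", key_name,
            "--public-key-material", f"fileb://{public_key_path}",
        ]
        for region in regions
    ]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`. Using the same key name in every region keeps launch scripts identical across regions.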
Automation
❖ CI
➢ Easy to set up, no excuses. Once set up, have an owner for incremental improvements
➢ Don’t let Broken Windows remain broken
➢ The move to CD may not be so easy - needs buy-in from all quarters
❖ Configuration Management
➢ Again, hard to do if not done from the beginning
➢ Choose one (Ansible/Puppet/Chef) and master it
People & Architecture
❖ Have an owner for system architecture
➢ All architecture decisions, however small, matter
➢ And most such decisions need to be taken “urgently”
❖ Buy-in from management
➢ Demonstrate value to the product/business. Visibility is paramount. Don’t expect to be
understood all the time.
➢ “Make more awesome” - Jesse Robbins
People & Architecture
❖ Adopt uniform abstractions
➢ E.g. don’t adopt two different queueing systems for two different purposes if one can handle both (“cool stuff syndrome”).
❖ Cross region failover is hard if not designed early
➢ Specifics will depend on your product
Thank You
