Building Production-Ready Microservices: DevopsExchangeSF

Michael Kehoe
Michael KehoeArchitect of reliable, scalable infrastructure at LinkedIn
How to Build Production-Ready
Microservices
Michael Kehoe
Staff Site Reliability Engineer
Today’s
agenda
1 Introductions
2 Tenets of Readiness
3 Creating Measurable guidelines
4 Measuring readiness
5 Key Learning’s
6 Q&A
Introduction
Michael Kehoe
$ WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years
American
• Worked on:
• Networks
• Microservices
• Traffic Engineering
• Databases
Production-SRE Team @ LinkedIn
$ WHOAMI
• Disaster Recovery - Planning &
Automation
• Incident Response – Process &
Automation
• Visibility Engineering – Making use of
operational data
• Reliability Principles – Defining best
practice & automating it
What is a Production Ready
Service?
O’Reilly 2017
Susan J. Fowler
Production-Ready
Microservices
“A production-ready application or service is
one that can be trusted to serve production
traffic…”
S U S A N J . F O W L E R
“… We trust it to behave reasonably, we trust it
to perform reliably, we trust it to get the job
done and to do its job well with very little
downtime.”
S U S A N J . F O W L E R
Tenets of readiness
Tenets of
Readine
ss
1 Stability & Reliability
2 Scalability & Performance
3 Fault Tolerance and DR
4 Monitoring
5 Documentation
Stability & Reliability
Tenets of Readiness
STABILITY
• Stable development cycle
• Stable deployment cycle
Tenets of Readiness
RELIABILITY
• Dependency Management
• Onboarding + Deprecation procedures
• Routing + Discovery
Scalability & Performance
Tenets of Readiness
SCALABILITY
• Understanding growth-scales
• Resource awareness
• Dependency scaling
Tenets of Readiness
PERFORMANCE
• Constant performance evaluation
• Traffic management
• Capacity Planning
Fault Tolerance & DR
Tenets of Readiness
FAULT TOLERANCE
• Avoiding Single Points of Failure (SPOF)
• Resiliency Engineering
Tenets of Readiness
DISASTER RECOVERY
• Understand failure scenario’s
• Disaster Recovery Plans
• Incident Management
Monitoring
Tenets of Readiness
MONITORING
• Dashboards + Alerting for:
• Service
• Resource Allocation
• Infrastructure
• All alerts are actionable and have pre-
documented procedures.
• Logging
Documentation
Tenets of Readiness
DOCUMENTATION
• Have one central landing-place for
documentation for the service
• Review of documentation from Engineer/
SRE/ Partners
• Reviewed Regularly
Tenets of Readiness
DOCUMENTATION
• What should documentation include:
• Key information (ports/ hostnames
etc)
• Description
• Architecture Diagram
• API description
• Oncall information
• Onboarding information
Creating Measurable
Guidelines
Creating Measurable Guidelines
• Not all guidelines directly translate into
something measurable
• You may need to look outcomes of
specific guidance to create measurable
guidelines
Creating Measurable Guidelines
EXAMPLE
• Stability:
• Stable development cycle
• Stable deployment process
• Stable introduction and deprecation
procedures
Creating Measurable Guidelines
EXAMPLE
• Stable development cycle
 Is the unit-test coverage above X %?
 Has this code-base been built in the
last week?
 Is there a staging environment for the
application?
Creating Measurable Guidelines
EXAMPLE
• Stable deployment process
 Has the application been deployed
recently?
 What is the successful deployment
percentage?
Measuring Readiness
Measuring Readiness
WHY?
Ensuring that services
are built and operated
in a standard manner
Standardization
Ensuring that services
are trustworthy
Quality
Assurance
Measuring Readiness
HOW?
Create a manual
checklist
Manual
Checklists
Automate the discovery
and measurement of
readiness
Automated
Scorecards
Breakdown of scores by team
Automated Service Discovery
Service Score Card
Automated Scorecards
Team Overview
Service Score Card
Automated Scorecards
Breakdown of readiness results
Service Score Card
Automated Scorecards
Key Learnings
Key Learnings
A set of guidelines for
what it means for
your service to be
‘ready’
Create
Automate the
checking and scoring
of these guidelines
Automate
Set the expectation
between
Engineering/ Product/
SRE that these
guidelines have to be
met
Evangelize
Q & A
Building Production-Ready Microservices: DevopsExchangeSF
1 of 40

More Related Content

Similar to Building Production-Ready Microservices: DevopsExchangeSF(20)

Microdeployments for microservices   dev ops nashvilleMicrodeployments for microservices   dev ops nashville
Microdeployments for microservices dev ops nashville
Nathaniel (Ned) Bauerle213 views
Introduction to 5w’s of DevOpsIntroduction to 5w’s of DevOps
Introduction to 5w’s of DevOps
Cygnet Infotech572 views
What is DevOps?What is DevOps?
What is DevOps?
Mesut Güneş3.5K views
Infrastructure as code with test approachInfrastructure as code with test approach
Infrastructure as code with test approach
Enrique Carbonell443 views
JC_Gabuya_ResumeJC_Gabuya_Resume
JC_Gabuya_Resume
John Carlo Gabuya118 views
Who needs EA… when we have DevOps?Who needs EA… when we have DevOps?
Who needs EA… when we have DevOps?
Jeff Jakubiak516 views
DevOps Vancouver Meetup - WSBC ProgressDevOps Vancouver Meetup - WSBC Progress
DevOps Vancouver Meetup - WSBC Progress
Andre Kaminski106 views
Introduction to DevOpsIntroduction to DevOps
Introduction to DevOps
Cprime1.2K views

More from Michael Kehoe(20)

eBPF WorkshopeBPF Workshop
eBPF Workshop
Michael Kehoe1.4K views
eBPF BasicseBPF Basics
eBPF Basics
Michael Kehoe2.5K views
Linux Container BasicsLinux Container Basics
Linux Container Basics
Michael Kehoe510 views
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREs
Michael Kehoe383 views
Couchbase Connect 2016Couchbase Connect 2016
Couchbase Connect 2016
Michael Kehoe665 views
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016
Michael Kehoe586 views
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016
Michael Kehoe991 views

Recently uploaded(20)

SIMPLE PRESENT TENSE_new.pptxSIMPLE PRESENT TENSE_new.pptx
SIMPLE PRESENT TENSE_new.pptx
nisrinamadani2159 views
7 NOVEL DRUG DELIVERY SYSTEM.pptx7 NOVEL DRUG DELIVERY SYSTEM.pptx
7 NOVEL DRUG DELIVERY SYSTEM.pptx
Sachin Nitave56 views
CWP_23995_2013_17_11_2023_FINAL_ORDER.pdfCWP_23995_2013_17_11_2023_FINAL_ORDER.pdf
CWP_23995_2013_17_11_2023_FINAL_ORDER.pdf
SukhwinderSingh895865480 views
Universe revised.pdfUniverse revised.pdf
Universe revised.pdf
DrHafizKosar88 views
ANATOMY AND PHYSIOLOGY UNIT 1 { PART-1}ANATOMY AND PHYSIOLOGY UNIT 1 { PART-1}
ANATOMY AND PHYSIOLOGY UNIT 1 { PART-1}
DR .PALLAVI PATHANIA190 views
Chemistry of sex hormones.pptxChemistry of sex hormones.pptx
Chemistry of sex hormones.pptx
RAJ K. MAURYA107 views
Plastic waste.pdfPlastic waste.pdf
Plastic waste.pdf
alqaseedae94 views
NS3 Unit 2 Life processes of animals.pptxNS3 Unit 2 Life processes of animals.pptx
NS3 Unit 2 Life processes of animals.pptx
manuelaromero201394 views
Scope of Biochemistry.pptxScope of Biochemistry.pptx
Scope of Biochemistry.pptx
shoba shoba119 views
Gopal Chakraborty Memorial Quiz 2.0 Prelims.pptxGopal Chakraborty Memorial Quiz 2.0 Prelims.pptx
Gopal Chakraborty Memorial Quiz 2.0 Prelims.pptx
Debapriya Chakraborty479 views
2022 CAPE Merit List 2023 2022 CAPE Merit List 2023
2022 CAPE Merit List 2023
Caribbean Examinations Council3.5K views
Google solution challenge..pptxGoogle solution challenge..pptx
Google solution challenge..pptx
ChitreshGyanani170 views
STYP infopack.pdfSTYP infopack.pdf
STYP infopack.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego159 views
Narration  ppt.pptxNarration  ppt.pptx
Narration ppt.pptx
TARIQ KHAN76 views
ICS3211_lecture 08_2023.pdfICS3211_lecture 08_2023.pdf
ICS3211_lecture 08_2023.pdf
Vanessa Camilleri79 views

Building Production-Ready Microservices: DevopsExchangeSF

Editor's Notes

  1. Michael So we’re apart of a team at LinkedIn called Production-SRE The key tenants of production-sre at LinkedIn is: Assist in restoring stability during site-critical issues Developing applications to reduce MTTD and MTTR Provide direction and guidelines for site-troubleshooting Build tools for efficient site-issue troubleshooting, issue detection and correlation As this presentation goes on, you’ll notice how an Event Correlation system fits into these
  2. In this context, stability is about having a consistent pre-production experience Development Continuous Integration with central code repo and code review Reproducible builds Unit/ Integration testing Deployment Simple repeatable deploys Canary/ Dark Canary/ Staging Canary testing
  3. Dependency management Unreliability in microservices usually comes from either changes in inbound traffic or changes in behavior from downstream traffic Knowing all of these and understanding impact matters Onboarding/ Deprecation Documented manner to start using API’s Access Control Best Practices Deprecation ACL/ Firewalls Code Cleanup Routing + Discovery Is there a standard way to discover how to get to your service Does your application have reliable health-checks Does your load-balancer respect these Circuit breakers/ Degraders
  4. Growth Scales: How your service scales with the business goals/ metrics How your application scales as it gets more traffic (how do you make it serve more traffic) What resource “bounds” the application throughput Resource Awareness: What is your resource usage What are your bottlenecks Horizontal vs Vertical scaling (don’t do this) Dependency Scaling: Going back to reliability…how do your downstreams scale with your services growth
  5. Performance This is essential Ideally should be done every time a change is made to the service (deployment/ AB flag) Measure and report performance Traffic Management QoS Scaling for bursts/ failover Capacity: This really ties everything together Do you have the right numbers to know how many resources you’ll need in the future
  6. SPOF: In 2018, this really shouldn’t be a problem. Hardware failures Rack failures Do the right thing early to avoid problems down the road Resiliency Engineering May be known as chaos engineering Deliberately break your service to find weakpoints and look to make things fail more gracefully Outages happen, be prepared for them
  7. Failure Scenarios: Understand what ways your service can break – resiliency engineering can help here if you’re ensure Have plans to respond to these Disaster Recovery: For larger scale outages, what’s your plan IM: What’s the process to manage & respond to the outage Gamedays are a great way to test this
  8. Dashboards: Dashboards are for high-level system health Not for regression validation Monitor service/ resources/ infrastructure All alerts should be actionable and have pre-planned responses Anything other than this creates alert fatigue Logging is an underrated aspect of software development. During resiliency testing, see what the logs say and if they’re helpful
  9. Documentation: Possibly the hardest tenet to execute on Should have one place for all documentation (wiki/ website etc) Documentation should be peer-reviewed by Engineers/ SRE/ Partners Everyone should be writing this documentation All documentation should be reviewed every 3-6 months
  10. Problem is general concepts of readiness is not something you can directly translate into something that’s True/ False and can be scored or measured
  11. Here is an example looking at the ‘Stability Tenet
  12. Here is some potential outcomes of implementing some of the stability concepts
  13. So why measure readiness? Readiness helps ensure two key outcomes Standardization One key concept Susan points out in the book is that services should be built in the same manner QA If we have built the service correctly, we should be able to trust that the service will be reliable in Production
  14. So how do we measure this. Manual Checklists This should work ok for a small number of services, but becomes very very difficult as you have increased agility and an increased number of services Automated Scorecards This is a great way to build a scalable, reliable set of checks that can be constantly updated and provide reporting
  15. Create: First step is to agree on what your company defines as production ready Automate: I strongly suggest automating these checks Evangelize: Once you have everything in place, evangelize, set the expectation, make it apart of company culture