SRE & Kubernetes
February, 2022
Hello!
Afkham Azeez
VP & Deputy CTO - Cloud
azeez@wso2.com
Off-roading, camping, birding & nature enthusiast
Amateur radio operator - 4S7AZE
/afkham_azeez /afkhamazeez
Software development in 2020 and beyond…
A paradigm shift
● Major changes to how software is designed & built are taking place
● Businesses have realized that they have to build digital experiences
● Building a ‘Digitally-driven Business’ takes time and significant engineering
effort
Cloud-native software engineering
Building for the Cloud, on the Cloud
● Start building your product on the cloud
⦿ Have your dev environment on the cloud
● Multi-environment on the cloud
⦿ dev, test, staging, prod
● Leverage cloud services and APIs
⦿ Don’t run everything yourself
● Containers & Kubernetes are game changers
With great power comes great complexity!
What is Kubernetes?
● A cluster operating system
● A collection of control loops
https://buttondown.email/nelhage/archive/two-reasons-kubernetes-is-so-complex/
IaC
● The process of managing and provisioning computer data centers through
machine-readable definition files, rather than physical hardware configuration or
interactive configuration tools
● Everything is code
⦿ Cluster creation
⦿ Creating workloads
⦿ System configuration
⦿ Security
⦿ etc.
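As one illustration of system configuration kept as code, the sketch below declares a namespace and a resource quota for it. The namespace name `team-a` and all quota values are assumptions made for this example, not figures from the talk.

```yaml
# Hypothetical namespace for one team; the name is illustrative only.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# Caps the total resources that workloads in the namespace may request.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    pods: "20"
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

Checked into Git and applied with `kubectl apply -f`, a file like this makes the configuration reviewable and repeatable.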
Site Reliability Engineering
● SRE is an approach taken to solve IT Operations challenges using Software
Engineering principles.
● SREs use software as a tool to manage cloud systems, diagnose problems, and
automate tasks.
● A key role of SRE is to find the right balance between releasing new features
and ensuring they are reliable;
⦿ Dev teams want to deploy as many features as possible as soon as possible
⦿ SRE tries to facilitate the dev team’s goals while ensuring reliability
● What is reliability?
⦿ Minimizing the impact on end users by minimizing outages
What do SREs do?
● Define compliance standards & processes
● Write cluster/system setup code
● Define build pipelines & help dev teams set up pipelines
● Set up monitoring and alerting (code)
● Plan backup and recovery
● Plan DR strategy
● Threat modeling & security scanning
● Incident management
● Chaos engineering
● Root cause analysis
● Perform routine tasks
● Cost analysis & optimization
Core Concepts & Methodologies
CI/CD and GitOps
● Git repos as the single, central source of truth for the current cluster
configuration
● Use standard git practices
⦿ fork -> branch -> change -> build -> send PR -> CI -> review -> merge -> CD
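The CD step at the end of this flow is often handled by a GitOps controller that keeps the cluster in sync with the repo. The talk does not name a tool; the sketch below assumes Argo CD, and the application name, repo URL, path and namespaces are placeholders.

```yaml
# Assumption: Argo CD is installed in the "argocd" namespace.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: expenserpt
  namespace: argocd
spec:
  project: default
  source:
    # Placeholder repo; the merged main branch is the source of truth.
    repoURL: https://example.com/org/cluster-config.git
    targetRevision: main
    path: apps/expenserpt
  destination:
    # The cluster Argo CD itself runs in.
    server: https://kubernetes.default.svc
    namespace: expenses
  syncPolicy:
    automated:
      prune: true     # delete resources that were removed from Git
      selfHeal: true  # revert manual changes made outside Git
```

With automated sync, a merge to the tracked branch is what triggers a deployment, so nobody needs to run `kubectl` against the cluster by hand.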
Logging
[Diagram: Kubernetes Cluster with omsagent on Node 1, Node 2 and Node 3, plus omsagent-rs; Log Analytics; Data Explorer]
https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-log-query
Logging + analytics + alerting
Log publishing -> Analytics -> Issue or anomaly detection -> Alerting -> Incident management
Observability, Monitoring & Alerting
● Observability vs monitoring - monitoring is what you do after a system is
observable
● System level monitoring
⦿ Cluster, pod, node health
⦿ System level services/APIs health - includes errors & latencies
⦿ System logs
⦿ Intrusion detection
⦿ DoS
● Application level monitoring
⦿ Application level services/APIs health - includes errors & latencies
⦿ Internal application level observability
⦿ Application logs
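As a sketch of alerting expressed as code, the rule below fires when an API's error ratio stays high. It assumes Prometheus rather than the Azure tooling shown in the logging slides; the metric `http_requests_total`, its `status` label, and the 1% threshold are hypothetical.

```yaml
groups:
- name: api-health
  rules:
  - alert: HighErrorRatio
    # Share of requests answered with a 5xx status over the last 5 minutes.
    expr: |
      sum(rate(http_requests_total{status=~"5.."}[5m]))
        /
      sum(rate(http_requests_total[5m])) > 0.01
    # Only fire if the condition holds for 10 minutes, to avoid flapping.
    for: 10m
    labels:
      severity: page
    annotations:
      summary: "API error ratio has been above 1% for 10 minutes"
```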
Incident Management
An unplanned interruption to, or a reduction in the quality of, an IT service
Normal incident management process
Major incident management process
SLI, SLO, SLA
● SLI
⦿ Metrics used to measure the level of service provided to end-users (e.g., availability,
latency, throughput)
● SLO
⦿ Targeted levels of service, measured by SLIs
⦿ Typically expressed as a percentage over a period of time
⦿ Help you figure out the right balance between product innovation and reliability
● SLA
⦿ Contractual agreements that outline the level of service end users can expect
⦿ If these promises are not met, there can be significant consequences for the provider,
which are often financial in nature
Error Budget
● Error budget = 1-SLO
● Acceptable levels of unreliability for a service before it falls out of compliance
with an SLO
● A measure of the risk you can take; the budget is spent on
⦿ getting new features in
⦿ stopping services for maintenance
⦿ routine improvements
⦿ network and infrastructure outages
⦿ unforeseen circumstances
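A worked example of the formula above, assuming a hypothetical availability SLO of 99.9% measured over a 30-day window (both figures are chosen for illustration only):

```latex
\begin{align*}
\text{SLO} &= 99.9\% \text{ availability over 30 days} \\
\text{Error budget} &= 1 - 0.999 = 0.001 \\
\text{Minutes in 30 days} &= 30 \times 24 \times 60 = 43{,}200 \\
\text{Allowed downtime} &= 0.001 \times 43{,}200 = 43.2 \text{ minutes}
\end{align*}
```

Under these assumptions, everything in the list above has to fit into roughly 43 minutes of unavailability per window.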
Toil & Toil Budget
● Toil
⦿ Kind of work tied to running a production service that tends to be manual, repetitive,
automatable, tactical, devoid of enduring value, and that scales linearly as a service
grows.
⦿ The SRE discipline focuses on reducing toil as much as possible.
● Toil budget
⦿ A measure of acceptable toil
Cron jobs
apiVersion: batch/v1
kind: CronJob
metadata:
  name: expenserpt
spec:
  # Cron syntax: run at 00:00 on the first day of every month
  schedule: "0 0 1 * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: expenserpt
            imagePullPolicy: IfNotPresent
          # Restart the container in place if the job fails
          restartPolicy: OnFailure
Cost management
● Use tools provided by cloud platforms
● Set proper cost thresholds
● Resource audit & cost analysis reports
● Set up a cost management team &
weekly reviews
● Kubecost
⦿ Provides real-time cost visibility and
insights for teams using Kubernetes
⦿ Helps to continuously reduce cloud
costs
Anti-fragility
● Improving resilience using fire drills, Chaos Monkey, security and automation
● Kubernetes liveness & readiness probes can be used for health checks
● Kubernetes secret management for sensitive data using Secrets and CSI
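A minimal sketch of the liveness and readiness probes mentioned above. The pod name, image, port and the `/healthz` and `/ready` paths are assumptions; the application would need to serve those endpoints.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: app
    image: example/app:1.0   # placeholder image
    ports:
    - containerPort: 8080
    # If this check keeps failing, the kubelet restarts the container.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    # While this check fails, the pod is taken out of Service endpoints
    # and receives no traffic, but it is not restarted.
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
```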
Security
● Threat modeling using methodologies such as STRIDE
● Scan code repos using tools such as Checkov
● Security specialists - DevSecOps
● Security Operations Center (SOC)
● Kubernetes
⦿ Service Accounts, roles & role bindings
⦿ Network Policies
⦿ Cluster and namespace level isolation
⦿ mTLS enforcement via service meshes
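The Kubernetes items above can be sketched as manifests. The example below gives a service account read-only access to pods in one namespace and blocks all inbound pod traffic in that namespace by default. Every name, including the `expenses` namespace, is hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: report-reader
  namespace: expenses
---
# Role: read-only access to pods, scoped to the namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: expenses
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# RoleBinding: grants the Role to the service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: report-reader-pod-reader
  namespace: expenses
subjects:
- kind: ServiceAccount
  name: report-reader
  namespace: expenses
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
---
# NetworkPolicy: selects every pod in the namespace and allows no ingress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: expenses
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```

Note that a NetworkPolicy only takes effect if the cluster's network plugin enforces it; further policies would then allow the specific traffic each workload needs.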
Business Continuity & Disaster Recovery
● BCP is the process involved in creating a system of prevention and recovery
from potential threats to a company
● What is a disaster?
⦿ An unforeseen event that could potentially put the organization at risk by interfering
with operations
● Ideally there should be BC plans for all functions of the company which are
amalgamated into a single corporate BC plan
Adopting SRE
A way of structuring teams
How can your organization adopt SRE?
● Start small & evolve
● Analyze existing team structures/processes and see how they can be adapted
● Recruiting experienced SREs can be hard
⦿ Dev2SRE program
● On the job training
● Certifications are important
⦿ CKAD, CKA, CKS
⦿ Cloud platform certifications - Azure, AWS, GCP etc.
⦿ “Well architected” programs
● Maintain a central knowledge base - document everything
● Define standards, conventions & best practices and ensure that those are
followed
● Define and continuously improve processes
● Work closely with development teams. Engage with all stakeholders.
● Get standards certifications/reports - SOC2, ISO 27001, HIPAA, HITRUST etc.
TL;DR
● Kubernetes & even app development are just the tip of the iceberg in your
organization’s overall SRE & cloud native story
● Establishment of the SRE discipline is essential for running seamless
operations
● Start small, adapt & evolve
Question Time!
wso2.com
Thanks!
