Building Enterprise SRE with Istio
Hybrid Specialist: Shawn Ho
shawnho@google.com
1. What is SRE?
Product Lifecycle
Concept → Business → Development → Operations → Market
Agile solves this (the gap between Business and Development)
DevOps solves this (the gap between Development and Operations)
Developers: Agility
Operators: Stability
Dev & Ops’ KPIs aren't Aligned
What is the relationship between DevOps and SRE?
● DevOps is more of an abstract concept: a set of guidelines and disciplines for breaking down silos between development and operations.
● SRE is Google's concrete, realized practice of DevOps.
“class SRE implements DevOps”
Self-Service Platform
Monitoring Automation
CI/CD
SRE
Developers
Class SRE = REAL PERSON
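#1. Decisions based on data
Every decision is grounded in data.
#2. Be user centric
Even if every monitoring metric looks normal, if customers feel the system is unstable, then the system is unstable.
#3. Blameless culture & shared responsibility
Breaking down departmental silos starts with sharing responsibility across teams (Developers, Operators, Leaders).
A system failure is not only the operators' responsibility; code quality, technical debt, and other factors are all possible causes.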
2. How to Implement SRE with Istio/Anthos?
Istio in 2 minutes
[Figure: Istio architecture. Data plane: Service A and Service B, each behind a proxy, with mTLS between the proxies; an Ingress Gateway and an Egress Gateway, each with perimeter security policies; JWT + TLS toward an external app and an internal app; supported traffic: HTTP, gRPC, TCP. Istio Control Plane (Control Plane API on the K8S API server): Pilot, Policy Enforcement + Reporting, Citadel. Other labels: Routing + Secure Naming; Cert issuance; Local Authz; plugins for Logging, Monitoring, and CertAuthority. Legend: data flow; control + metrics flow.]
What does SRE implement on Platform?
Metrics & monitoring
● SLO
● Dashboard
● Analytics
Capacity planning
● Forecasting
● Demand-driven
● Performance
Emergency response
● Oncall
● Incident analysis
● Postmortems
Change management
● Release process
● Consulting design
● Automations
Culture
● Toil management
● Blamelessness
● Shared responsibility
Monitoring and Incident Management
● Understand system architecture: understand the system architecture and the deployed topology.
● System monitoring: monitor the system by gathering blackbox & whitebox metrics. SLIs & SLOs are extracted from the metrics and logs. The information is visualized through dashboards.
● Log handling: manage planned events (release, maintenance).
● Incident handling: create an incident ticket, roll back the change to resolve the incident, and investigate the root cause with logging, monitoring metrics, and debugging.
● Postmortem: review the incident and prepare a plan to prevent recurrence.
What to Monitor?
SLO = SLI + Target
“99% of REST API calls will complete in less than 100 ms every week”
(SLI: REST API calls completing in less than 100 ms; Target: 99% every week)
SLI: service level indicator, a well-defined measure of 'good enough'
• used to specify SLO/SLA
SLO: service level objective, a top-line target for the fraction of good interactions
• specifies goals (SLI + Target)
SLA: service level agreement, which adds consequences
• SLA = (SLO + margin) + consequences = SLI + Target + consequences
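The "SLO = SLI + Target" relationship can be sketched in a few lines of Python. This is only an illustration: the names `LatencySLO`, `sli`, and `meets_slo` are hypothetical, and it assumes one week of per-request latencies is already available as a list of milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySLO:
    """SLO = SLI + Target (illustrative names, not from the slides)."""
    threshold_ms: float  # SLI definition: a call is "good" if it is faster than this
    target: float        # Target: required fraction of good calls, e.g. 0.99


def sli(latencies_ms, threshold_ms):
    """Return the fraction of calls that completed faster than the threshold."""
    if not latencies_ms:
        raise ValueError("no requests observed in the window")
    good = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return good / len(latencies_ms)


def meets_slo(latencies_ms, slo):
    """True when the measured SLI reaches the SLO target for this window."""
    return sli(latencies_ms, slo.threshold_ms) >= slo.target
```

For the example on the slide, one would build `LatencySLO(threshold_ms=100.0, target=0.99)` and evaluate it once per weekly window.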
Error Budget
Product management & SRE define an availability target.
• 100% minus the availability target is a “budget of unreliability” (or the error budget).
Availability   Allowed unavailability window                     Error budget remaining (%)
SLO            per year        per quarter      per 30 days      at an error rate of 1%
90%            36.5 days       9 days           3 days           90
95%            18.25 days      4.5 days         1.5 days         80
99%            3.65 days       21.6 hours       7.2 hours        0
99.5%          1.83 days       10.8 hours       3.6 hours        -100
99.9%          8.76 hours      2.16 hours       43.2 minutes     -900
99.95%         4.38 hours      1.08 hours       21.6 minutes     -1900
99.99%         52.6 minutes    12.96 minutes    4.32 minutes     -9900
99.999%        5.26 minutes    1.30 minutes     25.9 seconds     -99900
Error Budget (Availability)
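The availability table follows from simple arithmetic, sketched below in Python. The function names are hypothetical. The last column is interpreted here as the share of the budget still unspent when the observed error rate is 1%, which matches the values shown.

```python
def allowed_downtime_minutes(slo, window_days):
    """Minutes of unavailability that an availability SLO permits in a window."""
    return (1.0 - slo) * window_days * 24 * 60


def remaining_budget_pct(slo, error_rate):
    """Share of the error budget still unspent, in percent.

    A negative value means the budget is overspent.
    """
    budget = 1.0 - slo  # the "budget of unreliability"
    if budget <= 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return (budget - error_rate) / budget * 100.0
```

For example, `allowed_downtime_minutes(0.999, 30)` corresponds to the 99.9% row of the 30-day column, and `remaining_budget_pct(0.95, 0.01)` to the 95% row of the last column.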
Demo with Anthos: Monitoring + Incident Management
● Topology
● SLO/SLI Metrics
● Blackbox/Whitebox
● Log Viewer
● Tracing/Tracing Report
[Screenshots: Topology, Blackbox, Whitebox, Logging, Tracing]
Error Budget Burn Down Rate
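The slide does not define the burn rate, so the sketch below assumes one common definition: the observed error rate divided by the error budget (1 - SLO). Under that assumption, a burn rate of 1 spends the budget exactly over the SLO window, and higher values exhaust it sooner. The function names and the 30-day default window are hypothetical.

```python
def burn_rate(error_rate, slo):
    """Observed error rate relative to the error budget (assumed definition)."""
    budget = 1.0 - slo
    if budget <= 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return error_rate / budget


def days_until_exhausted(error_rate, slo, window_days=30):
    """Days until the budget is gone if the current error rate persists."""
    rate = burn_rate(error_rate, slo)
    return float("inf") if rate == 0 else window_days / rate
```

An alert could then fire when the burn rate stays above a chosen threshold for some time; the threshold itself is a policy decision that the slides do not specify.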
Demo with Anthos: Proactively Reduce Error Budget Consumption
● Alert Setting
● Canary Deployment
● Cross-Region Deployment
[Figure: Clients reach Cloud Load Balancing, which splits traffic between two Kubernetes Engine clusters, Taiwan-1 and Singapore. One slide shows a 10/90 split; the next shows a 50/50 split.]
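The weighted split in the figure can be illustrated with a small Python simulation. It is a toy model, not how Cloud Load Balancing or Istio route traffic: `pick_cluster` and `simulate` are hypothetical names, and which cluster receives the 10% share is an assumption, since the slide does not say.

```python
import random


def pick_cluster(weights, rng):
    """Choose a cluster with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, requests, seed=0):
    """Count how many of `requests` land on each cluster for a given split."""
    rng = random.Random(seed)  # seeded so that repeated runs agree
    counts = {name: 0 for name in weights}
    for _ in range(requests):
        counts[pick_cluster(weights, rng)] += 1
    return counts


# Canary-style split (assumed direction), then an even cross-region split.
canary = simulate({"taiwan-1": 10, "singapore": 90}, requests=10_000)
even = simulate({"taiwan-1": 50, "singapore": 50}, requests=10_000)
```

Starting with a small share limits how much of the error budget a bad release can consume before the weights are raised.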
Capacity planning
● Plan for organic growth: increased product adoption and usage by customers.
● Determine inorganic growth: sudden jumps in demand due to feature launches, marketing campaigns, etc.
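A rough way to combine the two growth types is to compound the organic rate month by month and add one-off jumps for planned launches. The sketch below assumes that model. `forecast_demand`, its parameters, and the QPS unit are hypothetical and are not taken from the slides.

```python
def forecast_demand(current_qps, monthly_growth, months, launch_bumps=None):
    """Project demand per month: compounding organic growth plus inorganic jumps.

    launch_bumps maps a month number (1-based) to the extra QPS expected
    from that month onward, e.g. a feature launch or marketing campaign.
    """
    launch_bumps = launch_bumps or {}
    projection = []
    organic = current_qps
    inorganic = 0.0
    for month in range(1, months + 1):
        organic *= 1.0 + monthly_growth           # steady adoption growth
        inorganic += launch_bumps.get(month, 0.0)  # step change that persists
        projection.append(organic + inorganic)
    return projection
```

Calling `forecast_demand(1000.0, 0.10, 3, {2: 500.0})` models 10% monthly organic growth with a launch in month 2; real forecasts would be fitted to measured data.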
Change Management
Roughly 70%¹ of outages are due to changes in a live system
Demo with Anthos: The Power of GitOps
[Figure: Kubernetes configuration service for continuous deployment. Labels: Clients; Service (multiple instances); Cloud Source Repositories; Anthos Hub; Kubernetes Engine cluster on GCP; on-premises Kubernetes cluster (On-Prem1); NAT.]
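The core idea of GitOps, that the repository holds the desired state and an agent converges each cluster toward it, can be sketched as a diff between two dictionaries. This is a toy illustration under that assumption, not the Anthos API: `plan_changes` and the shape of the state dictionaries are hypothetical.

```python
def plan_changes(desired, live):
    """Compare desired state (from Git) with live state (from a cluster).

    Both arguments map a resource name to its spec. Returns the actions a
    reconciler would need to take to make the cluster match the repository.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))
    for name in sorted(live):
        if name not in desired:
            actions.append(("delete", name))  # drift that is not in Git
    return actions
```

Because every change arrives as a commit, a rollback becomes a revert of that commit followed by the same reconciliation.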
Summary + Call to Action
● SRE has 3 key principles:
○ Decisions Based on Data (meaningful monitoring)
○ Be User Centric (black-box testing)
○ Blameless Culture & Shared Responsibility (share the responsibility, work together)
● Kubernetes is a perfect platform for implementing SRE
○ SLI + SLO + Error Budget
○ Watch the Budget Burn Rate
○ Establish CI + CD with GitOps
● Pick a System and Build your SRE Practices
Cover images used with permission. These books can be found on shop.oreilly.com.

How to use Istio/Anthos to build Enterprise SRE

  • 1. 透過 Istio 打造企業內的 SRE Hybrid Specialist: Shawn Ho shawnho@google.com
  • 3. Product Lifecycle [Diagram: lifecycle stages Concept, Business, Development, Operations, Market, with one gap between stages annotated "Agile solves this" and another annotated "DevOps solves this"]
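  • 5. What is the relationship between DevOps and SRE? ● DevOps is more of an abstract concept: guidelines and disciplines for breaking down silos between development and operations ● SRE is Google's concrete, realized practice of DevOps. "class SRE implements DevOps"
  • 7. #1. Decision based on data. Every decision is grounded in data.
  • 8. #2. Be user centric. Even if every monitoring metric looks normal, if customers feel the system is unstable, then the system is unstable.
  • 9. #3. Blameless culture & shared responsibility. Breaking down departmental silos starts with sharing responsibility across roles (Developers, Operators, Leaders). A system failure is not only the operators' responsibility; code quality, technical debt, and similar factors are all possible causes.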
  • 10. How to Implement SRE with Istio/Anthos?
  • 11. Istio in 2 minutes [Architecture diagram. Data plane: Service A and Service B, each behind a proxy, exchanging HTTP, gRPC, and TCP traffic over mTLS; an Ingress Gateway and an Egress Gateway apply perimeter security policies (JWT + TLS) between Internal App 1 and External App 1; local authz at the proxies. Istio Control Plane: Pilot (routing + secure naming), Citadel (cert issuance, CertAuthority plugin), policy enforcement + reporting (logging plugin, monitoring plugin), with the control plane API on the K8S API Server. Legend: data flow; control + metrics flow.]
  • 12. What does SRE implement on Platform? Metrics & monitoring: SLO, Dashboard, Analytics. Capacity planning: Forecasting, Demand-driven, Performance. Change management: Release process, Consulting design, Automations. Emergency response: On-call, Incident analysis, Postmortems. Culture: Toil management, Blamelessness, Shared responsibility.
  • 14. Monitoring and Incident Management. Understand system architecture: understand the system architecture and the deployed topology. System monitoring: monitor the system by gathering blackbox and whitebox metrics; SLIs and SLOs are extracted from the metrics and logs, and the information is visualized through dashboards. Log handling. Managing planned events (release, maintenance). Incident handling: create an incident ticket; roll back the change to resolve the incident; investigate the root cause with logs, monitoring metrics, and debugging. Postmortem: review the incident retrospectively and prepare a plan to prevent recurrence.
  • 15. What to Monitor? SLO = SLI + Target. Example: "99% of REST API calls will complete in less than 100 ms every week" (SLI: REST API calls completing in less than 100 ms; Target: 99% every week). SLI, service level indicator: a well-defined measure of "good enough"; used to specify the SLO/SLA. SLO, service level objective: a top-line target for the fraction of good interactions; specifies goals (SLI + Target). SLA, service level agreement: consequences; SLA = (SLO + margin) + consequences = SLI + Target + consequences. Error Budget: product management and SRE define an availability target; 100% minus the availability target is a "budget of unreliability" (the error budget).
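The "SLO = SLI + Target" relationship on slide 15 can be sketched in a few lines. This is only an illustration under stated assumptions: the class and function names, the sample latencies, and the "fraction of budget consumed" figure are hypothetical and do not come from the slides or from any Istio/Anthos API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySLO:
    """SLO = SLI + target. Field names are illustrative assumptions."""
    threshold_ms: float  # SLI: a call is "good" if it finishes under this latency
    target: float        # Target: required fraction of good calls, e.g. 0.99


def sli(latencies_ms, threshold_ms):
    """Fraction of calls that completed in less than threshold_ms."""
    if not latencies_ms:
        raise ValueError("no requests observed in the window")
    good = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return good / len(latencies_ms)


def evaluate(latencies_ms, slo):
    """Compare the measured SLI with the target and report error budget use."""
    if not 0.0 < slo.target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    measured = sli(latencies_ms, slo.threshold_ms)
    error_budget = 1.0 - slo.target  # allowed fraction of bad calls
    bad_fraction = 1.0 - measured
    return {
        "sli": measured,
        "slo_met": measured >= slo.target,
        "error_budget": error_budget,
        "budget_consumed": bad_fraction / error_budget,
    }


# The slide's example: 99% of calls under 100 ms, evaluated over one week.
weekly_slo = LatencySLO(threshold_ms=100.0, target=0.99)
# Hypothetical week of traffic: 995 fast calls and 5 slow ones.
sample = [20.0] * 995 + [250.0] * 5
report = evaluate(sample, weekly_slo)
```

With this made-up sample, 0.5% of calls are slow against a 1% budget, so the SLO is met and about half of the week's error budget is used.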
  • 16. Error Budget (Availability)
        Availability SLO   Allowed unavailability window                  Error budget left at
                           per year       per quarter     per 30 days     1% error rate (%)
        90%                36.5 days      9 days          3 days          90
        95%                18.25 days     4.5 days        1.5 days        80
        99%                3.65 days      21.6 hours      7.2 hours       0
        99.5%              1.83 days      10.8 hours      3.6 hours       -100
        99.9%              8.76 hours     2.16 hours      43.2 minutes    -900
        99.95%             4.38 hours     1.08 hours      21.6 minutes    -1900
        99.99%             52.6 minutes   12.96 minutes   4.32 minutes    -9900
        99.999%            5.26 minutes   1.30 minutes    25.9 seconds    -99900
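The figures in the availability table on slide 16 can be reproduced with simple arithmetic. The sketch below assumes a 365-day year and a 90-day quarter, which is what the slide's numbers are consistent with; the function names are illustrative, not from the deck.

```python
from datetime import timedelta

# Window lengths assumed from the slide's figures (90-day quarter, 365-day year).
WINDOWS = {
    "year": timedelta(days=365),
    "quarter": timedelta(days=90),
    "30 days": timedelta(days=30),
}


def allowed_downtime(slo_percent, window):
    """Time the service may be unavailable in `window` without breaking the SLO."""
    return window * (1 - slo_percent / 100)


def budget_remaining_percent(slo_percent, error_rate_percent):
    """Share of the error budget left at a given error rate (negative = overspent)."""
    budget = 100 - slo_percent
    if budget <= 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return (budget - error_rate_percent) / budget * 100


if __name__ == "__main__":
    for slo in (90, 95, 99, 99.5, 99.9, 99.95, 99.99, 99.999):
        row = [str(allowed_downtime(slo, w)) for w in WINDOWS.values()]
        print(slo, row, round(budget_remaining_percent(slo, 1.0)))
```

The last column makes the point of the slide: at a steady 1% error rate, any SLO stricter than 99% has already overspent its budget.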
  • 17. Demo with Anthos: Monitoring+Incident Mgmt ● Topology ● SLO/SLI Metrics ● Blackbox/Whitebox ● Log Viewer ● Tracing/Tracing Report
  • 18. Demo with Anthos: Monitoring+Incident Mgmt Topology Blackbox Whitebox
  • 20. Error Budget Burn Down Rate
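Slide 20 only names the burn rate, so the following is a sketch that assumes the common SRE definition: burn rate is the observed error rate divided by the error budget, where 1.0 means the budget lasts exactly one SLO window. The function names and the 30-day window are assumptions, not values from the deck.

```python
def burn_rate(error_rate, slo_target):
    """How fast the error budget is being spent (1.0 = exactly on budget)."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("a 100% SLO leaves no error budget to burn")
    return error_rate / budget


def days_until_exhausted(error_rate, slo_target, window_days=30.0, budget_left=1.0):
    """Days until the remaining budget is gone if the error rate stays constant."""
    rate = burn_rate(error_rate, slo_target)
    if rate == 0:
        return float("inf")  # no errors, so the budget is never consumed
    return window_days * budget_left / rate


# Hypothetical case: a 99.9% SLO with 1% of requests currently failing.
current_burn = burn_rate(error_rate=0.01, slo_target=0.999)
days_left = days_until_exhausted(error_rate=0.01, slo_target=0.999)
```

In this hypothetical case the budget burns roughly ten times faster than planned, so a 30-day budget would be gone in about three days, which is the kind of signal an alert would be set on.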
  • 21. Demo with Anthos: Proactively Reduce Error Budget Burn ● Alert Setting ● Canary Deployment ● Cross-Region Deployment [Diagram: Clients reach Cloud Load Balancing in front of two Kubernetes Engine clusters, Taiwan-1 and Singapore, with a 10/90 traffic split]
  • 22. Demo with Anthos: Proactively Reduce Error Budget Burn ● Alert Setting ● Canary Deployment ● Cross-Region Deployment [Diagram: the same topology with a 50/50 traffic split]
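The 10/90 and 50/50 splits in the demo slides are weighted traffic routing. Istio expresses this declaratively through weighted route destinations; the sketch below only illustrates the underlying arithmetic. The backend labels, the hashing scheme, and the request IDs are hypothetical and do not reflect how Istio or Cloud Load Balancing implement the split.

```python
import hashlib
from collections import Counter


def pick_backend(request_id, weights):
    """Deterministically map a request to a backend in proportion to its weight.

    `weights` maps a backend label to a non-negative integer weight.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one backend needs a positive weight")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") % total  # position in [0, total)
    cumulative = 0
    for backend, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return backend
    raise AssertionError("unreachable: point is always below the total weight")


# Hypothetical labels; the slides do not say which cluster receives which share.
canary_split = {"cluster-a": 10, "cluster-b": 90}
counts = Counter(pick_backend(f"req-{i}", canary_split) for i in range(10000))
share_a = counts["cluster-a"] / 10000
```

Hashing the request ID keeps a given request on the same backend across retries, and moving from 10/90 to 50/50 is just a change to the weights.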
  • 24. Capacity planning. Plan for organic growth: increased product adoption and usage by customers. Determine inorganic growth: sudden jumps in demand due to feature launches, marketing campaigns, etc.
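The organic/inorganic distinction on slide 24 can be turned into a rough forecast. The model below is an assumption made for illustration (compound monthly growth for organic demand, additive spikes for launches, and a fixed headroom factor); none of the formulas or numbers come from the deck.

```python
import math


def projected_peak_qps(current_qps, monthly_growth, months, launch_spikes_qps=()):
    """Organic growth compounds monthly; inorganic spikes are added on top."""
    organic = current_qps * (1 + monthly_growth) ** months
    inorganic = sum(launch_spikes_qps)
    return organic + inorganic


def instances_needed(peak_qps, qps_per_instance, headroom=0.3):
    """Instances required to serve the peak with spare headroom (30% by default)."""
    if qps_per_instance <= 0:
        raise ValueError("qps_per_instance must be positive")
    return math.ceil(peak_qps * (1 + headroom) / qps_per_instance)


# Hypothetical inputs: 1000 QPS today, 5% monthly growth, six months out,
# plus one marketing campaign expected to add 500 QPS.
peak = projected_peak_qps(1000, 0.05, 6, launch_spikes_qps=[500])
instances = instances_needed(peak, qps_per_instance=100)
```

Keeping the two terms separate mirrors the slide: organic growth can be forecast from historical trends, while inorganic jumps have to be collected from launch and campaign plans.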
  • 25. Change Management. Roughly 70% of outages are due to changes in a live system. [Diagram: Cloud Source Repositories feeds a Kubernetes Configuration Service and Continuous Deployment, which roll changes out to multiple Kubernetes Engine cluster instances: a GCP Kubernetes cluster and an on-premise Kubernetes cluster (On-Prem1), connected through the Anthos Hub Service and NAT, serving Clients]
  • 26. Demo with Anthos: The Power of GitOps
  • 27. Summary + Call to Action ● SRE has 3 key principles: ○ Decision Based on Data (meaningful monitoring) ○ Be User Centric (black-box testing) ○ Blameless Culture & Shared Responsibility (share the responsibility and work together) ● Kubernetes is a perfect platform to implement SRE ○ SLI + SLO + Error Budget ○ Watch the Budget Burn Rate ○ Establish CI + CD with GitOps ● Pick a System and Build your SRE Practices
  • 28. Cover images used with permission. These books can be found on shop.oreilly.com.