Aiming for four nines with Apica Synthetic
Anders Iderström
Engineering Manager
Live Operations & Observability
• Unique offering
• Fast growth
=
• Speed is king
• Money is not an issue
About Klarna
• What we provide
• What we rely on
Live Operations
We provide fast and consistent professional incident management.
We act as an internal communications bridge and provide support on monitoring
and log analysis.
We identify and share knowledge about risks in an ever-changing environment
and its usage scenarios.
We aim to be the source of truth and provide IT operations expertise.
Live Operations
Our everyday tools:
Apica WPM, op5 Monitor, built-in alerting mechanisms, Splunk,
Watchdog/opstat (Pentaho/HighCharts), Monks, Graphite, Grafana, Jarmon,
Dashing.
Also used at Klarna:
New Relic, Kermit, OpsGenie, Oracle-related, finance IT related
= if we don’t have it, we’re probably in the process of getting or writing or buying
it. 
Live Operations
• IT-operations Monitoring
• Panoptic development team
• Personal experience from monitoring
implementation in about 150 organisations
• My development, training and coaching
• A hundred people in ITops
• A couple of hundred other engineers
(developers)
My other powers
Mid 2012 through late 2013
2014 and 2015 - way better
Monthly KRED (KPM) availability, 2014 and 2015
[Chart: availability (uptime/month, %) and downtime (minutes/month) per month, Jan 2014 to Dec 2015]
2014 average: 11 min down/month, 22 sec down/day, 99.975% availability
2015 average: 11 min down/month, 22 sec down/day, 99.975% availability
Yes, averages are identical for 2014 and 2015
Q4 2015 at average 99.997% for full quarter
Q4 2015 KCO availability
KCO RedFlow Sweden
[Chart: availability (uptime/month, %) and downtime (minutes/month) per month, Oct to Dec 2015]
Q4 2015 average, RedFlow SE: 4.7 min down/month, 9 sec down/day, 99.99% availability (full quarter)
100% availability in Nov and Dec
• How did we reach 99.975%?
• We integrated the Apica WPM API with Opstat/Seqailizer/Pentaho (wallboarding:
last 2 minutes / last 2 hours, number of timeouts, availability numbers, etc.)
• And "defined downtime": ext/int: x number of timeouts within y number of
minutes AND/OR other checks we can include, like Klarna Online or
backoffice being unavailable
2014 and 2015 - continued
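A minimal sketch of a "defined downtime" rule of the kind described on this slide. It is an illustration only, not the actual Klarna or Apica implementation: the function name, the parameters `x_timeouts` and `y_minutes`, and the `other_checks_down` flag are assumptions, and the AND/OR combination is simplified to a plain OR.

```python
from datetime import datetime, timedelta

def is_defined_downtime(timeout_times, now, x_timeouts, y_minutes,
                        other_checks_down=False):
    """Hypothetical rule: downtime is declared when at least x_timeouts
    timeouts occurred within the last y_minutes, or when another check
    (for example a back-office check) reports the service as unavailable."""
    window_start = now - timedelta(minutes=y_minutes)
    # Count only timeouts that fall inside the look-back window.
    recent = [t for t in timeout_times if window_start < t <= now]
    return len(recent) >= x_timeouts or other_checks_down

# Example: three timeouts, two of them within the last five minutes.
now = datetime(2015, 10, 1, 12, 0)
timeouts = [now - timedelta(minutes=m) for m in (1, 2, 9)]
print(is_defined_downtime(timeouts, now, x_timeouts=2, y_minutes=5))
```

Keeping x and y as parameters makes it possible to tune the rule separately for external and internal checks.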
Implementing end-to-end
responsibility in Product
Development
• Measurements become more sensitive too, so you need to remember when
your providers' providers failed, or make a NOTE of it and exclude that from
your measurements
• Added to a new feature
• Requested at Apica Kundforum last year
• Implemented, thanks Apica!
Polishing the truth (feature req)
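A minimal sketch of what excluding annotated periods from an availability figure could look like. This is not the Apica feature itself: the function names and the minute-based interval format are assumptions, and the sketch keeps the full period in the denominator, which is one possible design choice among several.

```python
def minutes_in(intervals):
    """Expand half-open (start_minute, end_minute) intervals into a set of minutes."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered

def availability_pct(total_minutes, outages, excluded=()):
    """Availability in percent, ignoring outage minutes that fall inside
    an excluded (annotated) interval, e.g. a third-party failure."""
    down = minutes_in(outages) - minutes_in(excluded)
    return 100.0 * (1 - len(down) / total_minutes)

# Hypothetical 30-day month with two outages, one caused by a provider.
month = 30 * 24 * 60
outages = [(100, 110), (5000, 5006)]
provider_failure = [(5000, 5006)]
print(availability_pct(month, outages))
print(availability_pct(month, outages, provider_failure))
```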
from advanced integration and simple check, to advanced checks and no integration
- It takes some time to get right
- It requires tons of approvals in modern fintech systems
+ It’s quick and simple to consume
+ It’s fairly “new” (new type of checks, or at least newly completed for all big product
versions/regions)
+ We need a rolling 12 month view now, as a complement to hour-to-hour
wallboards
Shifting focus
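A rolling 12-month availability view can be sketched as below. The input format (one pair of total minutes and downtime minutes per month) and the function name are assumptions for illustration; the sample numbers are made up, not Klarna data.

```python
def rolling_availability(months, window=12):
    """months: chronological list of (total_minutes, down_minutes).
    Returns one availability percentage per full window, weighted by
    the actual length of each month."""
    result = []
    for i in range(window - 1, len(months)):
        chunk = months[i - window + 1 : i + 1]
        total = sum(t for t, _ in chunk)
        down = sum(d for _, d in chunk)
        result.append(100.0 * (1 - down / total))
    return result

# Twelve hypothetical 30-day months with 10.8 minutes of downtime each,
# followed by one month without downtime.
history = [(43200, 10.8)] * 12 + [(43200, 0.0)]
for value in rolling_availability(history):
    print(f"{value:.4f}%")
```

Summing minutes before dividing avoids the small error that comes from averaging monthly percentages of months with different lengths.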
The 8-step checks and how we see which one has failed:
Zebratester (a.k.a. Proxysniffer):
Merchant Create Order (merchant_create_order_v2)
Merchant Read Order (merchant_read_order_v2)
Client Read Order (client_read_pre_purchase_order_v1)
Client Update Order - Sending challenge information (client_update_pre_purchase_order_v1)
Client Update Order - Sending billing information (client_update_pre_purchase_order_v1)
Select Payment Method (client_update_pre_purchase_order_v1)
PGW loading iframe - WebPage (client_update_pre_purchase_order_v1)
PGW loading iframe - API details (client_update_pre_purchase_order_v1)
Shifting focus
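A minimal sketch of how the first failing step of a multi-step check could be reported. The step names come from the slide; the result format (one boolean per step) and the function name are assumptions and do not reflect the actual output of the scripted check or the Apica API.

```python
STEPS = [
    "Merchant Create Order",
    "Merchant Read Order",
    "Client Read Order",
    "Client Update Order - Sending challenge information",
    "Client Update Order - Sending billing information",
    "Select Payment Method",
    "PGW loading iframe - WebPage",
    "PGW loading iframe - API details",
]

def first_failed_step(step_ok):
    """step_ok: one boolean per step, in STEPS order (hypothetical format).
    Returns the name of the first failed step, or None if all passed."""
    for name, ok in zip(STEPS, step_ok):
        if not ok:
            return name
    return None

# Example: the third step fails, so later results are not relevant.
print(first_failed_step([True, True, False, True, True, True, True, True]))
```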
Providing an additional 6-7 minutes of uptime per month (99.975% -> 99.990%)
Communication platforms start to become cost-efficient
Company goal: 99.97%
IT-operations OKRs: Q1: 99.97%, Q2: 99.98%, Q3/Q4: 99.99%
2016 company availability goal and
2016 ITops OKRs
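The availability targets translate into monthly downtime budgets as in the sketch below. It assumes a 30-day month; the function name is illustrative. Under that assumption, moving from 99.975% to 99.990% frees up about 6.5 minutes per month, in line with the 6-7 minutes mentioned in the deck.

```python
def downtime_budget_minutes(availability_pct, days=30):
    """Allowed downtime in minutes for a period of `days` days."""
    return (1 - availability_pct / 100.0) * days * 24 * 60

# Downtime budgets for the 2016 targets named in the deck.
for target in (99.97, 99.98, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} min/month")

# Minutes gained per month when going from 99.975% to 99.990%.
gain = downtime_budget_minutes(99.975) - downtime_budget_minutes(99.990)
print(f"gain: {gain:.2f} min/month")
```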
And the progress? (2016)
High pace
= we quickly forget
• Three decimals are needed at the reporting (PDF e-mail report) level, so that
reported numbers stay precise when later combined with other figures to track
top-level company goals
• 10 decimals are available in the API, but we're not integrating at the moment
• Needed
• Requested
• Implemented (actually with four decimals) (thanks Apica!) 
Actually seeing the last decimal -
from suggestion to practical use
(feature req)
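Why the last decimal matters can be shown with a small calculation: how much downtime one unit of the last reported decimal represents. The sketch assumes a 30-day month, and the function name is illustrative.

```python
def seconds_per_last_digit(decimals, days=30):
    """Downtime in seconds represented by one unit of the last decimal
    of an availability percentage, over a period of `days` days."""
    one_unit_pct = 10 ** -decimals          # e.g. 0.001 for three decimals
    return one_unit_pct / 100.0 * days * 24 * 3600

for decimals in (2, 3, 4):
    print(f"{decimals} decimals: {seconds_per_last_digit(decimals):.3f} s/month")
```

With two decimals, a single step in the reported figure hides several minutes of downtime per month, which is coarse when targets differ by only 0.01 percentage points.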
- Gradually remove dependencies
- Let the experts handle stuff you heavily depend
on and/or create specialized teams (networking,
message bus, certificates)
- Never repeat mistakes
- Implement modern architecture
- Serve your existing customers from your new
shiny platform (LOCO)
Reaching above 99.975%
• Implement end-to-end responsibility
• Get started on a shift in architecture
• microservices in cloud platform
• graceful degradation
• but do it piece by piece (breaking pieces of the monolith)
• Keep centralized Incident Management (operations land, Ops
knowledge)
• Do continuous improvement/feedback - on all levels (liveops, dev-teams,
retros, incident reports and planned actions)
• Save minutes/seconds in communications
• Actually service all your customers from the new platform (solve the
tech debt through massive added value)
TAKEAWAYS
Anders Iderström
Thank you!

 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
 
Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptxSENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
 

How to Reach 99.99% Uptime

  • 1. Aiming for four nines with Apica Synthetic Anders Iderström Engineering Manager Live Operations & Observability
  • 2. • Unique offering • Fast growth = • Speed is king • Money is not an issue About Klarna
  • 3. • What we provide • What we rely on Live Operations
  • 4. We provide fast and consistent professional incident management. We act as an internal communications bridge and provide support on monitoring and log analysis. We identify and share knowledge about risks in an ever-changing environment and usage scenario. We aim to be the source of truth and provide IT operations expertise. Live Operations
  • 5.
  • 6. Our everyday tools: Apica WPM, op5 Monitor, built-in alerting mechanisms, Splunk, Watchdog/opstat (Pentaho/HighCharts), Monks, Graphite, Grafana, Jarmon, Dashing. Also used at Klarna: New Relic, Kermit, OpsGenie, Oracle-related, finance-IT-related = if we don’t have it, we’re probably in the process of getting or writing or buying it. Live Operations
  • 7. • IT-operations Monitoring • Panoptic development team • Personal experience from monitoring implementation in about 150 organisations • My development, training and coach • A hundred people in ITops • A couple of hundred other engineers (developers) My other powers
  • 8. Mid 2012 through late 2013
  • 9. 2014 and 2015 - way better
  • 10. Monthly KRED (KPM) availability, 2014 and 2015 [chart: availability (uptime/month) and downtime (minutes/month), Jan 2014 to Dec 2015]. 2014 average: 11 min down/month, 22 sec down/day, 99.975% availability. 2015 average: 11 min down/month, 22 sec down/day, 99.975% availability. Yes, averages are identical for 2014 and 2015. Q4 2015 at average 99.997% for full quarter
  • 11. Q4 2015 KCO availability, KCO RedFlow Sweden [chart: availability (uptime/month) and downtime (minutes/month), Oct to Dec 2015]. Q4 2015 average, RedFlow SE: 4.7 min down/month, 9 sec down/day, 99.99% availability (full quarter). 100% availability in Nov and Dec
  • 12.
  • 13. • How did we reach 99,975? • We integrated Apica WPM API with Opstat/Seqailizer/Pentaho (wallboarding last 2 minutes / last 2 hours, number of timeouts, availability numbers, etc.) • And "Defined downtime": ext/int: x timeouts within y minutes AND/OR other checks we can include, like Klarna Online or backoffice being unavailable 2014 and 2015 - continued
  • 15. • Measurements become more sensitive too, so you need to remember when your providers' providers failed, or make a NOTE of it and exclude that from your measurements • Added to a new feature • Requested at Apica Kundforum last year • Implemented, thanks Apica! Polishing the truth (feature req)
  • 16. from advanced integration and simple check, to advanced checks and no integration - It takes some time to get right - It requires tons of approvals in modern fintech systems + It’s quick and simple to consume + It’s fairly “new” (new type of checks, or at least newly completed for all big product versions/regions) + We need a rolling 12 month view now, as a complement to hour-to-hour wallboards Shifting focus
  • 17. The 8-step checks and how we see which one has failed: Zebratester (a.k.a. Proxysniffer): Merchant Create Order (merchant_create_order_v2); Merchant Read Order (merchant_read_order_v2); Client Read Order (client_read_pre_purchase_order_v1); Client Update Order - Sending challenge information (client_update_pre_purchase_order_v1); Client Update Order - Sending billing information (client_update_pre_purchase_order_v1); Select Payment Method (client_update_pre_purchase_order_v1); PGW loading iframe - WebPage (client_update_pre_purchase_order_v1); PGW loading iframe - API details (client_update_pre_purchase_order_v1) Shifting focus
  • 18. Providing an additional 6-7 minutes uptime per month (99,975 -> 99,990%). Communication platforms start to become cost efficient. Company goal: 99,97. IT-operations OKRs: Q1: 99,97, Q2: 99,98, Q3/Q4: 99,99. 2016 company availability goal and 2016 ITops OKRs
  • 20. High pace = we quickly forget
  • 21.
  • 22. • Three decimals needed at reporting/pdf-e-mail-report level for proper precision in reporting numbers later combined with other figures to track top-level company goals • 10 decimals available in the API, but we’re not integrating at the moment • Needed • Requested • Implemented (actually with four decimals) (thanks Apica!) Actually seeing the last decimal - from suggestion to practical use (feature req)
  • 23. - Gradually remove dependencies - Let the experts handle stuff you heavily depend on and/or create specialized teams (networking, message bus, certificates) - Never repeat mistakes - Implement modern architecture - Serve your existing customers from your new shiny platform (LOCO) Reaching above 99.975%
  • 24. • Implement end-to-end responsibility • Get started on a shift in architecture • microservices in cloud platform • graceful degradation • but do it piece by piece (breaking pieces of the monolith) • Keep centralized Incident Management (operations land, OPs knowledge) • Do continuous improvement/feedback - on all levels (liveops, dev-teams, retros, incident reports and planned actions) • Save minutes/seconds in communications • Actually service all your customers from the new platform (solve the tech debt through massive added value) TAKEAWAYS
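
The "defined downtime" rule from slide 13 (x timeouts within y minutes) can be sketched as a sliding-window count. This is only an illustration under stated assumptions: the function name `downtime_flags`, its parameters, and the input shape (time-ordered pairs of timestamp and timed-out flag) are hypothetical and are not taken from the Apica WPM API or from the Opstat integration described in the slides.

```python
from collections import deque
from datetime import datetime, timedelta


def downtime_flags(results, threshold, window_minutes):
    """Flag each check result as 'defined downtime' or not.

    results: iterable of (timestamp, timed_out) pairs, sorted by timestamp.
    threshold: the 'x' in the rule (number of timeouts required).
    window_minutes: the 'y' in the rule (length of the trailing window).
    Returns one bool per result, True when at least `threshold` timeouts
    fall inside the trailing window that ends at that result.
    """
    window = timedelta(minutes=window_minutes)
    recent = deque()  # timestamps of timeouts still inside the window
    flags = []
    for ts, timed_out in results:
        if timed_out:
            recent.append(ts)
        # Drop timeouts that have aged out of the trailing window.
        while recent and ts - recent[0] > window:
            recent.popleft()
        flags.append(len(recent) >= threshold)
    return flags


if __name__ == "__main__":
    # Hypothetical one-minute checks; minutes 1, 2 and 3 time out.
    base = datetime(2016, 1, 1)
    checks = [(base + timedelta(minutes=i), i in (1, 2, 3)) for i in range(6)]
    print(downtime_flags(checks, threshold=3, window_minutes=2))
```

The AND/OR combination with other checks mentioned on the slide (for example Klarna Online or backoffice being unavailable) would be layered on top of these flags; it is left out here because the slides do not specify how the checks are combined.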

Editor's Notes

  1. Klarna is a Swedish e-commerce company that provides payment services for online storefronts. Our core service is to assume stores' claims for payments and handle customer payments, thus eliminating the risk for seller and buyer. Klarna allows users to pay with simply an email address, billing the customer later and paying the retailer in the meantime. Our business model thus differs from that of other online payment companies, which collect payment from the customer immediately and then forward the money to the retailer. About 40% of all e-commerce sales in Sweden go through Klarna.[1] The company has more than 1400 employees, most of them working at the headquarters in Stockholm. In 2014, the company handled about $10 billion in online sales.[2] Klarna Group is active in 18 countries, with an annual turnover of 315 million USD for 2015. Around 35 million consumers and 50 000 merchants use our services.
  2. LiveOps Intent 2016-04-07 (our strategy defined as our intent). Stats/facts: 50-ish onboarded services/teams; around 500 checks can alert us; ~30 major incidents per year (Major = "Purchases are affected OR order handling is completely down for a service"); ~160 total incidents per year; ~30 on-call turn-out hours per month (outside office hours). 2014/2015: 80% of majors have MTTRepair < 30 minutes. Within 10 minutes we have triage, status send-out, external status update, people collaborating on it in #liveops-incident in Slack, the people who wrote it on the phone, Crisis Management Team informed and possibly involved, BCP "auto-executed" (no verification needed).
  3. Tough times. Around-the-clock on-site supervision. LiveOps contributed to nailing the root cause.
  4. <numbers: mats’s presentation 99,975: month/avg/realworld and, contract states 99.8 with possibilities for scheduled downtime> - A constant balance to strike between backoffice/merchant GUI usage/scripting/stupid stuff and protecting uptime - Handled malicious attacks (DDoS and adaptive attacks)
  5. - Always open: no service-windows in real-world-stats/external uptime
  6. We established strong internal monitoring capabilities for specialized Erlang/OTP systems (Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability). Show "base package 2" wallboard screenshot. Explain Crisis Management Team / Business Continuity Plan.
  7. Teams are forced to take on responsibility for full lifecycle management: deployment, maintenance, monitoring, metrics, the works (reduce on-call turn-out / feel the pain). Takes time to fully complete. Really big undertaking, completely different tempo, continuous delivery, weekly demos, milestones, etc – really cool! Everyone has on-call.
  8. We simply do more in the GUI and by using the built-in reports.
  9. We stay one step ahead of the teams – end-to-end is not dead yet. In case of a disturbance we can directly see which step failed.
  10. Company goal: 99,970. IT-operations OKRs: Q1: 99,97, Q2: 99,98 -> Q3/Q4: 99,99. We are affected by outages at AWS (world's largest cloud provider), TSIC (second largest ISP in the world), GCC, PAY.ON, Basefarm, Bahnhof, etc.
  11. Dec->Feb we outperformed Google Compute Services, AWS US West-1 (entire region went down), TSIC, PAY.ON, Basefarm, Bahnhof and Stockholm city electricity provider. We were, however, affected to some extent by all of them.
  12. When the number of services/products grows, it gets harder to remember which incident impacted which service or services. When reporting averages you tend to become more forgiving = the need for availability breakdowns arises, and timelines with incidents mapped out start to become a very important decision support.
  13. Remove (examples): transatlantic dependency; ISP dependency (no customer-facing services on premises); single-zone hosting provider dependencies. Implement (examples): graceful degradation; failover capabilities for provider SPOFs (card payment providers, lookup providers); powerful DDoS protection; external CDN.
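
The availability figures quoted in the slides and notes (99,97 through 99,99) are easier to compare as downtime budgets. The following is a minimal sketch of that arithmetic, not the way Apica or Klarna compute their reports; the function names and the average month length of 365.25 / 12 days are assumptions made for illustration.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60
AVG_DAYS_PER_MONTH = 365.25 / 12  # assumption: average calendar month


def downtime_minutes_per_month(availability_pct):
    """Downtime budget in minutes per average month for a given availability."""
    return (1 - availability_pct / 100) * AVG_DAYS_PER_MONTH * MINUTES_PER_DAY


def downtime_seconds_per_day(availability_pct):
    """Downtime budget in seconds per day for a given availability."""
    return (1 - availability_pct / 100) * SECONDS_PER_DAY


if __name__ == "__main__":
    # Targets taken from the slides: company goal and quarterly OKRs.
    for pct in (99.97, 99.975, 99.98, 99.99):
        print(
            f"{pct:.3f}%: "
            f"{downtime_minutes_per_month(pct):.1f} min/month, "
            f"{downtime_seconds_per_day(pct):.1f} sec/day"
        )
```

Under these assumptions 99.975% works out to roughly 11 minutes per month and 22 seconds per day, which agrees with slide 10, and moving from 99.975% to 99.99% frees up between 6 and 7 minutes per month, which agrees with slide 18. Output is formatted with three decimals on the percentage, in line with the precision requirement on slide 22.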