Availability in a Cloud-native World.
Guidelines for mere mortals.
Academy of Technology - PREVAIL 2019 – München 🇩🇪
—
Haytham Elkhoja
Chief Architect & Global Tech Leader
IBM Services - Continuous Availability (a.k.a. Always On)
haytham.elkhoja@ibm.com
Relevant links and assets:
https://ibm.biz/alwaysonbook
Availability in a Cloud-native World. Guidelines for mere mortals. PREVAIL 2019 – München 🇩🇪
/WHOIS
@hek
/in/haytham.Elkhoja
March 2017: “Amazon broke the internet with a typo” (cnn.com)
Impacted apps:
- Netflix
- HootSuite
- Expedia
- Slack
- Business Insider
- Reddit
June 2019: “Google details 'catastrophic' cloud outage events: Promises to do better next time” (zdnet.com)
Impacted apps:
- Snapchat
- Spotify
- Google Docs
- YouTube
- Pokémon Go
- Gmail
What the hell is happening…
On why outages happen.
Planned outages:
- App and DB: 67%
- Batch: 11%
- Hardware: 14%
- Environmental: 8%
Unplanned outages:
- Process: 40%
- Application: 40%
- Hardware: 10%
- OS: 10%
IBM’s Always On Patterns
Keeping your app available during planned and unplanned outages or failures requires geographically distributed, multi-active, multi-region deployments.
Diagram: user traffic spread across multiple active regions, with data replication and session replication between them.
The IBM Always On Pattern starts at the infrastructure layer, progresses to the data, influences application design and extends to the people and the culture.
Herbie Pearthree, Distinguished Engineer
hpear3@us.ibm.com
Everything breaks!
Thinking differently about Availability in a Cloud-native world.
- State & Consistency
- Chaos & Validation
- Zones, Regions & Swimlanes
- Portability & Deployment
Portability & Deployment
Code differently.
Cloud-native apps should be self-contained, polyglot, loosely coupled, cattle-scaled, immutable, idempotent, ephemeral and protocol-aware.
No two clouds are created equal.
Architect for cloud mobility. Your app should be cloud-, infrastructure- and OS-agnostic. The 12-factor patterns will help you get there.
No strings attached.
Bootstrap configuration through environment variables instead of hard-coding it. This is also a requirement for environment parity, and for your own sanity.
FROM alpine:3.1
COPY app /app
COPY docker-entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

docker build -t app:v2 .
docker run --rm \
  -e "APP_DATADIR=/var/lib/data" \
  -e "APP_HOST=host.com" \
  -e "APP_PORT=3306" \
  -e "APP_USERNAME=user" \
  -e "APP_PASSWORD=password" \
  -e "APP_DATABASE=test" \
  app:v2

Output:
2019/10/15 04:44:29 Starting application...
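On the application side, the same variables can be read once at startup, with the process refusing to start if any are missing. The Python sketch below is one possible shape for that; the `AppConfig` class and `load_config` function are hypothetical, and only the APP_* names come from the `docker run` example.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the docker run example; everything else is illustrative.
REQUIRED = ("APP_DATADIR", "APP_HOST", "APP_PORT",
            "APP_USERNAME", "APP_PASSWORD", "APP_DATABASE")


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical settings object, one field per APP_* variable."""
    datadir: str
    host: str
    port: int
    username: str
    password: str
    database: str


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Bootstrap configuration from environment variables, failing fast."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Crash at boot rather than run half-configured against the wrong backend.
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return AppConfig(
        datadir=env["APP_DATADIR"],
        host=env["APP_HOST"],
        port=int(env["APP_PORT"]),
        username=env["APP_USERNAME"],
        password=env["APP_PASSWORD"],
        database=env["APP_DATABASE"],
    )
```

Because `env` defaults to `os.environ` but accepts any mapping, the same image runs unchanged in every environment while tests can pass a plain dictionary.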
Delegate responsibilities.
Whatever as a Service. Somebody, somewhere has done a much better job.
Trim down the fat.
Dependency management with multi-stage builds is an art one must pursue to keep apps clean and lean.
Got Syslog?
Write timestamped events to STDOUT and STDERR, and make it clear who the source is.
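A minimal sketch with Python's standard `logging` module: informational events go to STDOUT, warnings and errors to STDERR, and every line carries a timestamp and a `source=` tag so the log collector knows who emitted it. The `make_logger` helper and the line format are assumptions for illustration.

```python
import logging
import sys
from typing import TextIO


class _BelowLevel(logging.Filter):
    """Pass only records strictly below the given level."""

    def __init__(self, level: int) -> None:
        super().__init__()
        self.level = level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.level


def make_logger(source: str, out: TextIO = sys.stdout,
                err: TextIO = sys.stderr) -> logging.Logger:
    """Hypothetical helper: timestamped, source-tagged logs on STDOUT/STDERR."""
    fmt = logging.Formatter(
        "%(asctime)s %(levelname)s source=%(name)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S%z",
    )
    to_out = logging.StreamHandler(out)
    to_out.setLevel(logging.DEBUG)
    to_out.addFilter(_BelowLevel(logging.WARNING))  # DEBUG and INFO -> STDOUT
    to_err = logging.StreamHandler(err)
    to_err.setLevel(logging.WARNING)                 # WARNING and above -> STDERR
    for handler in (to_out, to_err):
        handler.setFormatter(fmt)

    logger = logging.getLogger(source)
    logger.setLevel(logging.DEBUG)
    logger.handlers[:] = [to_out, to_err]  # replace, so repeated calls don't duplicate
    logger.propagate = False               # avoid a second copy via the root logger
    return logger
```

The platform (Docker, Kubernetes, a syslog forwarder) then collects both streams; the app never manages log files itself.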
Git’s your bible.
Everything should be versioned, ephemeral and reproducible using GitOps methods. This includes configuration files and Infrastructure as Code.
Design for failure.
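Handle SIGTERM like a champ: stop taking new work, finish what is in flight and exit before the grace period ends. SIGKILL cannot be caught or ignored, so never rely on it for cleanup.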
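A minimal Python sketch of graceful shutdown: the SIGTERM handler only sets a flag, and the work loop notices it, stops picking up new items and runs its cleanup. The `GracefulWorker` class is hypothetical; nothing equivalent can be written for SIGKILL, which terminates the process without running any handler.

```python
import signal
import threading
from types import FrameType
from typing import Optional


class GracefulWorker:
    """Hypothetical worker that drains in-flight work on SIGTERM."""

    def __init__(self) -> None:
        self.stopping = threading.Event()
        self.processed: list[str] = []
        # Python only allows installing signal handlers from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum: int, frame: Optional[FrameType]) -> None:
        # Keep the handler tiny: just record that shutdown was requested.
        self.stopping.set()

    def run(self, queue: list[str]) -> None:
        # Finish one unit of work at a time; stop taking new work once SIGTERM arrives.
        while queue and not self.stopping.is_set():
            self.processed.append(queue.pop(0))
        self.shutdown()

    def shutdown(self) -> None:
        # Hypothetical cleanup hook: close connections, flush buffers,
        # deregister from service discovery.
        pass
```

In Kubernetes, for example, the kubelet sends SIGTERM and follows up with SIGKILL once `terminationGracePeriodSeconds` (30 seconds by default) has elapsed, so the drain has to fit inside that window.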
#$@&%*!
Fail gracefully and inform your customers what’s up (or down), pun intended.
Robots > humans.
Actions performed by humans hundreds of times won’t be performed the same way each time, even with the best intentions. Automate.
Zones, Regions & Swimlanes
Resilient clouds don’t mean resilient apps.
Multi-active regions help you scale while staying resilient. An out-of-region deployment is more than just an insurance policy.
Stay in your swimlane.
Respect region affinity and stickiness: use geo load balancers to resolve traffic to the nearest region, and keep it there. Crossing regions is a no-no.
DNS is your best friend.
Religiously steer clear of IP addresses. Service discovery will point you to the right path.
And if you can’t avoid IP addresses, use Anycast.
The most boring OS configs are also the most important ones.
A ‘search’ entry in /etc/resolv.conf makes short hostnames resolve within your swimlane’s subdomain first, helping you with region affinity.
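To make the mechanism concrete, here is a simplified Python model of how a stub resolver expands a name using the `search` list and the `ndots` option (default 1) from /etc/resolv.conf. The swimlane domains in the sample file are hypothetical, and real resolvers add details (timeouts, attempts, name-length limits) that this sketch leaves out.

```python
# Hypothetical /etc/resolv.conf for a host in the us-east swimlane.
RESOLV_CONF = """\
nameserver 10.0.0.2
search us-east.example.com example.com
options ndots:1
"""


def parse_resolv_conf(text: str) -> tuple[list[str], int]:
    """Return the search list and the ndots threshold (default 1)."""
    search: list[str] = []
    ndots = 1
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue  # blank line or comment
        if fields[0] in ("search", "domain"):
            search = fields[1:]  # the last search/domain line wins
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


def candidate_names(name: str, search: list[str], ndots: int) -> list[str]:
    """Order in which fully qualified names would be tried."""
    if name.endswith("."):
        return [name]  # already absolute: no search-list expansion
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # looks qualified: try it as-is first
    return expanded + [absolute]      # short name: the swimlane's domain wins
```

With this file, a call to `orders` is first tried as `orders.us-east.example.com.`, so a service resolves to its in-region peer before any other region is considered.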
Share-nothing. Cluster-nothing. Stretch-nothing.
Control planes are delicate creatures, especially when stretched or shared.
Diagram: three database cluster architectures compared: Share Everything; Share Disks and Networking; Share Nothing.
Bypass failures altogether.
Disaster recovery processes lead to a mediocre and sometimes catastrophic experience.
Are we there yet?
Discover the awesome world of service readiness and liveness probes, circuit breakers, retries, rate limiting, bulkheading and fallbacks.
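As one example from that list, a circuit breaker with a fallback fits in a few lines of Python. This is a simplified, single-threaded CLOSED/OPEN/HALF_OPEN state machine with assumed thresholds (`max_failures`, `reset_after`), not the API of any particular resilience library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Simplified breaker: CLOSED -> OPEN after repeated failures,
    HALF_OPEN after a cool-down, CLOSED again after one successful trial."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_after:
            return "HALF_OPEN"  # cool-down over: allow a trial call
        return "OPEN"

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "OPEN":
            return fallback()  # fail fast instead of hammering a sick dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

The fallback (a cached value, a default, a friendly "try again later") is what the customer sees while the dependency recovers.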
One deployment at a time.
Rolling-update strategies for zero-downtime deployments within a cluster or availability zone:
- Deploy by adding an instance, then removing an old one.
- Deploy by removing an instance, then adding a new one.
- Deploy by updating instances as fast as possible.
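These three strategies correspond roughly to the `maxSurge` and `maxUnavailable` settings of a Kubernetes Deployment's `RollingUpdate` strategy: surge one and allow none unavailable, surge none and allow one unavailable, or set both high. The Python function below is a simplified simulation under that assumption; it treats new instances as ready immediately, ignores failures, and lists the (old, new) instance counts after each step.

```python
def rolling_update(replicas: int, max_surge: int,
                   max_unavailable: int) -> list[tuple[int, int]]:
    """Return (old, new) instance counts after each step of a rolling update.

    Simplified model: new instances are ready at once and nothing fails.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up: the total may exceed `replicas` by at most max_surge.
        new += max(0, min(replicas - new, replicas + max_surge - (old + new)))
        # Scale down: running instances may not drop below replicas - max_unavailable.
        old -= max(0, min(old, old + new - (replicas - max_unavailable)))
        steps.append((old, new))
    return steps
```

For three replicas, `rolling_update(3, 1, 0)` walks through (3, 0), (2, 1), (1, 2), (0, 3): capacity never drops below three, at the cost of one extra instance during the rollout. `rolling_update(3, 0, 1)` needs no spare capacity but temporarily runs on two instances.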
One region at a time.
Then do the same across regions. Your customers will not even know what’s happening behind the scenes.
Love thy neighbor.
Configure resource requests and limits. Throttle API requests.
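Resource requests and limits are declared in your orchestrator's configuration, but throttling often ends up in code. A token bucket is one common technique: the minimal, single-threaded Python sketch below allows bursts of up to `capacity` requests and refills at `rate` tokens per second. The class and parameter names are illustrative, and the clock is injectable so the behavior can be tested deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so a cold client can burst
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should reject, e.g. with HTTP 429 Too Many Requests
```

One bucket per client (or per API key) keeps a single noisy neighbor from starving everyone else.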
State & Consistency
The network is reliable. Right.
The CAP theorem must be well understood when choosing data stores. Since partition tolerance cannot be sacrificed, you have to pick between consistency and availability when a partition occurs.
Diagram: the CAP triangle. P is a given, so pick A or C. Examples shown: Oracle, DB2, MySQL, etc.
Do you really need strong consistency?
Applications can support weak, eventual or strong consistency.
Distributed consistency is difficult enough as it is.
Higher availability normally means higher revenue. Think of ATMs: A trumps C.
Educate your business on eventual consistency. Strong consistency should be the last option, unless you’re the NYSE.
Master! Master!
Write anywhere and everywhere with multi-master, masterless or peer-to-peer database-level replication.
If you can’t, shard, partition, or separate writes from queries (Write/Query).
Data replication. More than meets the eye.
Data patterns differ. Not all data is created equal.
Diagram: replication patterns across workloads (Messaging, BPM, CEP, APP):
- Active standby or active/query
- Hot standby or configured active/active for fast switchover
- Multi-master or peer-to-peer, write anywhere
- Data distribution: filter and push
- Data warehouse integration and federation
- Data through messaging: filter and push distribution
Conflict resolution during a network partition will make you creative.
Log and notify conflicts. Last-write-wins, CQRS and write partitioning are all valid, but subjective (and emotional), decisions.
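Here is a minimal Python sketch of last-write-wins with conflict logging: when two regions hold different versions of the same key, the later timestamp wins and the discarded write is logged so someone can be notified. The `Version` record and the region-name tie-breaker are assumptions; for simplicity, any divergence is treated as a conflict, and LWW is only as trustworthy as the clocks that produced the timestamps.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("replication")


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # wall-clock time of the write (assumed reasonably in sync)
    region: str       # deterministic tie-breaker when timestamps are equal


def merge_lww(local: dict[str, Version],
              remote: dict[str, Version]) -> dict[str, Version]:
    """Merge two replicas key by key using last-write-wins."""
    merged = dict(local)
    for key, theirs in remote.items():
        ours = merged.get(key)
        if ours is None or ours == theirs:
            merged[key] = theirs  # no conflict: key is new or identical
            continue
        winner = max(ours, theirs, key=lambda v: (v.timestamp, v.region))
        loser = theirs if winner is ours else ours
        # Log (and, in a real system, notify) so the dropped write isn't lost silently.
        log.warning("conflict on %r: kept %r from %s, dropped %r from %s",
                    key, winner.value, winner.region, loser.value, loser.region)
        merged[key] = winner
    return merged
```

Because the ordering key `(timestamp, region)` is total, both regions converge on the same winner regardless of which side runs the merge.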
NTP is dead. Long live NTP.
Globally distributed, consensus-based, synchronously replicated databases are achievable with Google TrueTime and AWS Time Sync, if you really need them.
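The idea behind TrueTime-style "commit wait", as described for Google Spanner, is that the clock returns an uncertainty interval rather than a single instant; a transaction takes the interval's latest bound as its commit timestamp and waits until that timestamp has definitely passed. The Python sketch below simulates this with a fixed, assumed error bound `epsilon`; it is not an API of TrueTime or AWS Time Sync.

```python
import time
from typing import NamedTuple


class TTInterval(NamedTuple):
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Local clock plus an assumed error bound of +/- epsilon seconds."""

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        """True only if t has definitely passed on every clock within the bound."""
        return self.now().earliest > t


def commit(tt: SimulatedTrueTime) -> float:
    """Pick a commit timestamp, then commit-wait until it is safely in the past."""
    s = tt.now().latest
    while not tt.after(s):
        time.sleep(tt.epsilon / 4)  # total wait is roughly 2 * epsilon
    return s  # now safe to release locks and acknowledge the write
```

The wait is proportional to clock uncertainty, which is why tighter time sync translates directly into lower write latency.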
Database is much more than just a DBA’s job.
Database versioning and backward-compatible schemas are not optional; they are compulsory.
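One common way to keep schemas backward compatible is the expand/contract (parallel change) approach: add new structures in a way old code can ignore, move readers and writers over, and only drop old structures once nothing uses them. The sketch below uses Python's built-in `sqlite3` with a hypothetical `customers` table and a `schema_version` table to show versioned, additive ("expand") migrations.

```python
import sqlite3

MIGRATIONS = {
    # version -> statements; each step is additive so old code keeps working
    1: ["CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"],
    2: ["ALTER TABLE customers ADD COLUMN email TEXT DEFAULT NULL"],  # expand
}


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in order and record the schema version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    for version in sorted(v for v in MIGRATIONS if v > current):
        with conn:  # commit after each migration step
            for stmt in MIGRATIONS[version]:
                conn.execute(stmt)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        current = version
    return current
```

Version 1 code that inserts only `name` keeps working after migration 2, so old and new releases can run side by side during a rolling deployment. The "contract" step (for example, dropping a column) would come in a later migration, once no running release depends on the old shape.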
Why is my shopping cart empty?
Aim for stateless, but maintain sessions if you must.
Chaos & Validation
Design for feedback.
Measure every single detail via KPIs and SLIs. Capture metrics and logs. There’s no such thing as too many logs.
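As a worked example of turning raw counts into an SLI and an error budget, here is a small Python helper. The 99.9% SLO and the request counts in the usage note are made-up numbers for illustration.

```python
def availability_sli(good: int, total: int) -> float:
    """Fraction of requests that succeeded (1.0 when there was no traffic)."""
    return 1.0 if total == 0 else good / total


def error_budget_remaining(good: int, total: int, slo: float) -> float:
    """Share of the error budget still unspent; negative once it is blown."""
    allowed_failures = (1.0 - slo) * total
    if allowed_failures == 0:
        # No traffic (or a 100% SLO): any failure exhausts the budget.
        return 1.0 if good == total else float("-inf")
    actual_failures = total - good
    return 1.0 - actual_failures / allowed_failures
```

With 1,000,000 requests, 999,400 of them successful, and a 99.9% SLO, the SLI is 99.94%: 1,000 failures were allowed, 600 happened, so about 40% of the error budget is left.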
Hope is not a strategy.
Reduce uncertainty with game days, then aim to regularly inject failure into your production environment.
Continuous tinkering is healthy.
Use randomness to spoon-feed yourself with discoveries. You’ll be surprised what you come across.
You don’t choose Chaos Monkey. Chaos Monkey chooses you.
When pursuing Chaos Engineering, start small and controlled, then observe, squash and learn.
Remember, there is nothing chaotic about Chaos Engineering.
“Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system's capability to withstand turbulent conditions in production.”
https://principlesofchaos.org
Chaos Engineering is a collection of “what if”s.
What if I add latency? What if I DDoS a service? What if I change the hardware clock?
Example tests:
• tc qdisc add dev eth0 root netem delay 300ms
• wrk -t12 -c400 -d30s http://host/api/request
• stress-ng --random 50 -t 60 --metrics-brief --times
• iptables -I OUTPUT -p udp -d <dns-server-ip> --dport 53 -j DROP
• umount /mnt/blockstorage
• hwclock --set --date "<new date>"
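The commands above inject faults at the OS and network level. The same "what if" questions can also be asked inside the application; the Python decorator below is a hypothetical sketch that adds latency and random failures to a function call, with an optional seeded random generator for reproducible experiments and a kill switch. Names such as `inject_faults`, `latency_s` and `error_rate` are illustrative and do not come from any chaos tool.

```python
import functools
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def inject_faults(latency_s: float = 0.0, error_rate: float = 0.0,
                  rng: Optional[random.Random] = None,
                  enabled: Callable[[], bool] = lambda: True):
    """Decorator: delay calls by latency_s and fail a fraction error_rate of them."""
    rng = rng or random.Random()

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            if enabled():  # kill switch: keep experiments small and controlled
                if latency_s:
                    time.sleep(latency_s)
                if rng.random() < error_rate:
                    raise ConnectionError(f"injected fault in {fn.__name__}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

Passing `rng=random.Random(42)` makes a run repeatable, which helps when you want to replay exactly the failure sequence that surprised you.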
The rollback button is a lie.
That’s not only true for application deployments but also for fault injection, as both face the same fundamental problem: State.
Go beyond trivial ICMP and connection tests.
Automated synthetic monitoring helps you understand what your digital users actually experience, far beyond what typical platform monitoring shows. Do it from multiple locations.
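A synthetic check goes further than a ping: it performs a request the way a user would and judges the result on status, content and latency. The Python sketch below uses only the standard library; the URL, the expected text and the latency budget are assumptions, and in practice you would schedule it from several locations and feed the results into your alerting.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool
    status: int
    latency_s: float
    reason: str = ""


def synthetic_check(url: str, expect_text: str, max_latency_s: float = 2.0,
                    timeout_s: float = 5.0) -> CheckResult:
    """Fetch a page like a user would and judge it on status, content and speed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:  # non-2xx response
        return CheckResult(False, exc.code, time.monotonic() - start, f"HTTP {exc.code}")
    except OSError as exc:  # DNS failure, refused connection, timeout...
        return CheckResult(False, 0, time.monotonic() - start, f"unreachable: {exc}")
    latency = time.monotonic() - start
    if expect_text not in body:
        return CheckResult(False, status, latency, "expected content missing")
    if latency > max_latency_s:
        return CheckResult(False, status, latency, "too slow")
    return CheckResult(True, status, latency)
```

A page that returns HTTP 200 with an error message in the body would pass a connection test but fail this check, which is exactly the gap synthetic monitoring is meant to close.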
Love DevOps? Wait till you meet SRE.
SRE is what happens when you ask a software engineer to design an operations team.
Thank you!
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Availability in a cloud native world - Guidelines for mere mortals v2.0

  • 1. Availability in a Cloud-native World. Guidelines for mere mortals. Academy of Technology - PREVAIL 2019 – München 🇩🇪. Haytham Elkhoja, Chief Architect & Global Tech Leader, IBM Services - Continuous Availability (a.k.a. Always On), haytham.elkhoja@ibm.com. Relevant links and assets: https://ibm.biz/alwaysonbook
  • 2. Availability in a Cloud-native World. Guidelines for mere mortals. PREVAIL 2019 – München 🇩🇪 /WHOIS @hek /in/haytham.Elkhoja
  • 3. March 2017: “Amazon broke the internet with a typo” (cnn.com). Impacted apps: Netflix, HootSuite, Expedia, Slack, Business Insider, Reddit.
  • 4. June 2019: “Google details 'catastrophic' cloud outage events: Promises to do better next time” (zdnet.com). Impacted apps: Snapchat, Spotify, Google Docs, YouTube, Pokémon Go, Gmail.
  • 5. What the hell is happening…
  • 6. On why outages happen. [Charts] Planned outages: App and DB 67%, Batch 11%, Hardware 14%, Environmental 8%. Unplanned outages: Process 40%, Application 40%, Hardware 10%, OS 10%.
  • 7. IBM’s Always On Patterns
  • 8. Keeping your app available during planned and unplanned outages or failures requires geographically distributed, multi-active, multi-region deployments. [Diagram: users' traffic spread across regions, with data replication and session replication between them.]
  • 9. “The IBM Always On Pattern starts at the infrastructure layer, progresses to the data, influences application design and extends to the people and the culture.” Herbie Pearthree, Distinguished Engineer, hpear3@us.ibm.com
  • 10. Everything breaks!
  • 11. Thinking differently about Availability in a Cloud-native world: Portability & Deployment; Zones, Regions & Swimlanes; State & Consistency; Chaos & Validation.
  • 12. Portability & Deployment
  • 13. Code differently. Cloud-native apps should be self-contained, polyglot, loosely coupled, cattle-scaled, immutable, idempotent, ephemeral and protocol-aware.
  • 14. No two clouds are created equal. Architect for cloud mobility. Your app should be cloud-, infrastructure- and OS-agnostic. The Twelve-Factor App patterns will help you get there.
  • 15. No strings attached. Environment variables should be bootstrapped; this is also a requirement for environment parity and for your own sanity.
      FROM alpine:3.1
      COPY app /app
      COPY docker-entrypoint.sh /entrypoint.sh
      ENTRYPOINT ["/entrypoint.sh"]
      docker build -t app:v2 .
      docker run --rm -e "APP_DATADIR=/var/lib/data" -e "APP_HOST=host.com" -e "APP_PORT=3306" -e "APP_USERNAME=user" -e "APP_PASSWORD=password" -e "APP_DATABASE=test" app:v2
      2019/10/15 04:44:29 Starting application...
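Slide 15's `docker run` example injects every setting as an `APP_*` environment variable. Below is a minimal Python sketch of the application side, assuming the app reads exactly those variable names at startup; `AppConfig`, `missing_settings` and `load_config` are hypothetical names, not part of the deck.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

# Variable names taken from the docker run example on slide 15.
REQUIRED = ("APP_DATADIR", "APP_HOST", "APP_PORT", "APP_USERNAME", "APP_PASSWORD", "APP_DATABASE")


@dataclass(frozen=True)
class AppConfig:
    data_dir: str
    host: str
    port: int
    username: str
    password: str
    database: str


def missing_settings(env: Mapping[str, str]) -> List[str]:
    # An empty value counts as missing: "-e APP_HOST=" is almost always a mistake.
    return [name for name in REQUIRED if not env.get(name)]


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    """Build the config from the environment only; nothing is baked into the image."""
    env = os.environ if env is None else env
    missing = missing_settings(env)
    if missing:
        # Fail fast at startup instead of running with silent defaults.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return AppConfig(
        data_dir=env["APP_DATADIR"],
        host=env["APP_HOST"],
        port=int(env["APP_PORT"]),
        username=env["APP_USERNAME"],
        password=env["APP_PASSWORD"],
        database=env["APP_DATABASE"],
    )
```

Failing fast on a missing variable keeps a misconfigured container from ever becoming ready, which is easier to spot than a half-working instance.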
  • 16. Delegate responsibilities. Whatever as a Service: somebody, somewhere has already done a much better job.
  • 17. Trim down the fat. Dependency management with multi-stage builds is an art one must pursue to keep apps clean and lean.
  • 18. Got Syslog? Feed information and timestamps to STDOUT and STDERR, and make clear who the source is.
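A minimal Python sketch of slide 18's advice using only the standard `logging` module: informational lines go to STDOUT, warnings and errors to STDERR, and every line carries a UTC timestamp and a `source=` field. The `make_logger` helper and the log line layout are assumptions for illustration.

```python
import logging
import sys
import time
from typing import Optional, TextIO


def make_logger(source: str, out: Optional[TextIO] = None, err: Optional[TextIO] = None) -> logging.Logger:
    """Log one line per event: INFO and below to STDOUT, WARNING and above to STDERR."""
    formatter = logging.Formatter(
        fmt="%(asctime)s %(levelname)s source=%(name)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",
    )
    formatter.converter = time.gmtime  # UTC timestamps, matching the trailing 'Z'

    to_stdout = logging.StreamHandler(out if out is not None else sys.stdout)
    to_stdout.addFilter(lambda record: record.levelno < logging.WARNING)
    to_stderr = logging.StreamHandler(err if err is not None else sys.stderr)
    to_stderr.setLevel(logging.WARNING)

    logger = logging.getLogger(source)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't duplicate lines through the root logger
    logger.handlers.clear()   # safe to call twice for the same source
    for handler in (to_stdout, to_stderr):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

The app never opens a log file or talks to syslog itself; whatever runs the container collects both streams, which keeps the image identical across environments.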
  • 19. git’s your bible. Everything should be versioned, ephemeral and reproducible using GitOps methods. This includes configuration files and Infrastructure as Code.
  • 20. Design for failure. Handle SIGTERM like a champ, and remember that SIGKILL cannot be caught, so finish shutting down before it arrives.
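On slide 20's point, only SIGTERM can actually be handled. A minimal Python sketch, assuming a simple loop that processes one unit of work at a time; `GracefulShutdown`, `serve`, `handle_one` and `drain` are hypothetical names.

```python
import signal
import threading
from typing import Callable


class GracefulShutdown:
    """Turn SIGTERM into a 'stop taking new work' flag (hypothetical helper).

    SIGKILL cannot be caught, blocked or ignored, so the only defence against it
    is to finish draining within the grace period the platform allows after SIGTERM.
    """

    def __init__(self) -> None:
        self.stopping = threading.Event()
        signal.signal(signal.SIGTERM, self._on_sigterm)  # must be called from the main thread

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: record the request and let the main loop react.
        self.stopping.set()


def serve(shutdown: GracefulShutdown, handle_one: Callable[[], None], drain: Callable[[], None]) -> None:
    """Process work until SIGTERM arrives, then drain and return."""
    while not shutdown.stopping.is_set():
        handle_one()
    drain()  # e.g. finish in-flight requests, flush buffers, close DB connections
```

The grace period is set by the platform, not the app: `docker stop` waits 10 seconds by default and Kubernetes uses `terminationGracePeriodSeconds` (30 seconds by default) before escalating to SIGKILL, so `drain` has to fit inside that window.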
  • 21. #$@&%*! Fail gracefully and tell your customers what’s up (or down, pun intended).
  • 22. Robots > humans. Actions performed by humans hundreds of times won’t be performed the same way each time, even with the best intentions. Automate.
  • 23. Zones, Regions & Swimlanes
  • 24. Resilient clouds don’t mean resilient apps. Multi-active regions help you scale while staying resilient. Out-of-region is more than just an insurance policy.
  • 25. Stay in your swimlane. Respect region affinity and stickiness by using geo load balancers to resolve traffic to the nearest region and keep it there. Crossing regions is a no-no.
  • 26. DNS is your best friend. Religiously steer clear of IP addresses. Service discovery will point you down the right path. And if you can’t use it, use Anycast.
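A minimal Python sketch of slide 26's advice, using the standard `socket` module: resolve the service name each time you connect instead of storing an IP address. The `resolve_service` and `connect` helpers are hypothetical; which addresses come back depends entirely on your DNS or service-discovery setup.

```python
import socket
from typing import List, Tuple


def resolve_service(name: str, port: int) -> List[Tuple[str, int]]:
    """Resolve a service name to (address, port) pairs at call time.

    Nothing is cached here on purpose: re-resolving on every (re)connect lets DNS
    or service discovery move clients to healthy endpoints without a redeploy.
    """
    endpoints: List[Tuple[str, int]] = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            name, port, type=socket.SOCK_STREAM):
        endpoint = (str(sockaddr[0]), int(sockaddr[1]))
        if endpoint not in endpoints:  # drop duplicates, keep resolver order
            endpoints.append(endpoint)
    return endpoints


def connect(name: str, port: int, timeout: float = 2.0) -> socket.socket:
    # create_connection resolves the name itself and tries each address in turn.
    return socket.create_connection((name, port), timeout=timeout)
```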
  • 27. The most boring OS configs are also the most important ones. A ‘search’ entry in /etc/resolv.conf makes short host names resolve within your swimlane’s subdomain, helping you with region affinity.
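To make slide 27 concrete: with a hypothetical `/etc/resolv.conf` line such as `search eu-de.example.com example.com`, the short name `db` is tried as `db.eu-de.example.com` first, so an app in the eu-de swimlane reaches its own region's database. The Python function below is a simplified model of that expansion (the `search` list plus the `ndots` option), not actual resolver code; the domain names are hypothetical.

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Simplified model of the lookup order a stub resolver derives from resolv.conf.

    A trailing dot marks a fully qualified name, so no search domain is appended.
    Names with at least `ndots` dots are tried as-is first, then with each search
    domain; shorter names try the search domains first.
    """
    if name.endswith("."):
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]
```

Container platforms may raise `ndots` (Kubernetes pods commonly get `ndots:5`), which changes which candidates are tried first.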
  • 28. Share-nothing. Cluster-nothing. Stretch-nothing. Control planes are delicate creatures, especially if stretched or shared. [Diagram: database nodes, disks and networking arranged three ways: Share Everything, Share Disks and Networking, Share Nothing.]
  • 29. Bypass failures altogether. Disaster recovery processes lead to a mediocre and sometimes catastrophic experience.
  • 30. Are we there yet? Discover the awesome world of service readiness and liveness probes, circuit breakers, retries, rate limiting, bulkheading and fallbacks.
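Of the patterns listed on slide 30, the circuit breaker is the least obvious. The Python sketch below is a minimal, single-threaded version with a fallback, assuming an injectable clock so it can be tested; `CircuitBreaker` is a hypothetical class, not the API of any particular resilience library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Closed -> open -> half-open circuit breaker (single-threaded sketch).

    After `failure_threshold` consecutive failures the circuit opens and calls go
    straight to the fallback. Once `reset_timeout` seconds have passed, calls are
    let through again (half-open): a success closes the circuit, a failure re-opens it.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "open":
            return fallback()  # fail fast instead of hammering a struggling dependency
        try:
            result = operation()
        except Exception:  # broad on purpose in this sketch; narrow it in real code
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

In production this is usually combined with timeouts, retries with backoff and a separate bulkhead per dependency.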
  • 31. One deployment at a time. Rolling update strategies for zero-downtime deployments within a cluster or availability zone: deploy by adding an instance, then removing an old one; deploy by removing an instance, then adding a new one; or deploy by updating instances as fast as possible.
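The three strategies on slide 31 map onto two knobs, named here after the `maxSurge` and `maxUnavailable` fields of a Kubernetes Deployment's rolling update strategy. The Python sketch below simulates the resulting (old, new) instance counts, assuming new instances become ready instantly; `rolling_update` is a hypothetical helper, not Kubernetes' actual controller logic.

```python
from typing import List, Tuple


def rolling_update(replicas: int, max_surge: int, max_unavailable: int) -> List[Tuple[int, int]]:
    """Return the sequence of (old, new) ready-instance counts during a rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up: start new instances while the total stays within replicas + max_surge.
        headroom = replicas + max_surge - (old + new)
        if headroom > 0 and new < replicas:
            new = min(replicas, new + headroom)
            steps.append((old, new))
        # Scale down: stop old instances while at least replicas - max_unavailable remain.
        excess = (old + new) - (replicas - max_unavailable)
        if excess > 0 and old > 0:
            old = max(0, old - excess)
            steps.append((old, new))
    return steps
```

With `max_surge=1, max_unavailable=0` capacity never drops below the desired count but one extra instance is needed; with `max_surge=0, max_unavailable=1` no extra capacity is needed but the service runs one instance short during the rollout; a large surge updates everything at once.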
  • 32. One region at a time. Then do the same across regions. Your customers will not even know what’s happening behind the scenes.
  • 33. Love thy neighbor. Configure resource requests and limits. Throttle API requests.
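Throttling API requests (slide 33) is often done with a token bucket. A minimal Python sketch, assuming a single process and an injectable clock; `TokenBucket` is a hypothetical class, and the right `rate` and `capacity` depend on what your downstream can absorb.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, then a steady `rate` requests per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject, e.g. with HTTP 429 Too Many Requests
```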
  • 34. State & Consistency
  • 35. The network is reliable. Right. The CAP theorem must be well understood when choosing data stores. Knowing that partition tolerance cannot be sacrificed, pick consistency or availability. [Diagram: CAP triangle (C, A, P), “Pick A or C”, labelled with Oracle, DB2, MySQL etc…]
  • 36. Do you really need strong consistency? Applications can support weak, eventual or strong consistency.
  • 37. Distributed consistency is difficult enough as it is. Normally, higher availability means higher revenue. Think of ATMs. A trumps C. Educate your business on eventual consistency. Strong consistency should be the last option, unless you’re the NYSE.
  • 38. Master! Master! Write anywhere and everywhere with master-master, masterless or peer-to-peer database-level replication. If you can’t, shard, partition or split writes from queries.
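When true multi-master writes are not an option, slide 38 suggests partitioning. One way to sketch that in Python: give every key a deterministic home region that is the only one allowed to accept writes for it. The region names and both helpers are hypothetical.

```python
import hashlib
from typing import Sequence


def home_region(key: str, regions: Sequence[str]) -> str:
    """Deterministically map a key (e.g. a customer ID) to the region that owns its writes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # A stable hash means every instance in every region agrees on the owner.
    return regions[int.from_bytes(digest[:8], "big") % len(regions)]


def accepts_write(key: str, local_region: str, regions: Sequence[str]) -> bool:
    """Only the home region accepts writes for a key; every region can still serve reads."""
    return home_region(key, regions) == local_region
```

`hashlib` is used instead of Python's built-in `hash()` because string hashing is randomized per process, so different instances would disagree on the owner. Simple modulo placement moves most keys when a region is added; consistent hashing avoids that.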
  • 39. Data replication: more than meets the eye. Data patterns differ; not all data is created equal. [Diagram components: Messaging, BPM, CEP, APP.] Replication patterns: active standby or active/query; hot standby or configured active/active for fast switchover; multi-master or peer-to-peer write anywhere; data distribution (filter and push); data warehouse integration and federation; data through messaging (filter and push distribution).
  • 40. Conflict resolution during a network partition will make you creative. Log and notify conflicts. Last-write-wins, CQRS and write partitioning are all valid but subjective (and emotional) decisions.
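A minimal Python sketch of last-write-wins that follows slide 40's advice to log conflicts: the conflict is resolved deterministically, but the discarded value is logged rather than silently dropped. `Versioned` and `merge_lww` are hypothetical, and relying on writer wall-clock timestamps assumes clocks are reasonably in sync, which is exactly the problem slide 41 raises.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("replication")


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # writer's wall-clock time (subject to clock skew)
    region: str       # tie-breaker so every replica picks the same winner


def merge_lww(local: Versioned, remote: Versioned) -> Versioned:
    """Last-write-wins merge that logs the conflict instead of hiding it."""
    if local == remote:
        return local
    winner = max(local, remote, key=lambda v: (v.timestamp, v.region))
    loser = remote if winner is local else local
    if local.value != remote.value:
        log.warning("conflict: kept %r from %s, discarded %r from %s",
                    winner.value, winner.region, loser.value, loser.region)
    return winner
```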
  • 41. NTP is dead. Long live NTP. Achieve globally distributed, consensus-based, synchronously replicated databases with Google TrueTime and AWS Time Sync, if you really need it.
  • 42. Databases are much more than just a DBA’s job. Database versioning and backward-compatible schemas are not optional but compulsory.
  • 43. Why is my shopping cart empty? Aim for stateless, but maintain sessions if you must.
  • 44. Chaos & Validation
  • 45. Design for feedback. Measure every single detail via KPIs and SLIs. Capture metrics and logs. There’s no such thing as too many logs.
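Slide 45's SLIs become actionable once they are tied to an SLO. A minimal Python sketch of a request-based availability SLI and the remaining error budget; the function names and the 99.9% target in the usage note are illustrative assumptions.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Request-based availability SLI: fraction of requests served successfully."""
    if total_requests == 0:
        return 1.0  # no traffic, nothing failed
    return good_requests / total_requests


def error_budget_remaining(good_requests: int, total_requests: int, slo: float) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    allowed_failures = (1.0 - slo) * total_requests
    actual_failures = total_requests - good_requests
    if allowed_failures == 0:
        return 0.0 if actual_failures else 1.0
    return 1.0 - actual_failures / allowed_failures
```

For example, with an SLO of 0.999, one million requests and 500 failures, the SLI is 0.9995 and half of the error budget remains. In time-based terms, 99.9% over 30 days allows about 43.2 minutes of downtime.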
  • 46. Hope is not a strategy. Reduce uncertainty with game days, then aim to regularly inject failure into your production environment.
  • 47. Continuous tinkering is healthy. Use randomness to spoon-feed yourself discoveries. You’ll be surprised by what you come across.
  • 48. You don’t choose Chaos Monkey. Chaos Monkey chooses you. When pursuing Chaos Engineering, start small and controlled, then observe, squash and learn. Remember, there is nothing chaotic about Chaos Engineering. “Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system's capability to withstand turbulent conditions in production.” https://principlesofchaos.org
  • 49. Chaos Engineering is a collection of “What if”s. What if I add latency? What if I DDoS a service? What if I change the hardware clock? Example tests:
      • tc qdisc add dev eth0 root netem delay 300ms
      • wrk -t12 -c400 -d30s http://host/api/request
      • stress-ng --random 50 -t 60 --metrics-brief --times
      • iptables -I OUTPUT -p udp -d <DNS server> --dport 53 -j DROP
      • umount /mnt/blockstorage
      • hwclock
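The commands on slide 49 inject faults at the OS and network level. The same “what if” can also be asked inside the application, scoped to a single dependency. A minimal Python sketch, assuming you wrap outbound calls yourself; `FaultInjector` and its parameters are hypothetical, and the 300 ms default simply mirrors the `netem` example.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class FaultInjector:
    """Add latency or errors to a fraction of calls to one dependency."""

    def __init__(self, latency_s: float = 0.3, latency_rate: float = 0.0,
                 error_rate: float = 0.0, seed: Optional[int] = None,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.latency_s = latency_s
        self.latency_rate = latency_rate  # fraction of calls delayed (0.0 to 1.0)
        self.error_rate = error_rate      # fraction of calls failed (0.0 to 1.0)
        self.rng = random.Random(seed)    # a fixed seed makes an experiment replayable
        self.sleep = sleep

    def call(self, operation: Callable[[], T]) -> T:
        if self.rng.random() < self.latency_rate:
            self.sleep(self.latency_s)  # like `netem delay`, but for one dependency only
        if self.rng.random() < self.error_rate:
            raise ConnectionError("injected fault")
        return operation()
```

Keeping the rates low and the scope to one dependency matches slide 48's advice to start small and controlled.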
  • 50. The rollback button is a lie. That’s true not only for application deployments but also for fault injection, as both face the same fundamental problem: state.
  • 51. Go beyond trivial ICMP and connection tests. Synthetic automated monitoring helps you understand what your digital users experience, far beyond typical platform monitoring. Do it from multiple locations.
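A minimal Python sketch of a synthetic check for slide 51: fetch a user-facing page, verify the content a user would actually see and record latency, rather than only pinging the host. `synthetic_probe` and `ProbeResult` are hypothetical; you would schedule the probe from several locations and alert on its results.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class ProbeResult:
    url: str
    ok: bool
    status: int  # 0 when no HTTP response was received at all
    latency_ms: float
    detail: str = ""


def synthetic_probe(url: str, expect_text: str, timeout: float = 5.0) -> ProbeResult:
    """Run one synthetic user transaction: fetch a page and check what a user would see."""
    start = time.monotonic()

    def elapsed_ms() -> float:
        return (time.monotonic() - start) * 1000

    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:  # the server answered with 4xx/5xx
        return ProbeResult(url, False, exc.code, elapsed_ms(), "HTTP error")
    except OSError as exc:  # DNS failure, refused connection, timeout (URLError is an OSError)
        return ProbeResult(url, False, 0, elapsed_ms(), str(exc))
    ok = status == 200 and expect_text in body
    return ProbeResult(url, ok, status, elapsed_ms(), "" if ok else "unexpected status or content")
```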
  • 52. Love DevOps? Wait till you meet SRE. SRE is what happens when you ask a software engineer to design an operations team.
  • 53. Thank you!