Upgrading under the weight of all that state
Quinton Anderson
Context

[Diagram: canonical model. Multiple source systems feed Raw Data and Business Data through access layers behind a load balancer; annotations read '//TODO Function' and 'Ctrl-V Scaling'.]
Downstream systems
• Specialised management systems
• Reporting systems
• Product management
• Channel & product systems
• Master Data Management
Hadoop
• Leverage all data & reduce integration costs
• Comprehensive dataset – internal & external, realtime & batch, structured & unstructured
• Advanced analytics / machine learning

Group Data Warehouse
• Understand our business
• Accurate, conformed, and reconciled data
• Access layer to support BI & reporting

BI/Reporting
• User-facing tools
• Regulatory reporting
• Dishoarding
• Self-service BI for the masses
[Diagram: data flows between the layers. Edge labels: 'All data' into the customer record & insights; 'Subset of data' into Financial Data; 'User access' and 'Information for people' out of BI/Reporting; 'Price, conversation, credit dec. etc.' out to channels as closed-loop, automated 'decisions'. Swim-lanes: core information repositories, analytics applications, other systems, channels.]

Core Financial Systems and functions
• P&L
• Recon
• General Ledger
• Etc…

Decisioning
• Personalise/optimise decisions, maximise customer value
• E.g. price, credit decision, next conversation, experience
[Diagram: serving and decisioning. Systems of record (core banking, payments, www channels) publish event streams; an event processor loads customer information into Hadoop (raw data, derived data, analytic records); data is analysed & processed through a feature store, event store, machine learning and scoring; rules drive decisioning, and insights & events are captured back. Everything is wired through an integration API / service discovery layer.]
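The loop under this diagram (load, analyse, capture) is easiest to see in code. A minimal sketch, assuming Kafka event streams via the kafka-python client and an in-memory stand-in for the feature store; the topic name, broker address, and event schema are illustrative:

import json
from kafka import KafkaConsumer

# Assumed topic, broker address, and JSON event schema.
consumer = KafkaConsumer("customer-events",
                         bootstrap_servers="kafka:9092",
                         value_deserializer=lambda v: json.loads(v))

feature_store = {}  # in-memory stand-in for the real feature store

for msg in consumer:
    event = msg.value                          # customer information data loaded
    cust = event["customer_id"]                # assumed field name
    feats = feature_store.setdefault(cust, {"txn_count": 0})
    feats["txn_count"] += 1                    # data analysed & processed
    # insights & events captured here would feed scoring, rules, decisioning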
> 4000 daily batch jobs
> 6 PB of state and growing
HBase, Cassandra, HDFS, InfluxDB, Elasticsearch, Kafka, etcd, ZooKeeper
OpenStack Swift
Oracle, MySQL, Postgres
Hundreds of services
MR1, MR2, Spark, Akka
Dev, Test, Staging, Prod 1, Prod 2, Etc…
== Complexity
Imperative: Culture, Architecture

• Immutable
• Someone else's computer
• State locality
• Workload non-locality
• Flexible over optimal
• Practically, it is a closed system
• State management is my problem
• All abstractions are leaky
[Diagram: platform stack, with Repo(s) + CI/CD feeding every layer. Apps (Spark, MR, Impala, etc.; Marathon, Chronos, Cassandra, etc.) run on Mesos/YARN over Docker + Calico; beneath that, OpenStack Nova on KVM and Nova/Ironic on bare metal, then the OS, and finally firmware + hardware + tags.]
Strategies
• Outsource the problem, and tool away the resulting issues
• Delete it, tool away the resulting issues
• Be stateless, tool away the resulting issues
• Implement some patterns, incrementally optimise; tool away the resulting issues
• Excess capacity
Patterns

[Diagram: blue/green cutover. A consumer hits a router; behind it sit the old web app + DB and a new web app + DB, and traffic is switched from old to new.]

[Diagram: rolling upgrade behind an L4 HAProxy. The pool starts as Old Old Old Old; a New node joins, then nodes are replaced one at a time until the pool is New New New New New.]
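A rolling upgrade like this needs each node drained before it is touched. A minimal sketch of the drain/restore steps, assuming HAProxy's runtime admin socket is enabled; the socket path, backend name, and server names are assumptions:

import socket

HAPROXY_SOCK = "/var/run/haproxy.sock"  # assumed admin socket path

def haproxy_cmd(cmd):
    # Send one command to the HAProxy runtime API and return its reply.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(HAPROXY_SOCK)
        s.sendall(cmd.encode() + b"\n")
        return s.recv(4096).decode()

def drain(backend, server):
    # Stop new traffic to the node before upgrading it.
    haproxy_cmd(f"set server {backend}/{server} state maint")

def restore(backend, server):
    # Return the upgraded node to the pool.
    haproxy_cmd(f"set server {backend}/{server} state ready")

# drain("web", "old1"); upgrade the node; restore("web", "old1")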
== Incrementally accept risk
In-place upgrade
• Stateful
• CAP, PACELC
• Data models
• Atomicity
• Access patterns
• Implementation approaches = ??

Upgrade duration: O(N)
# Serial, in-place upgrade: one node at a time, O(N) in cluster size.
for node in nodes:
    if info[node]['instance']:
        # Only touch nodes the cluster can afford to lose right now.
        if Status(node).run().wait() == AVAILABLE_FOR_MAINTENANCE:
            MaintenanceMode(node).run().wait()
            Upgrade(node).run().wait()
            health = HealthTests(node).run().wait()
            UpdateStatus(node, health).run().wait()
# Health check for one node, against what appears to be the Cloudera
# Manager API (self.cdh): the node is healthy only if the host and every
# role placed on it report GOOD.
all_good = True
host = self.cdh.get_host(self.host_map[self.node_name])
if host.healthSummary != 'GOOD':
    all_good = False
# Look up the host by its roles
for c in self.cdh.get_all_clusters():
    for s in c.get_all_services():
        for r in s.get_all_roles():
            h = r.hostRef
            if h.hostId == self.host_map[self.node_name]:
                if r.healthSummary != 'GOOD':
                    all_good = False
return all_good
Upgrade duration: O(log N)
nodeComputation = for {
  _          <- Status(node)
  _          <- MaintenanceMode(node)
  _          <- Upgrade(node)
  nodeResult <- HealthTests(node)
} yield nodeResult

upgrade = for {
  node <- group
  comp <- nodeComputation(node)
} yield comp.exec

groups.map(upgrade)
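The same shape in Python, as a rough sketch: keep the per-node steps sequential, but run each group's nodes concurrently, so wall-clock time scales with the number of groups rather than the number of nodes; if group sizes grow each round (e.g. double), the number of sequential rounds tends toward O(log N). The helpers are the hypothetical ones from the O(N) loop above:

from concurrent.futures import ThreadPoolExecutor

def upgrade_node(node):
    # Same serial steps as the O(N) loop, for a single node.
    MaintenanceMode(node).run().wait()
    Upgrade(node).run().wait()
    return HealthTests(node).run().wait()   # truthy result == healthy

def upgrade_groups(groups):
    # Groups run one after another; nodes within a group run in parallel.
    for group in groups:
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            results = list(pool.map(upgrade_node, group))
        if not all(results):
            raise RuntimeError("group failed health checks; stop the rollout")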
[Platform stack diagram repeated, as above.]
Workflow

[Diagram: Jenkins flow per environment. A branch raises a PR and is merged to master; branches deploy to dev, master deploys to test, and production changes go through a change plan.]
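A rough sketch of driving that flow from code, assuming the python-jenkins client and hypothetical parameterised job names:

import jenkins

server = jenkins.Jenkins("https://jenkins.example.com",
                         username="upgrade-bot", password="api-token")

def deploy(environment, git_ref):
    # Each environment maps to an assumed parameterised Jenkins job.
    server.build_job(f"deploy-{environment}", {"GIT_REF": git_ref})

deploy("dev", "feature/green-cluster")  # branch deploy
deploy("test", "master")                # master deploy, ahead of the change plan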
clusters:
  green-cluster:
    dns:
      nameservers:
        - x.x.x.x
    data_domain: *.*.*
    etcd:
      token: green-cluster
    masters:
      able:
        provision_id: 1
        lan:
          -
            mac: 0c:c4:7a:c1:2e:92
            ip: 1.1.11.151/24
            vlan: 11
            gateway: 1.1.1.1
        ironic_id: a7af76ad-6583-4209-ba5f-cf1477b6405e
        flavor: ramish-baremetal-flavor2
        image: *mesos-master-green
      theta:
        provision_id: 2
        lan:
          -
            mac: 0c:c4:7a:a9:04:0c
            ip: 1.1.11.53/24
            vlan: 11
            gateway: 1.1.1.1
        ironic_id: 8ff1fd1c-4893-11e6-a447-2f366077ca0e
        flavor: ramish-baremetal-flavor2
        image: *mesos-master-green
      tobias:
        provision_id: 3
        lan:
          -
            mac: 0c:c4:7a:a8:f6:ac
            ip: 1.11.11.52/24
            vlan: 11
            gateway: 1.1.1.1
        ironic_id: c89fdd08-232c-40fe-b965-49fc3e4dcba7
        flavor: ramish-baremetal-flavor2
        image: *mesos-master-green
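A rough sketch of consuming such a change plan with PyYAML, assuming the redacted values above (the *.*.* domain and the *mesos-master-green alias) resolve in the real file; the filename and the provision hook are assumptions:

import yaml

with open("green-cluster.yml") as f:   # assumed filename
    plan = yaml.safe_load(f)

for cluster_name, cluster in plan["clusters"].items():
    for name, master in cluster["masters"].items():
        nic = master["lan"][0]
        print(f"{cluster_name}/{name}: provision_id={master['provision_id']} "
              f"ip={nic['ip']} flavor={master['flavor']}")
        # provision(master["ironic_id"], master["flavor"], master["image"])
        # (hypothetical hook into Nova/Ironic)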
Recommendations
• Instrument as much of deployment and provisioning as you can (a sketch follows this list)
• Optimise incrementally, learn the right hard lessons
• Allow for manual intervention, but attack it aggressively
• Encourage your people to intervene
• Prevent pets
• Spend more time on testing
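On the first recommendation, a minimal sketch of what instrumenting each step can look like, using only the standard library; the step names and the logging approach are illustrative, not the talk's tooling:

import logging
import time
from contextlib import contextmanager

log = logging.getLogger("upgrade")

@contextmanager
def timed(step, node):
    # Record the duration and outcome of one provisioning/upgrade step.
    start = time.monotonic()
    try:
        yield
        log.info("%s on %s ok in %.1fs", step, node, time.monotonic() - start)
    except Exception:
        log.error("%s on %s failed after %.1fs", step, node,
                  time.monotonic() - start)
        raise

# Usage:
# with timed("upgrade", "able"):
#     Upgrade("able").run().wait()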

Moving forward under the weight of all that state