Rollback: The Impossible Dream
 


Roll back doesn’t exist. It’s not real. It’s a fantasy, a dream, a delusion. Any vendor who tells you they have a roll back capability is lying to you. And lying to you in a downright dangerous way that will come back to haunt you at 4am in a war room when someone says:

“We can’t fix this. Let’s roll back the deployment.”

This talk is designed to explain and demonstrate to Operations staff:

  • Why roll back is a fantasy, explained with a dash of Werner Heisenberg
  • Why it is dangerous and how you can recognize when you’re about to get trapped
  • How you can avoid the trap of treating it as an appropriate compensating control

It will also explain what you can actually do operationally instead of “rolling back”: alternative compensating controls that help you get running again and resolve your outage while still allowing you to find the root cause.

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Dev / Ops / QA / Security? / Others?
  • ??
  • Does anyone rely on rollback?
  • This is very much opinion based on experience. Everyone’s shop is different: everyone has different constraints and requirements. A trading house differs from a Twitter analytics company, which differs from a hospital, which differs from a .gov/Fed. Distance the discussion from “the sometimes emotional standpoints that bind system administrators to the notion of rollback: desperately wanting does not make it possible”. But every shop has technical heritage and technical debt, plus established institutional memory and remembered pain. Approach with an open mind and don’t make assumptions. Welcome new ideas and evaluate old constructs. You don’t have to agree. You can think I am a clueless idiot, as long as you do so based on clear, established data and not “we’re a different special snowflake”, because you’re fucking not.
  • Changed my views a little since writing the abstract.
  • Trad/Modern: arbitrary labels. Database rollback, transactional rollback. In single-threaded and parallel software applications, many authors have developed a ‘journaling’ approach to reversibility and rollback (see foregoing references on checkpointing). A stack of state-history can be kept to arbitrary accuracy (and at proportional cost), provided there is sufficient memory to document changes.
  • Service rollback. Many interconnecting components. Interconnectedness between application(s) and infrastructure changes. Release management, checkpointing, snapshots and version control. In more general ‘open’ (or incompletely specified) systems the cost of maintaining history increases without bound as system complexity increases.
  • Rollback isn’t a myth – for certain definitions in certain circumstances it MAY be possible to do something that resembles a rollback.
  • Roll-back recovery requires that the operations between the checkpoint and the detected erroneous state can be made idempotent.
  • Apply enough money and set enough constraints and you can have something like rollback. Duplicate infrastructure / scale.
  • A cascading rollback occurs in database systems when a transaction (T1) causes a failure and a rollback must be performed. Other transactions dependent on T1's actions must also be rolled back due to T1's failure, thus causing a cascading effect. That is, one transaction's failure causes many to fail. Practical database recovery techniques guarantee cascadeless rollback, therefore a cascading rollback is not a desirable result.
  • You must have sufficient memory/storage/resources to maintain sufficient history to roll back to a specified point.
  • Story about University and “You are here” signs. Promised Heisenberg: the uncertainty principle sets a lower bound on the precision with which certain pairs of properties of a particle can be measured (position / momentum, loosely location / speed). The more precisely you measure one, the less precisely you can know the other. Observer effect: observing things actually disturbs them, which makes them hard to measure.
  • A deterministic system is one in which no randomness is involved in the development of future states of the system. Lessons learnt about complex systems and systems thinking.
  • I have a Liberal Arts degree and got someone sciency and smart to explain the hard bits to me.
  • Risk – false sense of security
  • Unless you are committed to testing 'rollback' on a regular basis, maybe even every deploy, you inevitably end up in a situation where at the worst possible moment you are going to be depending on a process that is rarely done. We back up but we never restore. We have a UPS/generator but we’ve never tested it. We’ve got a DRP but it’s too difficult/dangerous to execute it.
  • No matter how much you believe things can be tracked there is always something that either can’t be tracked, can’t be predicted or is simply unknown. Deterministic reference.
  • K.I.S.S: rollback changes are usually made after a production change fails, when the team is at a low, often tired, often frustrated, often angry.
  • Return the system to a known good state removing any erroneous transactions from the systems
  • Return the system to a known good state, removing/correcting any erroneous transactions from the systems, AND return the system to working order as fast as possible. Are these different? Contradictory?
  • Dev / Ops / QA
  • http://www.slideshare.net/mmalone/architecture-at-simplegeo-staying-agile-at-scale If your system is hard to deploy or you can’t upgrade without org risk then that’s an architectural problem, NOT an operational one.
  • Disrupt
  • Continuous deployment is at one end of the spectrum. Short of that, prefer more small changes rather than big bang change. If it hurts, do it more until it stops hurting.
  • Accept failure, learn from it, move forward not backwards. Anything you roll back now you are going to have to deploy again sometime in the future.
  • Having rollback is not an excuse not to SUFFICIENTLY test.
  • Under Siege 2. Don’t assume the past dictates the future. Less NIH and religion, more science and data.
  • Poet John Lydgate, ably stolen by Abraham Lincoln: “You can please some of the people all of the time, you can please all of the people some of the time, but you can’t please all of the people all of the time”.
  • Or worth the effort.
  • Don’t lie to yourself. From “the sometimes emotional standpoints that bind system administrators to the notion of rollback: desperately wanting does not make it possible”. Thank you.
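
The idempotency constraint in the notes above ("roll-back recovery requires that the operations between the checkpoint and the detected erroneous state can be made idempotent") can be illustrated with a minimal sketch. Everything named here (`set_version`, `append_note`, `replay`, the dictionary-shaped state) is hypothetical and invented for illustration; it is not taken from the talk or from any real tool. The idea: if you restore a checkpoint and replay a journal, you rarely know exactly how far the journal had already been applied, so replaying must be safe to do more than once.

```python
from functools import partial


def set_version(state, version):
    # Idempotent: applying it once or many times yields the same state.
    new_state = dict(state)
    new_state["version"] = version
    return new_state


def append_note(state, note):
    # NOT idempotent: every application adds another entry.
    new_state = dict(state)
    new_state["notes"] = list(state.get("notes", [])) + [note]
    return new_state


def replay(state, journal):
    # Apply each journaled operation in order, never mutating the input.
    for operation in journal:
        state = operation(state)
    return state


checkpoint = {"version": 1}

idempotent_journal = [
    partial(set_version, version=2),
    partial(set_version, version=3),
]
non_idempotent_journal = [partial(append_note, note="migrated")]

# Replaying an idempotent journal twice is harmless.
once = replay(checkpoint, idempotent_journal)
twice = replay(once, idempotent_journal)

# Replaying a non-idempotent journal twice corrupts the state.
notes_once = replay(checkpoint, non_idempotent_journal)
notes_twice = replay(notes_once, non_idempotent_journal)

print(once == twice)              # True
print(notes_once == notes_twice)  # False
```

Real systems are full of `append_note`-style operations (emails sent, counters bumped, messages published), which is one reason the constraint is so hard to satisfy outside a database transaction.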

Rollback: The Impossible Dream Presentation Transcript

  • Rollback: The Impossible Dream, by James Turnbull. jamtur01 @ github, kartar @ twitter, jamesturnbull on freenode, james @ puppetlabs.com
  • About Me: VP Technical Operations at Puppet Labs. Puppet guy. Ruby guy. Talks funny.
  • A show of hands
  • Who thinks they know what rollback is?
  • Last set of hands
  • YMMV
  • Definitions
  • Traditional
  • Modern
  • Fact or Fiction?
  • Accept certain constraints
  • Constraint #1: Apply sufficient capital
  • Constraint #2: Idempotent
  • Constraint #3: Cascade-less failure
  • Constraint #4: Resources
  • A Philosophical Digression
  • If I know where I am, I don’t know how I got there. If I know how I got there, I don’t know where I am.
  • Very few “systems” are truly deterministic
  • A Mathematical Digression
  • On system rollback and totalised fields: An algebraic approach to system change. Mark Burgess and Alva Couch, 20th June 2011. http://cfengine.com/markburgess/papers/totalfield.pdf
  • So what’s wrong with rollback?
  • Risk
  • Learning from mistakes
  • Complex systems are … complex
  • Human error
  • What is the problem rollback is trying to solve?
  • What is the problem YOU are trying to solve?
  • So how can we mitigate rollback shortcomings?
  • Preventative Design
  • Rollback is (often) an architecture problem
  • Increase Resilience
  • Operational Intelligence
  • A little bit of DevOps in every byte…
  • Small, iterative changes
  • Accept that failure happens
  • “We can’t test that? Okay we can roll it back if it breaks…”
  • Assumption is the mother of all fuckups*
  • “But the system can’t be {run|upgraded|deployed} like that because…”
  • Conclusions
  • Rollback is possible but not probable
  • If you have to have “rollback” accept constraints
  • You can mitigate the need for it
  • Thank you! Questions/Insults? jamtur01 @ github, kartar @ twitter, jamesturnbull on freenode, james @ puppetlabs.com