Rollback: The Impossible Dream, by James Turnbull. jamtur01 @ github, kartar @ twitter, jamesturnbull on freenode ...
About Me: VP Technical Operations at Puppet Labs. Puppet guy. Ruby guy. Talks funny.
A show of hands
Who thinks they know what rollback is?
Last set of hands
YMMV
Definitions
Traditional
Modern
Fact or Fiction?
Accept certain constraints
Constraint #1: Apply sufficient capital
Constraint #2: Idempotent
Constraint #3: Cascade-less failure
Constraint #4: Resources
A Philosophical Digression
If I know where I am, I don’t know how I got there. If I know how I got there, I don’t know where I am.
Very few “systems” are truly deterministic
A Mathematical Digression
On system rollback and totalised fields: An algebraic approach to system change. Mark Burgess and Alva Couch ...
So what’s wrong with rollback?
Risk
Learning from mistakes
Complex systems are … complex
Human error
What is the problem rollback is trying to solve?
What is the problem YOU are trying to solve?
So how can we mitigate Rollback shortcomings?
Preventative Design
Rollback is (often) an architecture problem
Increase Resilience
Operational Intelligence
A little bit of DevOps in every byte…
Small, iterative changes
Accept that failure happens
“We can’t test that? Okay we can roll it back if it breaks…”
Assumption is the mother of all fuckups*
“But the system can’t be {run|upgraded|deployed} like that because…”
Conclusions
Rollback is possible but not probable
If you have to have “rollback”, accept constraints
You can mitigate the need for it
Thank you! Questions/Insults? jamtur01 @ github, kartar @ twitter, jamesturnbull on freenode, james @ puppetlabs.com
Rollback: The Impossible Dream

Roll back doesn’t exist. It’s not real. It’s a fantasy, a dream, a delusion. Any vendor who tells you they have a roll back capability is lying to you. And lying to you in a downright dangerous way that will come back to haunt you at 4am in a war room when someone says:

“We can’t fix this. Let’s roll back the deployment.”

This talk is designed to explain and demonstrate to Operations staff:

  • Why roll back is a fantasy, explained with a dash of Werner Heisenberg
  • Why it is dangerous and how you can recognize when you’re about to get trapped
  • How you can avoid falling into the trap of considering it an appropriate compensating control
It’ll also explain what you can actually do operationally instead of “rolling back”. This will cover other alternative compensating controls that can help you get running again and resolve your outage whilst still allowing you to find root cause.

Published in: Technology, Spiritual
  • Dev / Ops / QA / Security? / Others?
  • ??
  • Does anyone rely on rollback?
  • This is very much opinion based on experience. Everyone’s shop is different: everyone has different constraints and requirements. A trading house differs from a Twitter analytics company, which differs from a hospital, which differs from a .gov/Fed. Distance the discussion from “the sometimes emotional standpoints that bind system administrators to the notion of rollback: desperately wanting does not make it possible”. But every shop has technical heritage and technical debt. Established institutional memory/remembered pain. Approach with an open mind and don’t make assumptions. Welcome new ideas and evaluate old constructs. You don’t have to agree / you can think I am a clueless idiot, as long as you do so based on clear, established data, not “we’re a different special snowflake”, because you’re fucking not.
  • Changed my views a little since writing the abstract.
  • Trad/Modern: arbitrary labels. Database rollback, transactional rollback. In single-threaded and parallel software applications, many authors have developed a ‘journaling’ approach to reversibility and rollback (see foregoing references on checkpointing). A stack of state-history can be kept to arbitrary accuracy (and at proportional cost), provided there is sufficient memory to document changes.
  • Service rollback. Many interconnecting components. Interconnectedness between application(s) and infrastructure changes. Release management, checkpointing, snapshots and version control. In more general ‘open’ (or incompletely specified) systems the cost of maintaining history increases without bound as system complexity increases.
  • Rollback isn’t a myth – for certain definitions in certain circumstances it MAY be possible to do something that resembles a rollback.
  • Roll-back recovery requires that the operations between the checkpoint and the detected erroneous state can be made idempotent.
  • Apply enough money and set enough constraints and you can have something like rollback. Duplicate infrastructure / scale.
  • A cascading rollback occurs in database systems when a transaction (T1) causes a failure and a rollback must be performed. Other transactions dependent on T1's actions must also be rolled back due to T1's failure, thus causing a cascading effect. That is, one transaction's failure causes many to fail. Practical database recovery techniques guarantee cascadeless rollback, therefore a cascading rollback is not a desirable result.
  • You must have sufficient memory/storage/resources to maintain sufficient history to roll back to a specified point.
  • Story about University and “You are here” signs. Promised Heisenberg: the uncertainty principle sets a lower bound on the precision with which certain pairs of properties of a particle can be measured (position / momentum). The more precisely you measure one, the harder it is to measure the other. Observer effect: observing things actually makes them harder to measure.
  • A deterministic system is one in which no randomness is involved in the development of future states of the system. Lessons learnt about complex systems and systems thinking.
  • I have a Liberal Arts degree and got someone sciency and smart to explain the hard bits to me.
  • Risk – false sense of security
  • Unless you are committed to testing 'rollback' on a regular basis, maybe even every deploy, you inevitably end up in a situation where at the worst possible moment you are going to be depending on a process that is rarely done. We back up but we never restore. We have a UPS/generator but we’ve never tested it. We’ve got a DRP but it’s too difficult/dangerous to execute it.
  • No matter how much you believe things can be tracked there is always something that either can’t be tracked, can’t be predicted or is simply unknown. Deterministic reference.
  • K.I.S.S: rollback changes are usually made after a production change fails, when the team is at a low, often tired, often frustrated, often angry.
  • Return the system to a known good state removing any erroneous transactions from the systems
  • Return the system to a known good state removing/correcting any erroneous transactions from the systems AND return the system to working order as fast as possible. Are these different? Contradictory?
  • Dev / Ops / QA
  • http://www.slideshare.net/mmalone/architecture-at-simplegeo-staying-agile-at-scale If your system is hard to deploy or you can’t upgrade without org risk then that’s an architectural problem, NOT an operational one.
  • Disrupt
  • Continuous deployment is on one end of the spectrum; the other end is more small changes rather than big bang changes. If it hurts, do it more until it stops hurting.
  • Accept failure, learn from it, move forward not backwards. Anything you roll back now you are going to have to deploy again sometime in the future.
  • Having rollback is not an excuse not to SUFFICIENTLY test
  • Under Siege 2. Don’t assume the past dictates the future. Less NIH and religion, more science and data.
  • Poet John Lydgate, ably stolen by Abraham Lincoln: “You can please some of the people all of the time, you can please all of the people some of the time, but you can’t please all of the people all of the time”.
  • Or worth the effort.
  • Don’t lie to yourself. From “the sometimes emotional standpoints that bind system administrators to the notion of rollback: desperately wanting does not make it possible”. Thank you.
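The journaling approach and the constraints described in the notes above (idempotent operations, sufficient resources to keep history) can be illustrated with a small Python sketch. This is not code from the talk: `JournaledStore`, `max_history` and the sample keys are hypothetical names chosen for illustration, and a toy key/value store stands in for a real system. The point it tries to show is that rollback only reaches as far back as the history you could afford to keep.

```python
from collections import deque


class JournaledStore:
    """Toy key/value store (hypothetical) that journals prior values for undo."""

    _MISSING = object()  # marks "key did not exist before the change"

    def __init__(self, max_history=3):
        self.state = {}
        # Constraint #4 (resources): history is bounded, so the oldest
        # journal entries are silently dropped once the limit is reached.
        self.journal = deque(maxlen=max_history)

    def set(self, key, value):
        # Record the value being overwritten before applying the change.
        self.journal.append((key, self.state.get(key, self._MISSING)))
        self.state[key] = value

    def rollback(self, steps):
        """Undo up to `steps` changes; return how many were actually undone."""
        undone = 0
        while undone < steps and self.journal:
            key, previous = self.journal.pop()
            # Constraint #2 (idempotent): each undo restores an absolute
            # prior value instead of applying a relative inverse, so
            # repeating the same restore leaves the state unchanged.
            if previous is self._MISSING:
                self.state.pop(key, None)
            else:
                self.state[key] = previous
            undone += 1
        return undone


store = JournaledStore(max_history=3)
store.set("version", "1.0")
store.set("version", "1.1")
store.set("workers", 4)
store.set("workers", 8)  # fourth change: the oldest journal entry is dropped

undone = store.rollback(steps=4)
print(undone, store.state)
```

Four changes were requested to be undone, but only three could be, so the store ends up at `{"version": "1.0"}` rather than at its original empty state. Even in this closed, single-process toy the "known good state" is only reachable while the journal still holds it; in an open system with many interconnected components the history needed grows far faster.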

On system rollback and totalised fields: An algebraic approach to system change. Mark Burgess and Alva Couch, 20th June 2011. http://cfengine.com/markburgess/papers/totalfield.pdf