This is a presentation given to Caixa Magica employees as a preview of what will be shown at FOSDEM, Sunday, February 7th 2010. It is subject to change and is illustrative of what will be shown at the conference.
Transactional Roll-backs and upgrades [preview]
1. Transactional Roll-backs and Upgrades. Presented by: John Thomson ([email_address]), Researcher; Paulo Trezentos ([email_address], http://twitter.com/PauloTrezentos), Partner / Technical Director. Monday 1st February [preview] [Sunday, 7th February 2010]
18. Next slides -> One of the key projects that we are working on is the MANCOOSI project. It addresses various aspects of package upgrade problems, including solvers and distribution-independent meta-data, as well as the transactional roll-back that I will be discussing.
21. We work with a multitude of top-tier universities and research institutions.
23. Jeff Johnson will present Transactionally Protected Package Management, covering the @rpm5.org implementation of roll-back.
24. Stefano Zacchiroli will present Cross-distro dependency resolution as part of the work for MANCOOSI in a different stream.
25. The aim is to investigate package upgrades on computer systems and to develop a framework that makes roll-back and pre-analysed upgrade plans possible.
28. Many other mechanisms exist that rely on file-system snapshots or on saving the system state (next slide).
29. The mechanism is one part of the Transactionally Protected Package Management that Jeff Johnson will speak about in his presentation later. Our method for allowing roll-back is one part of a much bigger mechanism that allows for deterministic system configurations.
[Figure: Installation Timeline. PkgFoo v 1.00 at 10.00pm, PkgFoo v 2.00 at 10.20pm; states S1 and S2. Roll-back, possible?]
32. makes manipulating config files from the shell much easier and possible through other language bindings.
33. ZFS, used by Nexenta, is an example of a file-system snapshot mechanism that uses the available storage to snapshot several system states.
34. NixOS is a non-LSB-based system that re-thinks how files and resources are used, trying to make all files purely functional so that they do not require installing per se.
35. Other mechanisms, e.g. etckeeper (written by Joey Hess and packaged for Fedora), also try to capture configuration files in a VCS.
37. The idea of roll-back runs squarely against the idea that programs and their maintainers improve upon packages in each iteration. Downgrade is seen as a negative process.
38. If it were necessary it would already have been done, or would it?
39. Rolling back changes is 'only' needed when a package fails to work on the system, so is a better dependency and conflict checker more important than roll-back?
40. There may be cases where roll-back is impossible using the techniques that we have investigated, or possibly at all.
42. By examining the current maintainer scripts and templates provided by programs such as debhelper and rpm-helper, we have defined a language that can assist with problems that cannot be addressed by current meta-installers or maintainer scripts.
43. Transactionally Protected Package Management (TPPM) is what we are aiming to move towards. Jeff Johnson gives a presentation of the same name at 2:45pm, in the same room. [Figure: roll-back within TPPM]
45. In our case, the DSL is focused on analysing package maintainer scripts and detecting how they interoperate on a system that we have also modelled.
46. We designed the DSL not to be a Turing-complete language like Bash but rather something with which we can focus on the particular details we wish to examine.
47. The DSL is designed to capture the details of the vast majority of common maintainer scripts, and then to be refined in subsequent versions to increase coverage.
48. We wish to capture the functional aims of a large number of maintainer scripts and to improve coverage until cases where the DSL works are the norm.
63. Example continued, Log

dslID  TID  parentID  DSL_CMD                        bhINVERSE
1      1    1         start postinst_init(cups)      TRUE
2      1    1         end postinst_init(cups)        TRUE
3      1    1         start post_init_restart(cups)  TRUE
4      1    1         end post_init_restart(cups)    TRUE
rbHist

id  parent  op    pkgName  pkgVer1  pkgVer2  dateTime
1   1       inst  cups     0        1.4.2    2010-01-30
pkgHist
65. The transaction has quite obviously failed: no matching end was reached for a DSL command, so the log holds an odd number of elements, etc.
66. Perform a roll-back for all matching sub-transaction ID elements, but in the reverse order with certain constraints.
67. If a set of script elements cannot perform roll-back in the middle of operating, then don't create a DSL tag.

dslID  TID  parentID  DSL_CMD                        bhINVERSE
1      1    1         start postinst_init(cups)      TRUE
2      1    1         end postinst_init(cups)        TRUE
3      1    1         start post_init_restart(cups)  TRUE
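The failed log above can be detected mechanically by looking for a start entry without a matching end. The following is only a minimal sketch: the row layout mirrors the dslID, TID, parentID and DSL_CMD columns from the slide, while the function name `unmatched_starts` and the tuple representation are assumptions made for illustration, not part of the apt-rpm work.

```python
from typing import List, Tuple

# One log row: (dslID, TID, parentID, DSL_CMD), mirroring the slide's columns.
LogRow = Tuple[int, int, int, str]


def unmatched_starts(log: List[LogRow]) -> List[str]:
    """Return DSL commands that were started but never ended, in log order."""
    open_cmds: List[str] = []
    for _dsl_id, _tid, _parent_id, dsl_cmd in log:
        # DSL_CMD has the form "start <command>" or "end <command>".
        marker, _, name = dsl_cmd.partition(" ")
        if marker == "start":
            open_cmds.append(name)
        elif marker == "end" and name in open_cmds:
            # A matching end closes the earliest open entry of that command.
            open_cmds.remove(name)
    return open_cmds


# The three rows of the failed transaction shown on the slide.
failed_log = [
    (1, 1, 1, "start postinst_init(cups)"),
    (2, 1, 1, "end postinst_init(cups)"),
    (3, 1, 1, "start post_init_restart(cups)"),
]

print(unmatched_starts(failed_log))
```

A non-empty result marks the transaction as incomplete, which is the trigger for the roll-back described on the previous slide.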
69. By creating a model of the system in terms of the new language, and by representing the changes in system state performed by package upgrades in terms of the DSL, we aim to be able to investigate the target configuration a priori.
70. If a package upgrade fails, at that moment we leave our system in a potentially unknown state and run the risk of having only some of the files in an upgraded state.
71. We propose a hybrid mechanism where we use the DSL to monitor the package configuration state and, if that does not work, fall back to a system-call trapping mechanism.
75. Modifying apt-rpm to include roll-back features. The DSL approach uses many new elements built into apt-rpm. We first want to check whether the simulator that our new approach makes possible detects a likely package configuration failure. Even if the simulator does not detect a failure, it does not mean that the actual configuration will succeed on the system. This is a compromise that comes from abstracting the system into a model. Next we replace the traditional running of configuration scripts, which is done by an agnostic meta-installer, and instead run our DSL commands. By keeping our commands in a log and knowing how the system was modified, we should be able to perform roll-back.
[Figure: Apt call sequence: model_simulator(), DSL_interpreter_pre(), run_transaction(), DSL_interpreter_post()]
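The call sequence on this slide can be sketched in a few lines. This is an illustration only: the step names follow the slide's diagram in lower case, but their bodies, the `run_upgrade` wrapper and the list-of-strings log are assumptions and not the actual apt-rpm code.

```python
from typing import Callable, List

Log = List[str]
Step = Callable[[str, Log], None]


def run_upgrade(pkg: str, model_simulator: Callable[[str], bool],
                steps: List[Step], log: Log) -> bool:
    """Run the steps in order, but only if the simulator predicts no failure."""
    if not model_simulator(pkg):
        return False  # predicted failure: nothing is run, the log stays as it was
    for step in steps:
        step(pkg, log)
    return True


# Stand-in steps that only record what they would do (hypothetical bodies).
def dsl_interpreter_pre(pkg: str, log: Log) -> None:
    log.append(f"pre({pkg})")


def run_transaction(pkg: str, log: Log) -> None:
    log.append(f"transaction({pkg})")


def dsl_interpreter_post(pkg: str, log: Log) -> None:
    log.append(f"post({pkg})")


steps = [dsl_interpreter_pre, run_transaction, dsl_interpreter_post]

log: Log = []
ok = run_upgrade("cups", lambda pkg: True, steps, log)

rejected_log: Log = []
rejected = run_upgrade("cups", lambda pkg: False, steps, rejected_log)

print(ok, log)
print(rejected, rejected_log)
```

Passing the simulator check only means no failure was predicted by the model; the real configuration can still fail, which is why every step writes to the log.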
76. System Integration. For executing the roll-back statements we will have a log of the DSL commands executed in sequence. To perform the roll-back we need to run the inverse statements associated with those commands in the reverse order. The reason for having the simulator at this stage is to pre-check whether rolling back the package configuration would leave the system in an erroneous state. As we upgraded from that state, we hope that in most cases the answer will be that roll-back is possible. As we are performing a LIFO-style roll-back, we run the post commands before we run the pre statements.
[Figure: Apt call sequence: model_simulator(), DSL_reverse_post(), run_transaction(), DSL_reverse_pre()]
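The LIFO ordering can be made concrete with a short sketch. The two command names come from the example log earlier in the talk; the `inverse_of` mapping, the "undo ..." statements and the function `rollback_plan` are hypothetical placeholders, since the slides do not show the real inverse statements.

```python
from typing import Dict, List


def rollback_plan(executed: List[str], inverse_of: Dict[str, str]) -> List[str]:
    """Return the inverse statements for the executed commands, newest first.

    Raises KeyError if a command has no registered inverse.
    """
    return [inverse_of[cmd] for cmd in reversed(executed)]


# Commands in the order they were logged during the upgrade.
executed = ["postinst_init(cups)", "post_init_restart(cups)"]

# Hypothetical inverse statements, one per DSL command.
inverse_of = {
    "postinst_init(cups)": "undo postinst_init(cups)",
    "post_init_restart(cups)": "undo post_init_restart(cups)",
}

plan = rollback_plan(executed, inverse_of)
print(plan)
```

The command that ran last is undone first, which is why the post statements are reversed before the pre statements.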
80. Use these common elements as the basis for a first version of the DSL and release that version.
81. Modify standard maintainer scripts to include DSL commands -> link to binary files or some other mechanism. We have chosen to add DSL commands to the modified .spec files.
82. Log DSL elements into a SQLite database so that they can be captured, replayed or otherwise analysed.
83. Develop a roll-back mechanism that uses the log + stored info in the VCS to recover the original state of the machine -> ACID?
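As a rough illustration of the SQLite log from slide 82, the sketch below stores the example rows and reads them back in execution order using Python's standard sqlite3 module. The column names follow the log shown earlier; the table name `dsl_log`, the column types and the helper `replay_order` are assumptions, as the slides do not give the real schema.

```python
import sqlite3

# In-memory database for the example; a real log would live on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dsl_log (
        dslID     INTEGER PRIMARY KEY,
        TID       INTEGER NOT NULL,
        parentID  INTEGER NOT NULL,
        DSL_CMD   TEXT    NOT NULL,
        bhINVERSE INTEGER NOT NULL  -- 1 for TRUE, 0 for FALSE
    )
    """
)

# The four rows from the example log of the cups installation.
rows = [
    (1, 1, 1, "start postinst_init(cups)", 1),
    (2, 1, 1, "end postinst_init(cups)", 1),
    (3, 1, 1, "start post_init_restart(cups)", 1),
    (4, 1, 1, "end post_init_restart(cups)", 1),
]
conn.executemany("INSERT INTO dsl_log VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()


def replay_order(connection: sqlite3.Connection, tid: int) -> list:
    """Return the DSL commands of one transaction in the order they were logged."""
    cursor = connection.execute(
        "SELECT DSL_CMD FROM dsl_log WHERE TID = ? ORDER BY dslID", (tid,)
    )
    return [row[0] for row in cursor]


print(replay_order(conn, 1))
```

Sorting the same query with ORDER BY dslID DESC would give the order needed for roll-back.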