Successfully reported this slideshow.
Your SlideShare is downloading. ×

We have nearly one million lines of Python 2 code in production — and now?

Ad

Michael Howitz, gocept
We have nearly
one million lines of Python 2 code
in production — and now?
Image by sipa from Pixab...

Ad

Michael Howitz
Introduction
• since 2003
* at gocept gmbh & co kg.
* in Halle (Saale), DE
* Python developer

Ad

Agenda
• Discussion of approaches
• Introduction into union.cms
• Doing the migration
• Tips and tricks

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Loading in …3
×

Check these out next

1 of 19 Ad
1 of 19 Ad

We have nearly one million lines of Python 2 code in production — and now?

Download to read offline

Still running Python 2 code in production is like steering a ship without radar in thick fog: You don’t know, which obstacle you will hit next. But there are ways to see the sun again even for large code bases. This presentation contains a discussion of the possible ways and for a success story.

Still running Python 2 code in production is like steering a ship without radar in thick fog: You don’t know, which obstacle you will hit next. But there are ways to see the sun again even for large code bases. This presentation contains a discussion of the possible ways and for a success story.

Advertisement
Advertisement

More Related Content

Advertisement

We have nearly one million lines of Python 2 code in production — and now?

  1. 1. Michael Howitz, gocept We have nearly one million lines of Python 2 code in production — and now? Image by sipa from Pixabay
  2. 2. Michael Howitz Introduction • since 2003 * at gocept gmbh & co kg. * in Halle (Saale), DE * Python developer
  3. 3. Agenda • Discussion of approaches • Introduction into union.cms • Doing the migration • Tips and tricks
  4. 4. • Sunset the application • Replace by something off the shelf • Don’t migrate • Start over / Fade out • Move to Python 3 Discussion of approaches
  5. 5. Possible approaches Sunset the application Image by Pexels from Pixabay
  6. 6. Possible approaches Replace by something off the shelf Image by Michal Jarmoluk from Pixabay
  7. 7. Possible approaches Don’t migrate Image by Lynn Greyling from Pixabay
  8. 8. Possible approaches Start over Image by Lynn Greyling from Pixabay
  9. 9. Possible approaches Move to Python 3 Image by Marcus Bouvin from Pixabay
  10. 10. union.cms ☞ Migrate to Python 3
  11. 11. Steps towards running on Python 3
  12. 12. Migration steps I. General preparation • Know your tools: Python 2 and 3 and the differences • Have a good test coverage. • Type annotations might help.
  13. 13. Migration steps II. Clean up code and tests • Make sure all tests run and pass • Find out which code parts are still used, delete the rest
  14. 14. Migration steps III.Bring dependencies to Python 3 • List all Python package dependencies • Check each dependency for Python 3 compatibility • Update to newest versions of dependencies • Replace or port not yet Python 3 compatible ones • Create a dependency tree of your packages * tl.eggdeps
  15. 15. Migration steps IV.Migrate the code • Run python-modernize (package is called modernize.) • Run pylint --py3k • Fix the tests. * pytest-quarantine * Make sure, the tests still run on Python 2. • Mark code with # PY2 resp. # PY3.
  16. 16. Migration steps V. Switch to Python 3 • Run instance against empty database. • Maybe migrate the data. • Test the application in depth from end to end. • Rollout on Python 3. • Drop Python 2 support. * pyupgrade
  17. 17. • Use metrics to compute the effort * pylint warning: 30 min * doctest: 20 min * Python module: 5 min * modernized Python module: 2 min * base effort per package: 135 min • Plan the migration • Rollout often Schedule
  18. 18. • Target to a recent Python 3 version • No long running branches • Always do small steps • Take the time it needs Tips and tricks
  19. 19. • Python 2 End Of Life Survey from ActiveState * https://www.activestate.com/resource s/white-papers/python-2-eol-survey- results/ • union.cms consumers * https://www.dgb.de * https://www.verdi.de • My employer * https://gocept.com/index_en.html • Migration helpers * https://pypi.org/project/mypy/ * https://pypi.org/project/modernize/ * https://pypi.org/project/pylint/ * https://pypi.org/project/pytest- quarantine/ * https://www.stxnext.com/blog/python- 3-migration-guide/ * https://portingguide.readthedocs.io * https://pypi.org/project/pyupgrade/ * https://pypi.org/project/tl.eggdeps/ Links

Editor's Notes

  • Created in May/June 2020.
  • Since 2003 I am a Python software developer.
    I was the head of development in the migration project my presentation is based on.
    My employer is a small software and consulting company named gocept. It is located in Halle (Saale), Germany.
  • What to do with an application running on Python 2?
  • Python 2 is after its end of life maybe your application, too. Ask yourself:
    How important is the business case it supports?
    How long will I or my company earn money using this business case?
    Can a migration be a success — even financially?
    This approach might be the cheapest and most save one — if it is possible at all.
    But you should at least think about it.
    Don’t burn money by touching an application which is not (or no longer) worth investing into it.
  • Is your application so special that it cannot be replaced? maybe you can even replace it by something off the shelf
    It is probably cheaper to buy a service from a company than doing everything yourself.
    On the other hand you will mostly get only an 80% solution — maybe you can work around the missing 20%.
    One of our customers is replacing big parts of their homegrown ERP system by products backed by companies: bug tracker, internal CMS, calendar, …
    You only have to migrate the data – I know this can cause big trouble, but you only have to do it once.
    Even my company — as a software shop — has a slogan which roughly translates into English: „Better buy it instead of develop it on your own.“
  • The next approach might be dangerous „Don’t migrate“
    If the use case of your application will go away soon, neither maintenance nor development of new features is required and if there are no bugs to fix: this approach might be a fit for you.
    But be careful! Don’t use this approach on critical infrastructure which ran for years and nobody want’s to take care of!
    What will happen if the server running the application dies?
    Will you be able to find help if you have problems with the code in the next 2 years?
    So be aware it will become harder to keep an old application running and to find someone to invest time into it.
  • If all is in ruins maybe „Start over“
    Directly in Python 3 or even in another language?
    Try to keep at least the high level tests?
    There are many people who say: „Don’t do this“ — and I am one of them:
    It might be interesting to start on new ground.
    In the beginning it looks like a great opportunity to do everything correct now — like you read it in the book.
    Often this will become a great nightmare: At least for big enough projects.
    You can do this for a micro service but not for a monolith application.
    There might be exceptions to this rule of thumb — but why do you think you are the exception?
    Rewrite projects are the ones which rarely succeed:
    Only rewriting is never enough: the new application should also be a lot more flexible and have a ton of new features
    You still have to migrate the data — or do you dare to use the same old database?
    During the rewrite the old code has to be maintained:
    change requests now have to be implemented in both variants, the old and the new one
    Users will not be happy with the new application because it behaves differently.
    The old one they used to complain about will over time become the secret weapon to get things done.
    The minimal viable product will be quite big especially if there is an old working version the users know and trust.
    There is a sub-approach to „Start over“ named „Fade out“:
    replace the parts of the old monolith one by one
    This might work.
    But wouldn’t it be easier if the monolith already runs on a current Python version?
    So you don’t have to think in different Python versions if you have to work on the monolith and its successors.
  • If nothing else helps: You could even try to move your code and tests over to Python 3.
    For me this is the most interesting approach, because I love legacy code.
    Of all the approaches this is most future proof one:
    It uses what’s already running in production and only transforms it.
    This can be challenging — but until now it was possible for each project we touched as company.
  • Let me introduce union.cms.
    Union.cms is a content management system once written for ver.di, a German trade union.
    Its first version went into production as early as in 2003.
    It is now also used by DGB (aka „Deutscher Gewerkschaftsbund“).
    Union.cms is a multi-site, multi-user content management system.
    It has thousands of content editors and serves its content to some million visitors.
    Sorry, its source code is not publicly available.
    When we first touched it in 2009 it was based on Zope 2 and the Zope Object Database (ZODB).
    It was probably running on Python 2.6, as Python 2.7 was not yet finally released those days.
    Over the years we developed many new features for the CMS — so finally ended up having nearly one million lines of Python code.
    The customers are interested to keep the CMS running and maintainable for the next coming years.
    So there was no way to sunset the application.
    Do not migrate was also a no-go — because of security considerations.
    Replacing it or starting over was no opportunity.
    The customers invested much the time and effort in the previous years:
    to train the editors
    and to create or migrate the content from an older union.cms version.
    So we were happy to migrate the core and the projects built upon — to Python 3.
  • Now let’s take a look at the migration steps which worked well for union.cms.
    I believe they can be used in Python 3 migration projects of all sizes.
    I am going to present 5 main steps.
  • Step 1 of the migration is general preparation — This means:
    Have someone who knows Python 2 and Python 3 and the differences in between.
    There are enough tutorials on the web, so I am not going to cover this here.
    It deeply helps to have a decent test coverage.
    Yes, this means to have to port the tests, too; but test coverage gives you confidence that the code still runs.
    In union.cms we currently have about 88% test coverage of the Python code.
    When we started, union.cms had less test coverage: it increased over time during the migration project.
    Sorry I do not have the actual numbers from when we started.
    Currently the 88% are quite good enough.
    Annotating each function with the expected types for the input and output (aka type annotations) might help.
    Checkers like mypy are able to find problems by statically analyzing your code.
    In Python 2 type annotations have to be comments instead of being part of the language.
    In union.cms we did not use type annotations for the migration:
    It seemed too much hassle for an unknown gain.
  • Step 2 of the migration: Clean up code and tests — This means:
    All tests should be running and passing.
    Broken or disabled tests are generally no good idea.
    You should try to fix them or delete them if nothing helps.
    Some parts of the code are probably unused, you should not port them.
    The ones used will be difficult enough to port, so there is no need to work on dead code.
    In union.cms this required a lot of detective work to find code which is no longer needed, there were
    Python modules which where not even imported (no .pyc files)
    Classes and functions which where not used anywhere.
    We had the advantage to know the code base relatively well to find that dead code. — at least hopefully most of it
    Removing dead code should be done like a surgeon.
    Remove everything the dead code needs (imports, global variables, registrations, et cetera)
    Each deleted line does not need to be ported and supported later on.
  • Step 3 of the migration: Bring dependencies to Python 3 — This means:
    You need a list all direct and transitive Python package dependencies of your application.
    Transitive dependencies are the dependencies of the dependencies.
    How to get this list, depends on the tools you use, only two examples:
    If you are using `pip` to install your dependencies call `pip freeze` to get this list.
    union.cms uses `zc.buildout`. It lists all the needed dependencies in the generated run scripts.
    Each dependency has to be checked for Python 3 compatibility individually:
    This means look at the Python Package Index (aka PyPI) for the package to see if there is a version which declares Python 3 support.
    I know, this step could be automated — but looking at a package gives you a feeling for the package:
    When was the latest release?
    Does the package still seem to be actively developed?
    Which version is the last one which supports both Python 2 and 3?
    On the next slide I am going to suggest to port to Python 3 but keeping Python 2 compatibility for at least a while.
    This step could be used to introspect the stack you are building upon.
    Possibly you will find packages which should be replaced or have a maintained successor package.
    At the end of this step you have a snapshot of
    the versions you are using
    the package versions needed for Python 3 compatibility
    and maximum package version numbers to keep the Python 2 compatibility
    When you know the needed versions you can use them:
    Make sure you run on the newest versions which are Python 2 compatible.
    This might require some changes in your code so it works with these newer versions.
    This of course also means: run on the latest version of Python 2.7!
    There might be dependencies which are not yet ported to Python 3, you could try to replace them with other packages.
    Sometimes you are the chosen one to port a package to Python 3.
    You could see this as an exercise for the migration of your application.
    Maybe it is enough to port the parts of the package you actually use.
    To get a clear plan where to start porting your application — you might need a dependency tree of the packages belonging to the application.
    It should at least exist in your head.
    In union.cms we knew the dependencies well enough, so there was no need for an explicit dependency graph.
    Unfortunately I cannot suggest a good tool for creating that dependency tree.
    The only one I used more than once is called tl.eggdeps … but it might not fit your use case.
  • Step 4 of the migration: Migrate the code — There are several tools to help you:
    „modernize“ is a package created by Armin Ronacher (the developer of flask and many other python packages)
    It is used to convert Python 2 code so it can run both on Python 2 and 3.
    To achieve this it adds a dependency to a library named „six“ (aka 2 times 3 is 6).
    Depending on this „six“ library makes it pretty easy to drop Python 2 support later on; as the code needed for compatibility is marked by using this library.
    Although this step automatically changes the code we had little problems with the changes it created.
    We were happy about this first step running automatically
    as it was clear that the next ones would require manual work.
    There are problems which cannot be automatically fixed but they can be detected,
    for instance in Python 3 a class providing the method __eq__ (for equality comparison) also has to implement __hash__.
    Pylint has a mode to find such issues:
    You need to install a pylint version older than version 2 on Python 2. (Newer versions are Python 3 only and no longer contain the needed checkers.)
    It will tell you about some problems not even your tests will find — but they can cause trouble when running the application.
    Now the heavy lifting follows:
    Run the tests on Python 3 and fix them until they pass.
    There might even be import errors which already disturb the test collection.
    Many of them will be due to relative imports which are no longer supported on Python 3.
    Running the tests will show up all the problems „modernize“ and „pylint“ could not find, e. g.:
    StringIO vs. BytesIO
    Ask the web search engine of your choice, for specific problems, there are already solutions for most of them.
    I added two example pages in the links section on the last page of the presentation.
    If you are using „pytest“ to run your tests — the plug-in „pytest-quarantine“ can come handy:
    It allows to curate a list of failing tests:
    This way you always know which tests you still have to fix.
    And you see if you broke tests by your changes which where successful before.
    You are even able to commit the quarantine list to your repository so multiple developers can work on fixing the tests.
    Make sure the tests still pass on Python 2, so you are always able to deploy to production.
    In union.cms it was really helpful to add comments to the code branches only needed for a specific Python version using „# PY2“ respectively „# PY3“.
    This made it easier — later on — to remove Python 2 compatibility code.
  • Step 5 of the migration: Switch to Python 3 — This means:
    Run your application instance against an (at least fairly) empty database.
    This way you make sure the instance starts at least.
    You might find some unicode vs. bytes issues while clicking through the application.
    Maybe you should write some additional tests for the problems you find.
    When the application runs — it might be time to migrate the data:
    union.cms uses ZODB — which stores Python pickle objects.
    If you are using a relational database you are probably lucky and able to skip this step.
    Now you can deploy the application to a staging environment to test in depth from end to end.
    Even with 100% test coverage some issues might slip through.
    union.cms was tested by the editors who use it in their day job.
    They even wrote a test manual for an external tester to test it on the different installations.
    Be aware that the testers will find bugs the automatic tests did not find.
    Additionally they will find bugs which already existed before the Python 3 migration was started.
    After a successful test on staging it is time for the rollout to production:
    Maybe you can do a canary rollout: This means
    Let only a small fraction of your users hit the server running on Python 3.
    Fix the occurring errors.
    And increase the portion of users which see the Python 3 version over time up 100%.
    In union.cms this approach was not possible — as the database had to be converted to be usable on Python 3.
    So we stopped editing for a weekend, converted a copy of the database and went back online after the conversion.
    The rollout went smooth — thanks to the many testers of the staging environment.
    There is one final step: Drop the Python 2 support.
    An application can only run a single Python version.
    So after a successful deployment on Python 3 the Python 2 compatibility code is no longer needed.
    It seems to be the best way to drop support for Python 2 in a coordinated way instead of when the code is touched next time.
    There is a tool called „pyupgrade“ which can be used to remove most parts where the library „six“ is used.
    Additionally it converts code to be modern Python 3 including the usage of f-strings.
    After running this tool you can remove the imports of „six“ and clean up the places you marked with PY2 or PY3.
    Don’t underestimate this step, even reading through the diff when removing Python 2 compatibility is a lengthy task.
    Don’t forget to update the dependencies to the newest Python-3-only versions.
  • Some words about the time schedule:
    We used the following metrics to compute the effort for porting union.cms to Python 3:
    We ran python-modernize and afterwards pylint (without checking in)
    For each warning pylint raised we calculated 30 minutes to fix it because these are the hard things.
    In most cases we needed less than the calculated 30 minutes.
    For each doctest we had we calculated 20 minutes.
    Some tests had to be rewritten as unittests to support both Python versions.
    These tests took more time than 20 minutes.
    5 minutes for each Python module including the test modules
    2 minutes for each Python module python-modernize had touched — to maybe fix the mess the tool created
    Plus 135 min base effort per installable Python package (aka a package with a setup.py which can be installed via pip)
    These numbers added up to a budget which was more than enough to migrate the code.
    We created a plan for the whole migration: we planned 19 working phases:
    One iteration each month with up to two weeks of implementation and testing — so there was still time for bugfixes and new features outside the migration project.
    Be aware this was not only the Migration from Python 2 to Python 3 but also from Zope 2.13 to Zope 4 as the main dependency.
    Additionally we fixed some technical debt as we were working on the code.
    Actually we deployed a version running on Python 3 after the 15th iteration, so we were a bit faster than expected.
    We actually planned to rollout after each iteration but the customer was not able to test the whole CMS that often.
    So we had a rollout every 3 months.
    Each rollout was shipping the current status of the Python 3 migration as it still was Python 2 compatible.
    Even the big first rollout on Python 3 could have been switched back to Python 2 if it would have failed utterly on Python 3 — so the bugfixes would have been deployed but on Python 2.
  • Now I am going to give you some tips and lessons learned:
    The migration project might take a while:
    Aim to use a recent Python 3 version, so you do not have to start the next small migration project the day after the rollout.
    Python 3.7 is a good starting point. Newer Python versions might make it a bit harder to still support Python 2.
    Merge often to the main branch.
    Your code will run on Python 2 and 3, so there is no need to keep it separate from the main branch.
    Go the route of the small steps:
    Your project might seem small enough to directly migrate to Python 3 without the intermediate step of supporting both versions:
    We did this in an other seemingly small project and gut burned:
    The changes of updating the dependencies and to Python 3 together where too much to get it running.
    We had to add back Python 2 compatibility (caution there are no tools!) to analyze the problems in Python 2 and then port to Python 3 again.
    The Python 3 migration of union.cms took about 1.5 years.
    We could have done the migration in a much shorter time but we wanted to be sure that
    it was always in a deployable state
    features and bugfixes should always be possible — without breaking the already achieved migration progress.
    If you are still running Python 2.7 in production — it should not matter if you keep doing so for some extra month to gain the extra certainty that the migration will not break everything into pieces.

×