Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Postgres Open


  1. 1. Sane Schema Management with Alembic and SQLAlchemy Selena Deckelmann Mozilla @selenamarie chesnok.com
  2. 2. I work on Socorro. http://github.com/mozilla/socorro http://crash-stats.mozilla.com
  3. 3. Thanks and apologies to Mike Bayer
  4. 4. What's sane schema management? Executing schema change in a controlled, repeatable way while working with developers and operations.
  5. 5. What's alembic? Alembic is a schema migration tool that integrates with SQLAlchemy.
  6. 6. My assumptions: ● Schema migrations are frequent. ● Automated schema migration is a goal. ● Stage environment is enough like production for testing. ● Writing a small amount of code is ok.
  7. 7. No tool is perfect. DBAs should drive migration tool choice. Choose a tool that your developers like. Or, don't hate.
  8. 8. Part 0: #dbaproblems Part 1: Why we should work with developers on migrations Part 2: Picking the right migration tool Part 3: Using Alembic Part 4: Lessons Learned Part 5: Things Alembic could learn
  9. 9. Part 0: #dbaproblems
  10. 10. Migrations are hard. And messy. And necessary.
  11. 11. Changing a CHECK constraint on 1000+ partitions. http://tinyurl.com/q5cjh45
  12. 12. What sucked about this: ● Wasn't the first time (see 2012 bugs) ● Change snuck into partitioning UDF Jan-April 2013 ● No useful audit trail ● Some partitions affected, not others ● Error dated back to 2010 ● Wake up call to examine process!
  13. 13. Process before Alembic:
  14. 14. What was awesome: ● Used Alembic to manage the change ● Tested in stage ● Experimentation revealed which partitions could be modified without deadlocking ● Rolled out change with a regular release during normal business hours
  15. 15. Process with Alembic: 1. Make changes to model.py or raw_sql files 2. Run: alembic revision --autogenerate 3. Edit revision file 4. Commit changes 5. Run migration on stage after auto-deploy of a release
  16. 16. Process with Alembic: 1. Make changes to model.py or raw_sql files 2. Run: alembic revision --autogenerate 3. Edit revision file 4. Commit changes 5. Run migration on stage after auto-deploy of a release
  17. 17. Problems Alembic solved: ● Easy-to-deploy migrations including UDFs for dev and stage ● Can embed raw SQL, issue multi-commit changes ● Includes downgrades
  18. 18. Problems Alembic solved: ● Enables database change discipline ● Enables code review discipline ● Revisions are decoupled from release versions and branch commit order
  19. 19. Problems Alembic solved (continued): ● 100k+ lines of code removed ● No more post-deploy schema checkins ● Enabling a tested, automated stage deployment ● Separated schema definition from version-specific configuration
  20. 20. Photo courtesy of secure.flickr.com/photos/lambj HAPPY AS A CAT IN A BOX
  21. 21. Part 1: Why we should work with developers on migrations
  22. 22. Credit: flickr.com/photos/chrisyarzab/
  23. 23. Schemas change.
  24. 24. Developers find this process really frustrating.
  25. 25. Schemas, what are they good for? ● Signal intent ● Communicate ideal state of data ● Highly customizable in Postgres
  26. 26. Schemas, what are they not so good for? ● Rapid iteration ● Documenting evolution ● Major changes on big data ● Data experimentation
  27. 27. Database systems resist change.
  28. 28. Database systems resist change because: ● They exist at the center of multiple systems ● Stability is a core competency ● The schema is often the only API between components
  29. 29. How do we make changes to schemas?
  30. 30. Because of resistance, we treat schema change as a one-off.
  31. 31. Evolution of schema change process
  32. 32. We're in charge of picking up the pieces when a poorly-executed schema change plan fails.
  33. 33. Trick question: When is the right time to work with developers on a schema change?
  34. 34. How do we safely make changes to schemas?
  35. 35. How do we safely make changes to schemas? Process and tooling. Preferably, that we choose and implement.
  36. 36. Migration tools are really configuration management tools.
  37. 37. Migrations are for: ● Communicating change ● Communicating process ● Executing change in a controlled, repeatable way with developers and operations
  38. 38. Part 2: Picking the right migration tool
  39. 39. Questions to ask: ● How often does your schema change? ● Can the migrations be run without you? ● Can you test a migration before you run it in production?
  40. 40. Questions to ask: ● Can developers create a new schema without your help? ● How hard is it to get from an old schema to a new one using the tool? ● Are change rollbacks a standard use of the tool?
  41. 41. What does our system need to do? ● Communicate change ● Apply changes in the correct order ● Apply a change only once ● Use raw SQL where needed ● Provide a single interface for change ● Rollback gracefully
  42. 42. How you are going to feel about the next slide:
  43. 43. Use an ORM with the migration tool.
  44. 44. Shameful admission: We had three different ways of defining schema in our code and tests.
  45. 45. A good ORM provides: ● One source of truth about the schema ● Reusable components ● Database version independence ● Ability to use raw SQL
  46. 46. And good ORM stewardship: ● Fits with existing tooling and developer workflows ● Enables partnership with developers ● Integrates with a testing framework
  47. 47. And: ● Gives you a new way to think about schemas ● Develops compassion for how horrible ORMs can be ● Gives you developer-friendly vocabulary for discussing why ORM-generated code is often terrible
  48. 48. Part 3: Using Alembic
  49. 49. Practical Guide to using Alembic http://tinyurl.com/po4mal6
  50. 50. https://alembic.readthedocs.org ● revision: a single migration ● down_revision: previous migration ● upgrade: apply 'upgrade' change ● downgrade: apply 'downgrade' change ● offline mode: emit raw SQL for a change
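
The terms on this slide can be illustrated with a toy model. The sketch below is not Alembic's implementation: it uses SQLite from the standard library, and the revision ids, table names, and the `schema_version` table are all hypothetical. It only shows how `down_revision` links put revisions in order, how a stored version number keeps each change from being applied twice, and what stepping back one revision means.

```python
import sqlite3

# Hypothetical revisions. Each one names its predecessor via down_revision,
# mirroring the identifiers a revision file defines.
REVISIONS = {
    "a1": {
        "down_revision": None,
        "upgrade": "CREATE TABLE reports (id INTEGER PRIMARY KEY, signature TEXT)",
        "downgrade": "DROP TABLE reports",
    },
    "b2": {
        "down_revision": "a1",
        "upgrade": "CREATE INDEX reports_signature ON reports (signature)",
        "downgrade": "DROP INDEX reports_signature",
    },
}


def ordered_revisions(revisions):
    """Return revision ids from oldest to head by following down_revision."""
    children = {rev["down_revision"]: rev_id for rev_id, rev in revisions.items()}
    order, current = [], children.get(None)
    while current is not None:
        order.append(current)
        current = children.get(current)
    return order


def current_revision(conn):
    """Read the revision recorded in the database, or None if unversioned."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version_num TEXT)")
    row = conn.execute("SELECT version_num FROM schema_version").fetchone()
    return row[0] if row else None


def set_revision(conn, rev_id):
    """Record rev_id as the single current revision (None clears it)."""
    conn.execute("DELETE FROM schema_version")
    if rev_id is not None:
        conn.execute("INSERT INTO schema_version VALUES (?)", (rev_id,))


def upgrade_to_head(conn, revisions):
    """Apply every revision newer than the recorded one, in order."""
    order = ordered_revisions(revisions)
    current = current_revision(conn)
    start = order.index(current) + 1 if current else 0
    applied = []
    for rev_id in order[start:]:
        conn.execute(revisions[rev_id]["upgrade"])
        set_revision(conn, rev_id)
        applied.append(rev_id)
    conn.commit()
    return applied


def downgrade_one(conn, revisions):
    """Undo the current revision and step back to its down_revision."""
    current = current_revision(conn)
    if current is None:
        return None
    conn.execute(revisions[current]["downgrade"])
    set_revision(conn, revisions[current]["down_revision"])
    conn.commit()
    return current
```

Running `upgrade_to_head` twice against the same connection applies nothing the second time, which is the "apply a change only once" property listed among the requirements earlier in the talk.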
  51. 51. Installing and using:
        virtualenv venv-alembic
        . venv-alembic/bin/activate
        pip install alembic
        alembic init <directory>
        vi alembic.ini
        alembic revision -m "new"
        alembic upgrade head
        alembic downgrade -1
  52. 52. Defining a schema? vi env.py Add: import myproj.model
  53. 53. Helper functions? Put your helper functions in a custom library and add this to env.py: import myproj.migrations
  54. 54. Ignore certain schemas or partitions? In env.py:
        import re

        def include_symbol(tablename, schema):
            return schema in (None, "bixie") and re.search(r'_\d{8}$', tablename) is None
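
To make the filter's behavior concrete, here is a standalone sketch of the same function with the pattern pulled out into a named constant. It assumes, as the slide's regular expression suggests, that partition tables end in an underscore followed by eight digits; the example table names are hypothetical. How the callable is handed to Alembic depends on the Alembic version in use (later releases document `include_object` for this purpose), so that part appears only as a comment.

```python
import re

# Assumption: partition tables end in an 8-digit date suffix,
# for example reports_20130916.
PARTITION_SUFFIX = re.compile(r"_\d{8}$")


def include_symbol(tablename, schema):
    """Return True for tables that autogenerate should compare."""
    in_managed_schema = schema in (None, "bixie")
    is_partition = PARTITION_SUFFIX.search(tablename) is not None
    return in_managed_schema and not is_partition


# In env.py the callable would be passed to the migration context, e.g.:
#   context.configure(connection=connection,
#                     target_metadata=target_metadata,
#                     include_symbol=include_symbol)
```

With this filter, a parent table such as `reports` is compared while `reports_20130916` is skipped, and anything outside the default schema or `bixie` is ignored.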
  55. 55. Manage User Defined Functions? Chose to use raw SQL files. 3 directories, 128 files: procs/ types/ views/
        import os

        codepath = '/socorro/external/pg/raw_sql/procs'

        def load_stored_proc(op, filelist):
            # Run each stored-procedure file through op.execute(), in list order.
            app_path = os.getcwd() + codepath
            for filename in filelist:
                # Assumes filelist holds bare file names (no leading slash).
                sqlfile = os.path.join(app_path, filename)
                with open(sqlfile, 'r') as stored_proc:
                    op.execute(stored_proc.read())
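
A helper like this can be exercised without a database. The sketch below is a variation, not the Socorro code: it takes the SQL directory as an explicit argument (an assumption made so the example is self-contained), and it uses a hypothetical `RecordingOp` class as a stand-in for Alembic's `op` object, relying only on the fact that `op.execute()` accepts a SQL string. The file name and function body are made up.

```python
import os
import tempfile


def load_stored_proc(op, filelist, sql_dir):
    """Execute each SQL file in filelist, in order, through op.execute()."""
    for filename in filelist:
        sqlfile = os.path.join(sql_dir, filename)
        with open(sqlfile, "r") as stored_proc:
            op.execute(stored_proc.read())


class RecordingOp:
    """Stand-in for alembic.op that records SQL instead of running it."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def demo():
    """Write one hypothetical stored procedure to disk and load it."""
    sql = "CREATE FUNCTION one() RETURNS int AS 'SELECT 1' LANGUAGE sql;"
    with tempfile.TemporaryDirectory() as sql_dir:
        with open(os.path.join(sql_dir, "one.sql"), "w") as handle:
            handle.write(sql)
        op = RecordingOp()
        load_stored_proc(op, ["one.sql"], sql_dir)
        return op.statements
```

Inside a real revision file the same helper would be called from `upgrade()` with the genuine `op` object, listing only the procedure files that changed in that revision.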
  56. 56. Stamping database revision?
        from alembic.config import Config
        from alembic import command
        alembic_cfg = Config("/path/to/yourapp/alembic.ini")
        command.stamp(alembic_cfg, "head")
  57. 57. Part 4: Lessons Learned
  58. 58. Always roll forward. 1. Put migrations in a separate commit from schema changes. 2. Revert commits for schema change, leave migration commit in-place for downgrade support.
  59. 59. Store schema objects in the smallest, reasonable, composable unit. 1. Use an ORM for core schema. 2. Put types, UDFs and views in separate files. 3. Consider storing the schema in a separate repo from the application.
  60. 60. Write tests. Run them every time. 1. Write a simple tool to create a new schema from scratch. 2. Write a simple tool to generate fake data. 3. Write tests for these tools. 4. When anything fails, add a test.
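
As a rough illustration of the first three points, the sketch below builds a schema from scratch, fills it with seeded fake data, and checks the result. Everything in it is hypothetical: the two tables, the column names, and the use of in-memory SQLite in place of Postgres are assumptions chosen to keep the example self-contained.

```python
import random
import sqlite3

# Hypothetical schema, created from scratch on every test run.
SCHEMA = [
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "CREATE TABLE reports ("
    " id INTEGER PRIMARY KEY,"
    " product_id INTEGER NOT NULL REFERENCES products(id),"
    " signature TEXT NOT NULL)",
]


def create_schema(conn):
    """Tool 1: build the whole schema in an empty database."""
    for statement in SCHEMA:
        conn.execute(statement)


def generate_fake_data(conn, n_reports, seed=0):
    """Tool 2: insert reproducible fake rows (seeded random generator)."""
    rng = random.Random(seed)
    conn.executemany(
        "INSERT INTO products (name) VALUES (?)",
        [("product_a",), ("product_b",)],
    )
    product_ids = [row[0] for row in conn.execute("SELECT id FROM products")]
    rows = [
        (rng.choice(product_ids), "sig_%04d" % rng.randrange(10000))
        for _ in range(n_reports)
    ]
    conn.executemany(
        "INSERT INTO reports (product_id, signature) VALUES (?, ?)", rows
    )
    conn.commit()


def check_schema_and_fake_data():
    """Test for both tools: row count matches and no report is orphaned."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    generate_fake_data(conn, n_reports=25)
    count = conn.execute("SELECT COUNT(*) FROM reports").fetchone()[0]
    orphans = conn.execute(
        "SELECT COUNT(*) FROM reports r"
        " LEFT JOIN products p ON p.id = r.product_id"
        " WHERE p.id IS NULL"
    ).fetchone()[0]
    return count, orphans
```

Seeding the generator keeps failures reproducible, which makes it easier to follow the fourth point and turn each failure into a new test.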
  61. 61. Part 5: What Alembic could learn
  62. 62. 1. Understand partitions 2. Never apply a DEFAULT to a new column 3. Help us manage UDFs better 4. INDEX CONCURRENTLY 5. Prettier syntax for multi-commit sequences
  63. 63. 1. Understand partitions 2. Never apply a DEFAULT to a new column 3. Help us manage UDFs better 4. INDEX CONCURRENTLY 5. Prettier syntax for multi-commit sequences
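
Until a tool handles points 2 and 4 itself, they can be worked around with raw SQL in a revision. The sketch below is one reading of those two points, not something the slides spell out: PostgreSQL releases before 11 rewrite the whole table when a column is added with a DEFAULT, so the helper adds the column first and attaches the default afterwards. The helper names, the `RecordingOp` stand-in for Alembic's `op`, and the example table and column are all hypothetical.

```python
class RecordingOp:
    """Stand-in for alembic.op that records SQL instead of running it."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def add_column_then_default(op, table, column, sql_type, default_sql):
    """Add a nullable column, then set its default, then backfill.

    Identifiers are interpolated directly, so they must be trusted
    constants written in the migration, never user input.
    """
    # Step 1: no DEFAULT here, so older PostgreSQL does not rewrite the table.
    op.execute("ALTER TABLE %s ADD COLUMN %s %s" % (table, column, sql_type))
    # Step 2: the default applies to rows inserted from this point on.
    op.execute(
        "ALTER TABLE %s ALTER COLUMN %s SET DEFAULT %s"
        % (table, column, default_sql)
    )
    # Step 3: backfill existing rows; a large table would need batching.
    op.execute(
        "UPDATE %s SET %s = %s WHERE %s IS NULL"
        % (table, column, default_sql, column)
    )


def create_index_concurrently(op, index, table, column):
    """Emit CREATE INDEX CONCURRENTLY for a single column.

    PostgreSQL rejects this statement inside a transaction block, so the
    migration has to be arranged to run it outside one.
    """
    op.execute(
        "CREATE INDEX CONCURRENTLY %s ON %s (%s)" % (index, table, column)
    )
```

Splitting the change into separate statements is also where the "multi-commit sequences" point comes in: each step may need its own commit, and how to arrange that depends on the Alembic version and environment configuration.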
  64. 64. Epilogue
  65. 65. No tool is perfect. DBAs should drive migration tool choice. Choose a tool that your developers like. Or, don't hate.
  66. 66. Other tools: ● Sqitch (http://sqitch.org/): written by a PostgreSQL contributor ● Erwin (http://erwin.com/): commercial, popular with Oracle ● South (http://south.aeracode.org/): Django-specific, well-supported
  67. 67. Alembic resources: ● bitbucket.org/zzzeek/alembic ● alembic.readthedocs.org ● groups.google.com/group/sqlalchemy-alembic
  68. 68. Sane Schema Management with Alembic and SQLAlchemy Selena Deckelmann Mozilla @selenamarie chesnok.com
