Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Schema Evolution Patterns - Velocity SJ 2019

82 views

Published on

Everybody’s talking about microservices, but nobody seems to agree on how to make them talk to each other. How should you version your APIs, and how does API version deprecation actually work in practice? Do you use plain old JavaScript object notation (JSON), Thrift, protocol buffers, GraphQL? How do teams communicate changes in their services’ interfaces, and how do consumer services respond? Separately, nobody seems to agree on how to handle migrating a service’s structured data without downtime. Do you write to shadow tables? Chain new tables off the old ones? Just run the migration live and hope nothing bad happens? Switch everything over to NoSQL?

Both these problems are instances of issues with schema evolution: what happens when the structure of your structured data changes. Rather than taking a prescriptive approach, Alex Rasmussen distills a lot of institutional knowledge and computer science history into a set of patterns and examines the trade-offs between them.

Published in: Software
  • Be the first to comment

  • Be the first to like this

Schema Evolution Patterns - Velocity SJ 2019

  1. 1. Schema Evolution 
 Patterns Alex Rasmussen alex@bitsondisk.com John Gould (14.Sep.1804 - 3.Feb.1881) [Public domain]
  2. 2. Hi, I’m Alex! Twitter, GitHub, etc: @alexras
 alex@bitsondisk.com https://www.bitsondisk.com/ LA-based 
 Data Engineering 
 Consultant
  3. 3. Data Migrations Are Hard •How should I migrate? When? •Will the old code still work? •Can I migrate without downtime? •Can I roll back? How? What if I can’t?
  4. 4. Should I just use MongoDB? 
 How do I decide?
  5. 5. API Versioning Is Hard •When should I bump versions? •Will old callers break? •Should I just switch to GraphQL?
  6. 6. THESE HARD PROBLEMS 
 ARE THE SAME PROBLEM!
  7. 7. Schema Evolution •The structure of your data will change •What does that mean for the data? •How should our systems respond? •How does this impact how we build?
  8. 8. Goals of This Talk •Mental toolkit for reasoning about 
 schema evolution that applies
 across technologies and domains •How this looks in practice https://github.com/bitsondisk/velocity-sj-2019References:
  9. 9. 1. WHAT ARE SCHEMAS? 2. SCHEMA COMPATIBILITY 3. CROSSING COMPATIBILITY GAPS 4. WRAPPING UP / TAKEAWAYS
  10. 10. 1. WHAT ARE SCHEMAS? 2. SCHEMA COMPATIBILITY 3. CROSSING COMPATIBILITY GAPS 4. WRAPPING UP / TAKEAWAYS
  11. 11. Schemas •From the Greek skhēma, meaning 
 “form” or “figure”. The data’s shape. Person { name: string, age: integer, favorite_color: enum(RED, BLUE, …), . . . }
  12. 12. What/Where Is the Schema? •Explicit: SQL tables, Thrift, Avro* •Implicit: CSV, Excel, JSON •Separate: JSON Schema, schema registries, an ex-employee’s brain
  13. 13. When Are Schemas Enforced? •On Write: e.g. in RDBMS tables •On Read: enforce after reading •On Need: dynamic, late-binding •Can enforce one schema, or many
  14. 14. 1. WHAT ARE SCHEMAS? 2. SCHEMA COMPATIBILITY 3. CROSSING COMPATIBILITY GAPS 4. WRAPPING UP / TAKEAWAYS
  15. 15. Schema Compatibility •If two schemas are compatible, evolving from one schema to another can be done
 automatically on read* •Readers can be oblivious to schema change** •Two directions: backwards and forwards
  16. 16. Backwards-Compatible X X+1 Data written with old schema 
 readable by clients with new schema C
  17. 17. Forwards-Compatible X X+1 Data written with new schema 
 readable by clients with old schema C
  18. 18. Add a Field With a Default name: string, age: integer, is_admin: boolean (default: false) name: “Bob Jones”, age: 42 name: “Bob Jones”, age: 42, is_admin: false Backwards: substitute default in older data Forwards: ignore field in newer data
  19. 19. Remove a Field With a Default name: string, age: integer, is_admin: boolean (default: false) pto_days_left: int (default: 0) name: “Alice Smith”, age: 29, is_admin: true pto_days_left: 16 name: “Alice Smith”, age: 29 is_admin: true Backwards: ignore field in older data Forwards: substitute default in newer data
  20. 20. Other Types of Changes •Without defaults: - Adding a field breaks backwards-compatibility - Removing a field breaks forwards-compatibility •Field type changes: typecasting rules apply(?) •Renaming and reordering: it depends
  21. 21. In Practice - Protocol Buffers message Person { required string name = 1; required int32 age = 2; optional bool is_admin = 3 
 [default = false]; } •Field numbers can’t change or be reused •In proto3, no required or optional
  22. 22. In Practice - Avro record Person { string name; int age; boolean is_admin = false; } •Reorders are OK, but not renames •Compatibility determined by rules
  23. 23. In Practice - GraphQL •Every field is nullable by default •Only return queried fields (bounded enforcement) •Hide deprecated fields; deletes tricky type Person { name: String!, age: Int!, is_admin: Boolean }
  24. 24. In Practice - Wix •Adding fields - Non-indexed fields: JSON blob column 
 (schema-on-read!) - Indexed fields: new table + join •Removing fields - Don’t. Just stop using them (similar to GraphQL)
  25. 25. Recap •Compatibility makes evolution easier •Changes can be 
 backwards-compatible, 
 forwards-compatible, both, or neither •Ease-of-compatibility drives many decisions behind “rules” of format design
  26. 26. 1. WHAT ARE SCHEMAS? 2. SCHEMA COMPATIBILITY 3. CROSSING COMPATIBILITY GAPS 4. WRAPPING UP / TAKEAWAYS
  27. 27. Crossing Compatibility Gaps •Not all schema changes are compatible •Not all incompatibilities are simple •Not all compatible changes are practical •How do we deal with that?
  28. 28. Complex Changes name: string, first_name: string, last_name: string, age: integer, is_admin: boolean (default: false) •Not obvious how to split •Client code changes (likely) required
  29. 29. Impractical Changes •e.g. Adding a column in MySQL requires locking/copying the table - Days to weeks not unheard of
  30. 30. Crossing Compatibility Gaps •We’ll look at: - Multi-schema stores (Kafka, et al) - Single-schema stores (RDBMS, et al)
  31. 31. Multi-Schema Stores C1 C2 C3 C4 1 2 3 4 5
  32. 32. Step 1: Old clients write with new schema 1 2 3 4 5 C1 C2 C3 C4 C1 C2
  33. 33. Step 2: Migrate old data to new schema 1 2 3 4 5 C1 C2 C3 C4 C1 C2
  34. 34. Step 3: Migrate old clients to new schema 1 2 3 4 5 C1 C2 C3 C4
  35. 35. In Practice: Kafka (Confluent) •Stores Avro schemas in schema registry with compatibility check built-in •Backwards-compatible changes: 
 update readers first •Forwards-compatible changes: 
 update writers first
  36. 36. In Practice: Stripe •Goal: (backwards-incompatible) API responses that work for old clients, forever •Version change modules applied backwards from present to client’s version •They admit: this is hard, can’t last forever
  37. 37. Recap •In multi-schema stores: - Migrate client writes/reads - Migrate data - Migrate client reads/writes
  38. 38. Single Schema Stores X-1 X C1 C2 C3 C4S
  39. 39. Single-Schema Migration X X+1 C1 C2 C3 C4 S Move from X 
 to (incompatible) X + 1 
 without downtime
  40. 40. Step 1: Create and migrate temporary store S’ X X+1 C1 C2 C3 C4 S S’
  41. 41. Step 2: Create a copier and an updater X X+1 C1 C2 C3 C4 S S’ U C
  42. 42. Step 3.1: Move clients over to new schema X X+1 C1 C2 C3 C4 S S’ U C
  43. 43. Step 3.2: Copy data, record / apply updates X X+1 C1 C2 C3 C4 S S’ U C
  44. 44. Step 4: Cutover - S’ becomes S X X+1 C1 C2 C3 C4 SSold U
  45. 45. Step 5: Drain updater, delete Sold X X+1 C1 C2 C3 C4 S
  46. 46. In Practice - Percona • pt-online-schema-change - Copier: scan/copy in timed chunks - Updater: synchronous table triggers - Cutover: RENAME TABLE
  47. 47. In Practice - Facebook •OSC (Online Schema Change) - Copier: dump/load snapshot chunks - Updater: triggers + changelog - Cutover: lock, replay, rename, unlock
  48. 48. In Practice - GitHub •gh-ost - Copier: chunked reads/writes - Updater: read binlog, interleave copies - Cutover: 2-step blocking swap
  49. 49. In Practice - Wix •Updater client-side, controlled with feature toggles - Phase 1: write to S and S’ - Phase 2: read from S’, fallback to S - Phase 3: write only to new •Copy phase eagerly after phase 3
  50. 50. Recap •In single-schema stores: - Migrate clients, retaining old schema - Create data w/ new schema, over time - When in sync, cut over
  51. 51. 1. WHAT ARE SCHEMAS? 2. SCHEMA COMPATIBILITY 3. CROSSING COMPATIBILITY GAPS 4. WRAPPING UP / TAKEAWAYS
  52. 52. Wrapping Up •Concepts covered here cut across RDBMSs, APIs, document stores, 
 data lakes, etc. •Reason about schema evolution up-front to guide your architecture choices •Your schemas will change. Be prepared.
  53. 53. A Few Takeaways •Make your changes compatible if you can •Compatibility informs migration strategy •Schema-on-read/need won’t save you, 
 but it helps sometimes •One size does not fit all
  54. 54. Thank You! Questions? https://www.bitsondisk.com/ Twitter, GitHub,etc: @alexras alex@bitsondisk.com https://github.com/bitsondisk/velocity-sj-2019 Session Page O’Reilly Events App Rate This TalkContact Me References:

×