Database Refactoring Sreeni Ananthakrishna 2006 Nov


  1. 1. Database Refactoring An introduction to Refactoring Databases & Evolutionary Database Design (Ambler and Sadalage)
  2. 2. Agenda <ul><li>What is database refactoring about? </li></ul><ul><li>Evolutionary database development techniques </li></ul><ul><li>Refactoring Strategies </li></ul><ul><li>Classification of refactorings and examples </li></ul>
  3. 3. What is database refactoring about? <ul><ul><li>Improving database design </li></ul></ul><ul><ul><li>Making small and incremental changes to the schema </li></ul></ul><ul><ul><li>Maintaining existing information and behaviour </li></ul></ul><ul><ul><li>Not adding or removing functionality </li></ul></ul><ul><ul><li>Covering not just the database, but also the applications that use it </li></ul></ul>
  4. 4. A simple example… [Diagram: Customer (customerId <<PK>>, name, balance) accesses Account (accountId <<PK>>, customerId <<FK>>, balance). App A and App B each call maintainbalance(). During the transition period the balance column exists on both tables, one copy marked {drop date = <date>}, and two triggers keep the copies in step: SynchronizeAccountBalance and SynchronizeCustomerBalance, each {event = on update | on delete | on insert, drop date = <date>}.]
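The transition period in this example can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for whatever database the slide had in mind, each customer has a single account, only the update event is handled (the slide also lists insert and delete), and the sample rows are invented. The trigger names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    customerId INTEGER PRIMARY KEY,
    name       TEXT,
    balance    INTEGER
);
CREATE TABLE Account (
    accountId  INTEGER PRIMARY KEY,
    customerId INTEGER REFERENCES Customer(customerId),
    balance    INTEGER
);

-- Customer.balance changed: copy the value to Account.balance.
-- The "IS NOT" guard skips rows that already match, so the two
-- triggers cannot keep firing each other (the cyclic trigger problem).
CREATE TRIGGER SynchronizeAccountBalance
AFTER UPDATE OF balance ON Customer
BEGIN
    UPDATE Account SET balance = NEW.balance
    WHERE customerId = NEW.customerId AND balance IS NOT NEW.balance;
END;

-- Account.balance changed: copy the value to Customer.balance.
CREATE TRIGGER SynchronizeCustomerBalance
AFTER UPDATE OF balance ON Account
BEGIN
    UPDATE Customer SET balance = NEW.balance
    WHERE customerId = NEW.customerId AND balance IS NOT NEW.balance;
END;

-- Invented sample data.
INSERT INTO Customer VALUES (1, 'Ann', 100);
INSERT INTO Account VALUES (10, 1, 100);
""")


def balances(conn):
    """Return (Customer.balance, Account.balance) for the sample rows."""
    customer = conn.execute(
        "SELECT balance FROM Customer WHERE customerId = 1").fetchone()[0]
    account = conn.execute(
        "SELECT balance FROM Account WHERE accountId = 10").fetchone()[0]
    return customer, account


# "App A" still writes the Customer column.
conn.execute("UPDATE Customer SET balance = 150 WHERE customerId = 1")
after_app_a = balances(conn)

# "App B" already writes the Account column.
conn.execute("UPDATE Account SET balance = 80 WHERE accountId = 10")
after_app_b = balances(conn)
```

Once every application has moved to the new column and the drop date has passed, both triggers and the old column would be removed.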
  5. 5. Why refactor? <ul><li>Data models built upfront tend to be complex and need cleaning </li></ul><ul><li>Maintain consistency between application domain and data model </li></ul><ul><li>Address performance requirements </li></ul><ul><li>Identify and eliminate database smells </li></ul>
  6. 6. Database Smells <ul><li>Multipurpose Column – eg. one column holding customer date of birth & employee start date </li></ul><ul><li>Multipurpose Table – eg. Customer table holding both people and corporations </li></ul><ul><li>Redundant Data – same information in different tables </li></ul><ul><li>Table with too many columns – eg. Customer with many addresses </li></ul><ul><li>Table with too many rows </li></ul><ul><li>Smart columns – eg. Data has positional context </li></ul><ul><li>Fear of change – too risky to change schema, time to refactor! </li></ul>
  7. 7. Evolutionary Database Development <ul><li>Evolve data models vs upfront design </li></ul><ul><li>Database regression testing </li></ul><ul><li>Configuration management of database artifacts </li></ul><ul><li>Developer Sandboxes </li></ul>
  8. 8. Database regression testing <ul><li>Test the schema </li></ul><ul><ul><li>Check logic in stored procedures and triggers </li></ul></ul><ul><ul><li>Test check and referential constraints </li></ul></ul><ul><ul><li>View definitions </li></ul></ul><ul><ul><li>Default Values and Invariants </li></ul></ul><ul><li>Test application code </li></ul><ul><ul><li>Unit tests around application code which queries the db. </li></ul></ul><ul><li>Test data migration </li></ul>
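A schema-level regression test of the kind listed above might look like the following sketch. It assumes SQLite via Python's sqlite3 module and an invented two-table schema; the table names, column names and constraints are illustrative, not taken from the slides. The test functions use plain assert statements, so a runner such as pytest could collect them.

```python
import sqlite3

# Hypothetical schema under test.
SCHEMA = """
CREATE TABLE Customer (
    customerId INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'active'
);
CREATE TABLE Account (
    accountId  INTEGER PRIMARY KEY,
    customerId INTEGER NOT NULL REFERENCES Customer(customerId),
    balance    INTEGER NOT NULL CHECK (balance >= 0)
);
"""


def fresh_db():
    """Build the schema in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Customer (customerId, name) VALUES (1, 'Ann')")
    return conn


def rejects(conn, sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


def test_default_value():
    conn = fresh_db()
    status = conn.execute(
        "SELECT status FROM Customer WHERE customerId = 1").fetchone()[0]
    assert status == "active"


def test_check_constraint():
    # A negative balance must be refused by the CHECK constraint.
    assert rejects(fresh_db(), "INSERT INTO Account VALUES (10, 1, -5)")


def test_referential_constraint():
    # Customer 999 does not exist, so the foreign key must refuse the row.
    assert rejects(fresh_db(), "INSERT INTO Account VALUES (11, 999, 5)")
```

Running such tests after every refactoring gives early warning when a change silently drops a default, a check or a foreign key.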
  9. 9. Config management of DB Artifacts <ul><li>Schema creation scripts </li></ul><ul><li>Data loading/migration scripts </li></ul><ul><li>Reference data </li></ul><ul><li>Stored procedures </li></ul><ul><li>View definitions </li></ul><ul><li>Test data </li></ul><ul><li>Regression Tests </li></ul>
  10. 10. Developer Sandboxes
  11. 11. Database Refactoring Strategies <ul><li>Apply small changes </li></ul><ul><ul><li>Small changes allow easy/early detection of errors </li></ul></ul><ul><li>Identify Individual Refactorings </li></ul><ul><ul><li>Instead of doing “move column” and “rename column” in one go, version each individually. </li></ul></ul><ul><li>Create database configuration table </li></ul><ul><ul><li>Helps identify current version of the database and can be used in migrations. </li></ul></ul>
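One way to picture the database configuration table is a small migration runner that applies each refactoring as its own numbered step. The sketch below assumes SQLite through Python's sqlite3 module; the table name DatabaseConfiguration, the column schemaVersion and the three migrations are hypothetical.

```python
import sqlite3

# Each refactoring is versioned individually (hypothetical examples).
MIGRATIONS = [
    (1, "CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE Customer ADD COLUMN email TEXT"),
    (3, "CREATE INDEX Customer_email ON Customer(email)"),
]


def current_version(conn):
    """Read the schema version, creating the configuration table if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS DatabaseConfiguration "
        "(schemaVersion INTEGER NOT NULL)")
    row = conn.execute(
        "SELECT schemaVersion FROM DatabaseConfiguration").fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO DatabaseConfiguration (schemaVersion) VALUES (0)")
        return 0
    return row[0]


def migrate(conn):
    """Apply every migration newer than the recorded version, in order."""
    version = current_version(conn)
    applied = []
    for number, sql in MIGRATIONS:
        if number > version:
            conn.execute(sql)
            conn.execute(
                "UPDATE DatabaseConfiguration SET schemaVersion = ?",
                (number,))
            conn.commit()
            applied.append(number)
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)    # applies all three steps
second_run = migrate(conn)   # nothing left to apply
```

Because each step is small and recorded separately, a failure points at one refactoring, and any sandbox can be brought up to date by running the same script.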
  12. 12. Database Refactoring Strategies (contd.) <ul><li>Determine synchronization strategies during transition period </li></ul><ul><ul><li>Triggers update in real time but might have performance impacts. </li></ul></ul><ul><ul><li>Views might not support updates but do not move data </li></ul></ul><ul><ul><li>Batch sync can be used during non-peak loads but might have to deal with multiple updates </li></ul></ul><ul><li>Encapsulate Database Access </li></ul><ul><ul><li>Abstract database access eg. by using persistence frameworks </li></ul></ul>
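As a rough illustration of the view strategy and of encapsulated access, the sketch below assumes a column has already been renamed from fname to firstName. A view keeps the old name readable for applications that have not migrated, and a single accessor function is the only application code that knows the new name. The names CustomerLegacy and find_first_name are invented, and SQLite (via Python's sqlite3) is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Schema after the rename: the column is now called firstName.
CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, firstName TEXT);
INSERT INTO Customer VALUES (1, 'Ann');

-- Transition view: exposes the old column name without copying data.
-- SQLite views are read-only unless INSTEAD OF triggers are added.
CREATE VIEW CustomerLegacy AS
    SELECT customerId, firstName AS fname FROM Customer;
""")


def find_first_name(conn, customer_id):
    """Encapsulated access: callers never see the column name."""
    row = conn.execute(
        "SELECT firstName FROM Customer WHERE customerId = ?",
        (customer_id,)).fetchone()
    return row[0] if row else None


# An unmigrated application keeps reading through the old name.
legacy_rows = conn.execute(
    "SELECT customerId, fname FROM CustomerLegacy").fetchall()

# A migrated application goes through the accessor.
name = find_first_name(conn, 1)
```

The view always reflects the base table, which is the "do not move data" property noted above; the price is that writes through it need extra work.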
  13. 13. Database Refactoring Classification <ul><li>Structural </li></ul><ul><li>Data Quality </li></ul><ul><li>Referential </li></ul><ul><li>Architectural </li></ul><ul><li>Method </li></ul>
  14. 14. Structural Refactorings <ul><li>Related to structure of Tables, Views </li></ul><ul><li>eg. Move Column, Rename Table, Split Table, Merge Column </li></ul><ul><li>Issues to consider when implementing: </li></ul><ul><ul><li>Cyclic Triggers </li></ul></ul><ul><ul><li>Broken Views, Procedures, Triggers </li></ul></ul><ul><ul><li>Transition period in multi-application setup </li></ul></ul>
  15. 15. Introduce Surrogate Key <ul><li>Motivations </li></ul><ul><ul><li>Reduce coupling between schema and business domain </li></ul></ul><ul><ul><li>Increase consistency by having a uniform key strategy </li></ul></ul><ul><ul><li>Improve performance by having index based on simpler key </li></ul></ul><ul><li>Potential Tradeoffs </li></ul><ul><ul><li>Surrogate keys are not suitable for all situations </li></ul></ul><ul><ul><li>Introducing a new key might require further key consolidation and more effort </li></ul></ul>“Replace an existing natural key with a surrogate key”
  16. 16. Introduce Surrogate Key (contd.) [Diagram: Order (customerNumber <<PK>> <<FK>> <<Natural>>, storeId <<PK>> <<Natural>>, orderId <<PK>> <<surrogate>>) contains OrderItem (customerNumber <<PK>> <<FK>> <<Natural>>, storeId <<PK>> <<Natural>>, orderItemNumber <<PK>>, orderId <<FK>> <<surrogate>>). The natural key columns are marked {drop date = <date>}. Trigger PopulateOrderId {event = on insert, drop date = <date>} fills in orderId during the transition period.]
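The transition state of this refactoring could be sketched as follows, assuming SQLite via Python's sqlite3. The order table is named CustomerOrder here because ORDER is a reserved word in SQL; that name, the sample rows and the use of rowid to seed the surrogate values are assumptions. The trigger name PopulateOrderId comes from the slide. The sketch covers only rows inserted into OrderItem by unmigrated applications; new orders inserted without an orderId would need similar handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Starting point: natural composite keys only.
CREATE TABLE CustomerOrder (
    customerNumber INTEGER NOT NULL,
    storeId        INTEGER NOT NULL,
    PRIMARY KEY (customerNumber, storeId)
);
CREATE TABLE OrderItem (
    customerNumber  INTEGER NOT NULL,
    storeId         INTEGER NOT NULL,
    orderItemNumber INTEGER NOT NULL,
    PRIMARY KEY (customerNumber, storeId, orderItemNumber)
);
INSERT INTO CustomerOrder VALUES (7, 1), (7, 2);
INSERT INTO OrderItem VALUES (7, 1, 1), (7, 2, 1);

-- Add the surrogate column to the parent and give each row a value.
ALTER TABLE CustomerOrder ADD COLUMN orderId INTEGER;
UPDATE CustomerOrder SET orderId = rowid;
CREATE UNIQUE INDEX CustomerOrder_orderId ON CustomerOrder(orderId);

-- Add the matching column to the child and back-fill it.
ALTER TABLE OrderItem ADD COLUMN orderId INTEGER;
UPDATE OrderItem SET orderId = (
    SELECT o.orderId FROM CustomerOrder o
    WHERE o.customerNumber = OrderItem.customerNumber
      AND o.storeId = OrderItem.storeId);

-- Transition trigger: applications that still insert with the natural
-- key only get orderId filled in for them.
CREATE TRIGGER PopulateOrderId
AFTER INSERT ON OrderItem
WHEN NEW.orderId IS NULL
BEGIN
    UPDATE OrderItem SET orderId = (
        SELECT o.orderId FROM CustomerOrder o
        WHERE o.customerNumber = NEW.customerNumber
          AND o.storeId = NEW.storeId)
    WHERE customerNumber = NEW.customerNumber
      AND storeId = NEW.storeId
      AND orderItemNumber = NEW.orderItemNumber;
END;
""")

# An unmigrated application inserts an item without knowing about orderId.
conn.execute(
    "INSERT INTO OrderItem (customerNumber, storeId, orderItemNumber) "
    "VALUES (7, 2, 2)")

# Count items whose orderId is missing or points at the wrong order.
mismatches = conn.execute("""
    SELECT COUNT(*) FROM OrderItem i
    LEFT JOIN CustomerOrder o ON o.orderId = i.orderId
    WHERE o.orderId IS NULL
       OR o.customerNumber <> i.customerNumber
       OR o.storeId <> i.storeId
""").fetchone()[0]
```

After the drop date, the natural key columns and the trigger would be removed and orderId promoted to the primary key, which in SQLite means rebuilding the tables.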
  17. 17. Data Quality Refactorings <ul><li>Related to improving quality of information in db </li></ul><ul><li>eg. Add Lookup Table, Introduce column constraint, Introduce common format </li></ul><ul><li>Issues to consider when implementing: </li></ul><ul><ul><li>Constraint violations </li></ul></ul><ul><ul><li>Broken logic in procedures </li></ul></ul><ul><ul><li>Broken where clauses in Views </li></ul></ul><ul><ul><li>Updating large amounts of data </li></ul></ul>
  18. 18. Add Lookup Table <ul><li>Motivations </li></ul><ul><ul><li>Introduce referential integrity for a column </li></ul></ul><ul><ul><li>Provide code lookup (move enum to the db) </li></ul></ul><ul><ul><li>Replace column constraint with set of expected values in lookup table </li></ul></ul><ul><li>Potential Tradeoffs </li></ul><ul><ul><li>Identifying the data to populate (especially for multiple apps) </li></ul></ul><ul><ul><li>Possible performance impact due to additional joins </li></ul></ul>“Create a lookup table for an existing column”
  19. 19. Add Lookup Table (contd.) [Diagram: Address (Street, State <<FK>>, PostCode) references lookup table State (State <<PK>>, Name).] Steps: 1. Identify the column 2. Create Lookup Table 3. Populate Data 4. Introduce FK constraint
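The four steps can be sketched against SQLite with Python's sqlite3 module. The addressId column, the sample addresses and the temporary table name AddressNew are assumptions. SQLite cannot add a foreign key to an existing table, so step 4 rebuilds the table here; other databases would typically use an ALTER TABLE statement instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign keys in SQLite
conn.executescript("""
-- Existing table with a free-text state column (step 1: the column).
CREATE TABLE Address (
    addressId INTEGER PRIMARY KEY,
    street    TEXT,
    state     TEXT,
    postCode  TEXT
);
INSERT INTO Address VALUES
    (1, '1 Main St', 'NSW', '2000'),
    (2, '2 High St', 'VIC', '3000'),
    (3, '3 Low St',  'NSW', '2010');

-- Step 2: create the lookup table.
CREATE TABLE State (state TEXT PRIMARY KEY, name TEXT);

-- Step 3: populate it from the values already in use.
-- The descriptive name is left empty for the data owner to fill in.
INSERT INTO State (state)
    SELECT DISTINCT state FROM Address WHERE state IS NOT NULL;

-- Step 4: introduce the foreign key by rebuilding the table.
CREATE TABLE AddressNew (
    addressId INTEGER PRIMARY KEY,
    street    TEXT,
    state     TEXT REFERENCES State(state),
    postCode  TEXT
);
INSERT INTO AddressNew SELECT * FROM Address;
DROP TABLE Address;
ALTER TABLE AddressNew RENAME TO Address;
""")

known_states = [row[0] for row in
                conn.execute("SELECT state FROM State ORDER BY state")]


def try_insert(conn, address_id, state):
    """Return True if the row is accepted, False if the FK rejects it."""
    try:
        conn.execute(
            "INSERT INTO Address VALUES (?, 'x', ?, '0000')",
            (address_id, state))
    except sqlite3.IntegrityError:
        return False
    return True
```

Populating the lookup table from existing data only works if that data is clean; misspelled states would need cleansing first, which is the "identifying the data to populate" tradeoff.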
  20. 20. Referential Integrity Refactorings <ul><li>Changes that improve referential integrity of data </li></ul><ul><li>eg. Add Foreign Key Constraint, Introduce cascading delete, Introduce trigger for history </li></ul><ul><li>Issues to consider when implementing: </li></ul><ul><ul><li>Fix broken CRUD logic in procedure </li></ul></ul><ul><ul><li>Data cleansing to make new constraints work </li></ul></ul>
  21. 21. Introduce Cascading Delete <ul><li>Motivations </li></ul><ul><ul><li>Preserve referential integrity of the parent/child rows </li></ul></ul><ul><ul><li>Remove responsibility for child deletion from the application </li></ul></ul><ul><li>Potential Tradeoffs </li></ul><ul><ul><li>Deadlocks? </li></ul></ul><ul><ul><li>Accidental mass deletion can be triggered when deleting root nodes </li></ul></ul><ul><ul><li>Duplicate functionality is introduced when using persistence frameworks like Hibernate/Toplink </li></ul></ul>“Delete the child record(s) when the parent is deleted”
  22. 22. Introduce Cascading Delete (contd.) [Diagram: Policy (PolicyId <<PK>>) and Claim (ClaimId <<PK>>, PolicyId <<FK>>), with trigger DeleteClaim {event = on delete}.] Steps: 1. Identify the column 2. Choose cascading mechanism (triggers or using cascade clause during constraint creation)
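Of the two mechanisms, the cascade clause is the shorter to show. The sketch below assumes SQLite via Python's sqlite3 and invented sample rows; the slide's DeleteClaim trigger would be the alternative where a cascade clause is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # cascades only run with FK checks on
conn.executescript("""
CREATE TABLE Policy (policyId INTEGER PRIMARY KEY);
CREATE TABLE Claim (
    claimId  INTEGER PRIMARY KEY,
    -- Deleting a policy removes its claims automatically.
    policyId INTEGER NOT NULL
             REFERENCES Policy(policyId) ON DELETE CASCADE
);
INSERT INTO Policy VALUES (1), (2);
INSERT INTO Claim VALUES (100, 1), (101, 1), (200, 2);
""")

# The application deletes only the parent row.
conn.execute("DELETE FROM Policy WHERE policyId = 1")

remaining_claims = [row[0] for row in
                    conn.execute("SELECT claimId FROM Claim ORDER BY claimId")]
```

The same convenience is what makes accidental mass deletion possible: a single delete at the root of a deep hierarchy removes everything beneath it.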
  23. 23. Architectural Refactorings <ul><li>Changes that improve performance, portability and define the architecture within the database </li></ul><ul><li>eg. Encapsulate Table with View, Introduce Calculation Method, Replace Method(s) with View, Introduce trigger for history </li></ul><ul><li>Issues to consider when implementing: </li></ul><ul><ul><li>Performance vs Data redundancy </li></ul></ul><ul><ul><li>Keeping business logic in the application vs database </li></ul></ul>
  24. 24. Introduce Index <ul><li>Motivations </li></ul><ul><ul><li>Increase performance of read queries </li></ul></ul><ul><li>Potential Tradeoffs </li></ul><ul><ul><li>Too many indexes degrade performance of inserts/updates/deletes </li></ul></ul><ul><ul><li>Existing data containing duplicates might need cleansing when introducing unique indexes </li></ul></ul>“Introduce a unique or non-unique Index”
  25. 25. Introduce Index (contd.) [Diagram: Customer (CustomerId <<PK>>, TFN <<AK>>, Name) with TFN <<index>>.] Steps: 1. Determine type of index – unique vs non-unique 2. Eliminate duplicate rows when using unique index 3. Add a new index 4. Add more disk space for index maintenance
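Steps 1 to 3 for a unique index might look like this in SQLite via Python's sqlite3. The sample rows, the index name Customer_tfn and the rule for resolving duplicates (keep the row with the lowest CustomerId) are assumptions; a real cleansing rule would come from the business. Step 4 is an operational concern and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    customerId INTEGER PRIMARY KEY,
    tfn        TEXT NOT NULL,
    name       TEXT
);
INSERT INTO Customer VALUES
    (1, '111', 'Ann'),
    (2, '222', 'Bob'),
    (3, '111', 'Ann (duplicate)');
""")

# Step 1: a unique index is wanted, so find values that would violate it.
duplicates = conn.execute(
    "SELECT tfn, COUNT(*) FROM Customer "
    "GROUP BY tfn HAVING COUNT(*) > 1").fetchall()

# Step 2: eliminate duplicates, keeping the lowest customerId per TFN.
conn.execute(
    "DELETE FROM Customer WHERE customerId NOT IN "
    "(SELECT MIN(customerId) FROM Customer GROUP BY tfn)")

# Step 3: add the index; it now also guards against new duplicates.
conn.execute("CREATE UNIQUE INDEX Customer_tfn ON Customer(tfn)")

remaining_ids = [row[0] for row in
                 conn.execute("SELECT customerId FROM Customer ORDER BY customerId")]
```

Creating the unique index before the cleanup would fail, which is why the duplicate check comes first.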
  26. 26. Method Refactorings <ul><li>Changes that improve code representing stored procedures, functions and triggers </li></ul><ul><li>eg. Rename Method, Reorder Parameters, Replace literal with Table Lookup </li></ul><ul><li>Issues to consider when implementing: </li></ul><ul><ul><li>Broken triggers, procedures, functions </li></ul></ul><ul><ul><li>Tool support </li></ul></ul>
  27. 27. Refactoring Tools <ul><li>Schema Migration – Rails Migration, Sundog </li></ul><ul><li>Unit Testing – JUnit, DBUnit </li></ul><ul><li>Refactor Stored Procedures – SQLRefactor (SQL Server only) </li></ul>
