Database Refactoring Sreeni Ananthakrishna 2006 Nov - Presentation Transcript
Database Refactoring: An introduction to Refactoring Databases & Evolutionary Database Design (Ambler and Sadalage)
Agenda
What is database refactoring about?
Evolutionary database development techniques
Refactoring Strategies
Classification of refactorings and examples
What is database refactoring about?
Improving database design
Making small and incremental changes to the schema
Maintain existing information and behaviour
Functionality is not added/removed
Not just limited to the database, but also the applications that use it
A simple example…
[Diagram: tables Customer (customerId <<PK>>, name, balance) and Account (accountId <<PK>>, customerId <<FK>>, balance). One balance column is marked {drop date = <date>}.]
[Triggers: SynchronizeAccountBalance {event = on update | on delete | on insert, drop date = <date>} and SynchronizeCustomerBalance {event = on update | on delete | on insert, drop date = <date>}]
[Applications: App A and App B each call maintainbalance(). Caption: "Customer accesses balance".]
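The transition period in the diagram can be sketched with SQLite from Python. This is a minimal sketch, not the presenter's implementation: it assumes one account per customer, handles only the update event (the slide also lists insert and delete), and the WHEN guards are an added assumption so the two triggers do not keep re-firing each other.

```python
import sqlite3


def build_transition_schema() -> sqlite3.Connection:
    """Create both balance columns plus triggers that keep them in sync."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customer (
            customerId INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            balance    NUMERIC NOT NULL DEFAULT 0
        );
        CREATE TABLE Account (
            accountId  INTEGER PRIMARY KEY,
            -- UNIQUE is an assumption: one account per customer
            customerId INTEGER NOT NULL UNIQUE REFERENCES Customer(customerId),
            balance    NUMERIC NOT NULL DEFAULT 0
        );

        -- Copy a change made on Customer.balance over to Account.balance.
        CREATE TRIGGER SynchronizeAccountBalance
        AFTER UPDATE OF balance ON Customer
        WHEN NEW.balance <> (SELECT balance FROM Account
                             WHERE customerId = NEW.customerId)
        BEGIN
            UPDATE Account SET balance = NEW.balance
            WHERE customerId = NEW.customerId;
        END;

        -- Copy a change made on Account.balance back to Customer.balance.
        CREATE TRIGGER SynchronizeCustomerBalance
        AFTER UPDATE OF balance ON Account
        WHEN NEW.balance <> (SELECT balance FROM Customer
                             WHERE customerId = NEW.customerId)
        BEGIN
            UPDATE Customer SET balance = NEW.balance
            WHERE customerId = NEW.customerId;
        END;
    """)
    return conn
```

With this in place, an application that still writes Customer.balance and one that already writes Account.balance see the same value until the drop date, when the old column and both triggers would be removed.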
Why refactor?
Data models built upfront tend to be complex and need cleaning
Maintain consistency between application domain and data model
Address performance requirements
Identify and eliminate db smells
Database Smells
Multipurpose Column – e.g. one column used for customer DOB and employee start date
Multipurpose Table – e.g. Customer table holding both persons and corporations
Redundant Data – same information in different tables
Table with too many columns – e.g. Customer with many addresses
Table with too many rows
Smart columns – e.g. data has positional context
Fear of change – too risky to change schema, time to refactor!
Evolutionary Database Development
Evolve data models vs upfront design
Database regression testing
Configuration management of database artifacts
Developer Sandboxes
Database regression testing
Test the schema
Check logic in stored procedures and triggers
Test check and referential constraints
View definitions
Default Values and Invariants
Test application code
Unit tests around application code which queries the db.
Test data migration
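A few of the schema checks listed above (default values, check constraints, view definitions) can be sketched as a small regression routine. The Customer table, the ActiveCustomer view and the function names here are hypothetical, chosen only to illustrate the idea.

```python
import sqlite3

# Hypothetical schema under test.
SCHEMA = """
CREATE TABLE Customer (
    customerId INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'active',
    balance    NUMERIC NOT NULL DEFAULT 0 CHECK (balance >= 0)
);
CREATE VIEW ActiveCustomer AS
    SELECT customerId, name FROM Customer WHERE status = 'active';
"""


def violates_constraint(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


def run_schema_regression_checks() -> dict:
    """Exercise a default value, a check constraint and a view definition."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Customer (customerId, name) VALUES (1, 'Ann')")
    results = {}
    results["default_status"] = conn.execute(
        "SELECT status FROM Customer WHERE customerId = 1").fetchone()[0]
    results["negative_balance_rejected"] = violates_constraint(
        conn,
        "INSERT INTO Customer (customerId, name, balance) VALUES (2, 'Bob', -5)")
    conn.execute("UPDATE Customer SET status = 'closed' WHERE customerId = 1")
    results["active_rows_after_close"] = conn.execute(
        "SELECT COUNT(*) FROM ActiveCustomer").fetchone()[0]
    return results
```

Running such checks after every refactoring is what makes small schema changes safe to apply.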
Config management of DB Artifacts
Schema creation scripts
Data loading/migration scripts
Reference data
Stored procedures
View definitions
Test data
Regression Tests
Developer Sandboxes
Database Refactoring Strategies
Apply small changes
Small changes allow easy/early detection of errors
Identify Individual Refactorings
Instead of doing “move column” and “rename column” in one go, version each individually.
Create database configuration table
Helps identify current version of the database and can be used in migrations.
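A configuration table of this kind might look like the sketch below. The table name DatabaseConfiguration, the column schemaVersion and the two sample migrations are assumptions for illustration; each migration is one individually versioned refactoring.

```python
import sqlite3

# Hypothetical, individually versioned changes (one refactoring per entry).
MIGRATIONS = [
    (1, "CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE Customer ADD COLUMN balance NUMERIC DEFAULT 0"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Read the schema version, creating the configuration table if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS DatabaseConfiguration "
        "(schemaVersion INTEGER NOT NULL)")
    row = conn.execute(
        "SELECT schemaVersion FROM DatabaseConfiguration").fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO DatabaseConfiguration (schemaVersion) VALUES (0)")
        return 0
    return row[0]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> int:
    """Apply every migration newer than the recorded version, in order."""
    version = current_version(conn)
    for number, ddl in sorted(migrations):
        if number > version:
            conn.execute(ddl)
            conn.execute(
                "UPDATE DatabaseConfiguration SET schemaVersion = ?",
                (number,))
            version = number
    conn.commit()
    return version
```

Because the version is stored in the database itself, running migrate() a second time applies nothing, and each sandbox can be brought forward from whatever version it is on.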
Database Refactoring Strategies (contd.)
Determine synchronization strategies during transition period
Triggers synchronize in real time but might have a performance impact.
Views might not support updates, but they do not require moving data.
Batch synchronization can run during off-peak periods but might have to deal with multiple updates.
Encapsulate Database Access
Abstract database access, e.g. by using persistence frameworks
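To illustrate encapsulated access, here is a hand-written data-access class rather than a persistence framework. CustomerGateway and its methods are hypothetical names; the point is only that callers never mention tables or columns, so a refactoring such as moving the balance column changes this class and nothing else.

```python
import sqlite3


class CustomerGateway:
    """Hypothetical single point of access to customer balances."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def get_balance(self, customer_id: int):
        # Only this class knows that the balance lives in Account.
        row = self._conn.execute(
            "SELECT balance FROM Account WHERE customerId = ?",
            (customer_id,)).fetchone()
        return None if row is None else row[0]

    def set_balance(self, customer_id: int, amount) -> None:
        self._conn.execute(
            "UPDATE Account SET balance = ? WHERE customerId = ?",
            (amount, customer_id))
```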
Introduce Surrogate Key
Motivations
Reduce coupling between schema and business domain
Increase consistency by having a uniform key strategy
Improve performance by having index based on simpler key
Potential Tradeoffs
Surrogate keys are not suitable for all situations
Introducing a new key might require further key consolidation and more effort
“Replace an existing natural key with a surrogate key”
Introduce Surrogate Key (contd.)
[Diagram: Order contains OrderItem.]
[Order: customerNumber <<PK>> <<FK>> <<Natural>>, storeId <<PK>> <<Natural>>, orderId <<PK>> <<surrogate>>]
[OrderItem: customerNumber <<PK>> <<FK>> <<Natural>>, storeId <<PK>> <<Natural>>, orderItemNumber <<PK>>, orderId <<FK>> <<surrogate>>]
[Trigger: PopulateOrderId {event = on insert, drop date = <date>}. Other labels: balance, {drop date = <date>}.]
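The transition state in the diagram can be sketched as follows, using SQLite from Python. This is a sketch under assumptions: the legacy schema is reduced to the key columns, the surrogate is filled from SQLite's rowid, and it is enforced with a unique index because SQLite cannot change a primary key in place. The index name OrderSurrogate and both function names are hypothetical.

```python
import sqlite3


def create_legacy_schema(conn: sqlite3.Connection) -> None:
    """Tables keyed only by their natural keys (before the refactoring)."""
    conn.executescript("""
        CREATE TABLE "Order" (
            customerNumber INTEGER NOT NULL,
            storeId        INTEGER NOT NULL,
            PRIMARY KEY (customerNumber, storeId)
        );
        CREATE TABLE OrderItem (
            customerNumber  INTEGER NOT NULL,
            storeId         INTEGER NOT NULL,
            orderItemNumber INTEGER NOT NULL,
            PRIMARY KEY (customerNumber, storeId, orderItemNumber)
        );
    """)


def introduce_surrogate_key(conn: sqlite3.Connection) -> None:
    """Add orderId to both tables and keep it filled during the transition."""
    conn.executescript("""
        -- 1. Surrogate column on the parent, populated for existing rows.
        ALTER TABLE "Order" ADD COLUMN orderId INTEGER;
        UPDATE "Order" SET orderId = rowid;
        CREATE UNIQUE INDEX OrderSurrogate ON "Order" (orderId);

        -- 2. Matching column on the child, filled through the natural key.
        ALTER TABLE OrderItem ADD COLUMN orderId INTEGER;
        UPDATE OrderItem SET orderId = (
            SELECT o.orderId FROM "Order" AS o
            WHERE o.customerNumber = OrderItem.customerNumber
              AND o.storeId = OrderItem.storeId);

        -- 3. Applications not yet updated still insert without orderId.
        CREATE TRIGGER PopulateOrderId
        AFTER INSERT ON OrderItem
        WHEN NEW.orderId IS NULL
        BEGIN
            UPDATE OrderItem
            SET orderId = (SELECT o.orderId FROM "Order" AS o
                           WHERE o.customerNumber = NEW.customerNumber
                             AND o.storeId = NEW.storeId)
            WHERE customerNumber = NEW.customerNumber
              AND storeId = NEW.storeId
              AND orderItemNumber = NEW.orderItemNumber;
        END;
    """)
```

The sketch does not assign orderId to orders created after the refactoring; that would need a second trigger or an application change, and the natural key columns stay in place until the drop date.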
Data Quality Refactorings
Related to improving quality of information in db
e.g. Add Lookup Table, Introduce Column Constraint, Introduce Common Format
Issues to consider when implementing:
Constraint violations
Broken logic in procedures
Broken where clauses in Views
Updating large amounts of data
Add Lookup Table
Motivations
Introduce referential integrity for a column
Provide code lookup (move enum to the db)
Replace column constraint with set of expected values in lookup table
Potential Tradeoffs
Identifying the data to populate (especially for multiple apps)
Possible performance impact due to additional joins
“Create a lookup table for an existing column”
Add Lookup Table (contd.)
[Diagram: Address (Street, State <<FK>>, PostCode) references lookup table State (State <<PK>>, Name)]
1. Identify the column
2. Create Lookup Table
3. Populate Data
4. Introduce FK constraint
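The four steps can be sketched against SQLite as below. Two assumptions are worth flagging: the lookup Name is seeded with the state code itself, since the real names would have to come from the business, and the foreign key is introduced by rebuilding the Address table because SQLite cannot add a constraint to an existing column. Other databases would use ALTER TABLE ... ADD CONSTRAINT instead.

```python
import sqlite3


def add_state_lookup(conn: sqlite3.Connection) -> None:
    """Create the State lookup table and point Address.State at it."""
    conn.executescript("""
        PRAGMA foreign_keys = OFF;
        BEGIN;
        -- Step 2: create the lookup table.
        CREATE TABLE State (
            State TEXT PRIMARY KEY,
            Name  TEXT NOT NULL
        );
        -- Step 3: populate it from the values already in use.
        INSERT INTO State (State, Name)
            SELECT DISTINCT State, State FROM Address
            WHERE State IS NOT NULL;
        -- Step 4: introduce the FK constraint (table rebuild in SQLite).
        CREATE TABLE AddressNew (
            Street   TEXT,
            State    TEXT REFERENCES State (State),
            PostCode TEXT
        );
        INSERT INTO AddressNew SELECT Street, State, PostCode FROM Address;
        DROP TABLE Address;
        ALTER TABLE AddressNew RENAME TO Address;
        COMMIT;
        PRAGMA foreign_keys = ON;
    """)
```

After the refactoring, an Address row with a state that is not in the lookup table is rejected by the database rather than by each application.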
Referential Integrity Refactorings
Changes that improve referential integrity of data
e.g. Add Foreign Key Constraint, Introduce Cascading Delete, Introduce Trigger for History
Issues to consider when implementing:
Fix broken CRUD logic in procedure
Data cleansing to make new constraints work
Introduce Cascading Delete
Motivations
Preserve referential integrity of the parent/child rows
Remove responsibility for child deletion in the application
Potential Tradeoffs
Deadlock?
Deleting a root node can accidentally trigger mass deletion
Duplicate functionality is introduced when using persistence frameworks like Hibernate/TopLink
“Delete the child record(s) when the parent is deleted”
Introduce Cascading Delete (contd.)
[Diagram: Policy (PolicyId <<PK>>) and Claim (ClaimId <<PK>>, PolicyId <<FK>>); trigger DeleteClaim {event = on delete}]
1. Identify the column
2. Choose cascading mechanism (triggers or using cascade clause during constraint creation)
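Of the two mechanisms, the cascade clause is the shorter one to show. A minimal sketch in SQLite, where foreign key enforcement has to be switched on per connection; the function name is hypothetical and the tables carry only their key columns.

```python
import sqlite3


def create_policy_schema() -> sqlite3.Connection:
    """Policy/Claim tables where deleting a policy deletes its claims."""
    conn = sqlite3.connect(":memory:")
    # SQLite ignores foreign keys unless this is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Policy (
            PolicyId INTEGER PRIMARY KEY
        );
        CREATE TABLE Claim (
            ClaimId  INTEGER PRIMARY KEY,
            PolicyId INTEGER NOT NULL
                REFERENCES Policy (PolicyId) ON DELETE CASCADE
        );
    """)
    return conn
```

The trigger alternative (DeleteClaim in the diagram) does the same work explicitly, which can be useful when the deletion needs extra logic such as writing history rows.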
Architectural Refactorings
Changes that improve performance, portability and define the architecture within the database
e.g. Encapsulate Table with View, Introduce Calculation Method, Replace Method(s) with View, Introduce Trigger for History
Issues to consider when implementing:
Performance vs Data redundancy
Keeping business logic in the application vs database
Introduce Index
Motivations
Increase performance of read queries
Potential Tradeoffs
Too many indexes degrade performance during inserts/updates/deletes
Existing data containing duplicates might need cleansing when introducing unique indexes
“Introduce a unique or non-unique index”
Introduce Index (contd.)
[Diagram: Customer (CustomerId <<PK>>, Name, TFN <<AK>>) with an <<index>> on TFN]
1. Determine type of index – unique vs non-unique
2. Eliminate duplicate rows when using unique index
3. Add a new index
4. Add more disk space for index maintenance
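Steps 1 to 3 might be sketched like this. The index name CustomerTFN and both function names are hypothetical, and the sketch only detects duplicates; how to eliminate them (merge, delete, correct) is a business decision left to the caller.

```python
import sqlite3


def find_duplicate_tfns(conn: sqlite3.Connection) -> list:
    """List TFN values that occur on more than one Customer row."""
    rows = conn.execute("""
        SELECT TFN FROM Customer
        WHERE TFN IS NOT NULL
        GROUP BY TFN
        HAVING COUNT(*) > 1
        ORDER BY TFN
    """)
    return [row[0] for row in rows]


def introduce_tfn_index(conn: sqlite3.Connection, unique: bool = True) -> None:
    """Create the index, refusing a unique one while duplicates remain."""
    if unique:
        duplicates = find_duplicate_tfns(conn)
        if duplicates:
            raise ValueError(f"duplicate TFN values need cleansing: {duplicates}")
    kind = "UNIQUE " if unique else ""
    conn.execute(f"CREATE {kind}INDEX CustomerTFN ON Customer (TFN)")
```

Checking first gives a readable list of offending values instead of a failed CREATE INDEX statement partway through a deployment.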
Method Refactorings
Changes that improve code representing stored procedures, functions and triggers
e.g. Rename Method, Reorder Parameters, Replace Literal with Table Lookup