Database Refactoring Sreeni Ananthakrishna 2006 Nov

  • Database Refactoring: An introduction to Refactoring Databases & Evolutionary Database Design (Ambler and Sadalage)
  • Agenda
    • What is database refactoring about?
    • Evolutionary database development techniques
    • Refactoring Strategies
    • Classification of refactorings and examples
  • What is database refactoring about?
      • Improving database design
      • Making small and incremental changes to the schema
      • Maintaining existing information and behaviour
      • Functionality is neither added nor removed
      • Not limited to the database; the applications that use it are affected too
  • A simple example… [Diagram: Customer (customerId <<PK>>, name, balance) accesses Account (accountId <<PK>>, customerId <<FK>>, balance). Triggers SynchronizeAccountBalance and SynchronizeCustomerBalance {event = on update | on delete | on insert, drop date = <date>} keep the two balance columns in step, and one balance column is marked {drop date = <date>}. App A and App B each call maintainbalance().]
  • Why refactor?
    • Data models built upfront tend to be complex and need cleaning
    • Maintain consistency between application domain and data model
    • Address performance requirements
    • Identify and eliminate database smells
  • Database Smells
    • Multipurpose Column – e.g. one column holding a customer's date of birth and an employee's start date
    • Multipurpose Table – e.g. a Customer table holding both people and corporations
    • Redundant Data – the same information stored in different tables
    • Table with too many columns – e.g. a Customer table with many address columns
    • Table with too many rows
    • Smart columns – e.g. values whose meaning depends on position within the data
    • Fear of change – when the schema feels too risky to change, it is time to refactor!
  • Evolutionary Database Development
    • Evolve data models rather than designing them up front
    • Database regression testing
    • Configuration management of database artifacts
    • Developer Sandboxes
  • Database regression testing
    • Test the schema
      • Check logic in stored procedures and triggers
      • Test check and referential constraints
      • View definitions
      • Default Values and Invariants
    • Test application code
      • Unit tests around application code which queries the db.
    • Test data migration
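As a rough illustration of the "test the schema" items above, the sketch below checks a default value, a check constraint and a referential constraint against an in-memory SQLite database. The customer and account tables, their columns and the helper names are hypothetical and chosen only for this example; a real suite would target the project's own schema and database engine.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema with one default value, one CHECK and one foreign key.
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            status      TEXT NOT NULL DEFAULT 'active'
        );
        CREATE TABLE account (
            account_id  INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            balance     NUMERIC NOT NULL CHECK (balance >= 0)
        );
    """)


def is_rejected(conn, sql):
    """Return True if the statement violates a schema constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


def run_schema_tests():
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Ann')")

    # Default value: status falls back to 'active' when omitted.
    status = conn.execute(
        "SELECT status FROM customer WHERE customer_id = 1").fetchone()[0]
    assert status == "active"

    # Check constraint: negative balances are refused.
    assert is_rejected(conn, "INSERT INTO account VALUES (1, 1, -5)")

    # Referential constraint: an account needs an existing customer.
    assert is_rejected(conn, "INSERT INTO account VALUES (2, 99, 10)")

    # A valid row is still accepted.
    assert not is_rejected(conn, "INSERT INTO account VALUES (3, 1, 10)")
    return True
```

Running run_schema_tests() after every schema change gives an early signal when a refactoring has silently dropped or weakened a constraint.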
  • Configuration management of database artifacts
    • Schema creation scripts
    • Data loading/migration scripts
    • Reference data
    • Stored procedures
    • View definitions
    • Test data
    • Regression Tests
  • Developer Sandboxes
  • Database Refactoring Strategies
    • Apply small changes
      • Small changes allow easy/early detection of errors
    • Identify Individual Refactorings
      • Instead of doing “move column” and “rename column” in one go, version each individually.
    • Create database configuration table
      • Helps identify current version of the database and can be used in migrations.
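The configuration-table idea can be sketched as follows, assuming a single-row table that stores the current schema version and a list of numbered migrations, each holding exactly one small refactoring. The table name database_configuration, the MIGRATIONS list and both function names are illustrative assumptions, not part of the original slides.

```python
import sqlite3

# Hypothetical migrations: one small refactoring per version number.
MIGRATIONS = [
    (1, "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN tfn TEXT"),
    (3, "CREATE INDEX idx_customer_tfn ON customer (tfn)"),
]


def current_version(conn):
    """Read the schema version, creating the configuration table if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS database_configuration "
        "(schema_version INTEGER NOT NULL)")
    row = conn.execute(
        "SELECT schema_version FROM database_configuration").fetchone()
    if row is None:
        conn.execute("INSERT INTO database_configuration VALUES (0)")
        return 0
    return row[0]


def migrate(conn, migrations=MIGRATIONS):
    """Apply every migration newer than the recorded version, in order."""
    applied = []
    version = current_version(conn)
    for number, statement in sorted(migrations):
        if number <= version:
            continue  # already applied to this database
        conn.execute(statement)
        conn.execute(
            "UPDATE database_configuration SET schema_version = ?", (number,))
        conn.commit()  # commit each refactoring on its own
        applied.append(number)
    conn.commit()
    return applied
```

On a fresh database migrate(conn) applies versions 1 to 3; calling it again applies nothing, because the recorded version already matches the latest migration.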
  • Database Refactoring Strategies (contd.)
    • Determine synchronization strategies during transition period
      • Triggers update in real time but might hurt performance.
      • Views might not support updates, but they do not require moving data.
      • Batch synchronization can run during off-peak periods but might have to reconcile multiple updates to the same data.
    • Encapsulate Database Access
      • Abstract database access, e.g. by using persistence frameworks
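A minimal sketch of the trigger-based option, assuming a balance column that lives in both Customer and Account during the transition period. The snake_case names are illustrative; the sketch covers updates only and assumes one account per customer. Each trigger writes only when the other side actually differs, which is one simple way to stop the pair from triggering each other in a cycle.

```python
import sqlite3


def build_transition_schema(conn):
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT,
            balance     NUMERIC   -- old location, dropped after the transition
        );
        CREATE TABLE account (
            account_id  INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            balance     NUMERIC   -- new location
        );

        -- Old applications write customer.balance; copy the value across.
        CREATE TRIGGER synchronize_account_balance
        AFTER UPDATE OF balance ON customer
        WHEN NEW.balance IS NOT OLD.balance
        BEGIN
            UPDATE account SET balance = NEW.balance
            WHERE customer_id = NEW.customer_id
              AND balance IS NOT NEW.balance;   -- guard against cycles
        END;

        -- New applications write account.balance; copy the value back.
        CREATE TRIGGER synchronize_customer_balance
        AFTER UPDATE OF balance ON account
        WHEN NEW.balance IS NOT OLD.balance
        BEGIN
            UPDATE customer SET balance = NEW.balance
            WHERE customer_id = NEW.customer_id
              AND balance IS NOT NEW.balance;   -- guard against cycles
        END;
    """)


def demo():
    conn = sqlite3.connect(":memory:")
    build_transition_schema(conn)
    conn.execute("INSERT INTO customer VALUES (1, 'Ann', 100)")
    conn.execute("INSERT INTO account VALUES (10, 1, 100)")
    # An old application updates the old column ...
    conn.execute("UPDATE customer SET balance = 50 WHERE customer_id = 1")
    via_old = conn.execute("SELECT balance FROM account").fetchone()[0]
    # ... and a new application updates the new column.
    conn.execute("UPDATE account SET balance = 75 WHERE account_id = 10")
    via_new = conn.execute("SELECT balance FROM customer").fetchone()[0]
    return via_old, via_new
```

Once every application reads and writes the new column, both triggers and the old column can be dropped on the agreed drop date.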
  • Database Refactoring Classification
    • Structural
    • Data Quality
    • Referential
    • Architectural
    • Method
  • Structural Refactorings
    • Related to the structure of tables and views
    • e.g. Move Column, Rename Table, Split Table, Merge Column
    • Issues to consider when implementing:
      • Cyclic Triggers
      • Broken Views, Procedures, Triggers
      • Transition period in multi-application setup
  • Introduce Surrogate Key
    • Motivations
      • Reduce coupling between schema and business domain
      • Increase consistency by having a uniform key strategy
      • Improve performance by having index based on simpler key
    • Potential Tradeoffs
      • Surrogate keys are not suitable for all situations
      • Introducing a new key might require further key consolidation and more effort
    “Replace an existing natural key with a surrogate key”
  • Introduce Surrogate Key (contd.) [Diagram: Order (customerNumber <<PK>> <<FK>> <<Natural>>, storeId <<PK>> <<Natural>>, orderId <<PK>> <<surrogate>>) contains OrderItem (customerNumber <<PK>> <<FK>> <<Natural>>, storeId <<PK>> <<Natural>>, orderItemNumber <<PK>>, orderId <<FK>> <<surrogate>>). Trigger PopulateOrderId {event = on insert, drop date = <date>}; elements scheduled for removal are marked {drop date = <date>}. The diagram also shows a balance attribute.]
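The Order/OrderItem example might be sketched along these lines in SQLite. The table and column names are snake_case stand-ins for those in the diagram, and placing the PopulateOrderId trigger on the child table is an assumption made for illustration. SQLite cannot change a primary key in place, so this sketch only adds, backfills and indexes the surrogate column; swapping the primary key itself would need a table rebuild.

```python
import sqlite3


def legacy_schema(conn):
    # Starting point: both tables are keyed by the natural key.
    conn.executescript("""
        CREATE TABLE orders (
            customer_number INTEGER,
            store_id        INTEGER,
            PRIMARY KEY (customer_number, store_id)
        );
        CREATE TABLE order_item (
            customer_number   INTEGER,
            store_id          INTEGER,
            order_item_number INTEGER,
            PRIMARY KEY (customer_number, store_id, order_item_number)
        );
    """)


def introduce_surrogate_key(conn):
    conn.executescript("""
        -- Add the surrogate column to the parent and backfill it.
        ALTER TABLE orders ADD COLUMN order_id INTEGER;
        UPDATE orders SET order_id = rowid;
        CREATE UNIQUE INDEX idx_orders_order_id ON orders (order_id);

        -- Add the matching column to the child, backfilled via the natural key.
        ALTER TABLE order_item ADD COLUMN order_id INTEGER;
        UPDATE order_item SET order_id = (
            SELECT o.order_id FROM orders o
            WHERE o.customer_number = order_item.customer_number
              AND o.store_id = order_item.store_id);

        -- Transition trigger: fill order_id for applications that still
        -- insert using only the natural key.
        CREATE TRIGGER populate_order_id
        AFTER INSERT ON order_item
        WHEN NEW.order_id IS NULL
        BEGIN
            UPDATE order_item SET order_id = (
                SELECT o.order_id FROM orders o
                WHERE o.customer_number = NEW.customer_number
                  AND o.store_id = NEW.store_id)
            WHERE rowid = NEW.rowid;
        END;
    """)


def demo():
    conn = sqlite3.connect(":memory:")
    legacy_schema(conn)
    conn.execute("INSERT INTO orders VALUES (7, 1)")
    conn.execute("INSERT INTO orders VALUES (8, 1)")
    conn.execute("INSERT INTO order_item VALUES (8, 1, 1)")
    conn.commit()
    introduce_surrogate_key(conn)
    # A legacy insert that knows nothing about order_id.
    conn.execute(
        "INSERT INTO order_item (customer_number, store_id, order_item_number) "
        "VALUES (7, 1, 1)")
    # Count child rows whose surrogate key matches their parent's.
    return conn.execute("""
        SELECT COUNT(*) FROM order_item oi
        JOIN orders o ON o.customer_number = oi.customer_number
                     AND o.store_id = oi.store_id
        WHERE oi.order_id = o.order_id
    """).fetchone()[0]
```

New rows in orders would also need an order_id during the transition, either from the applications or from a second trigger; that part is left out to keep the sketch short.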
  • Data Quality Refactorings
    • Related to improving the quality of information in the database
    • e.g. Add Lookup Table, Introduce Column Constraint, Introduce Common Format
    • Issues to consider when implementing:
      • Constraint violations
      • Broken logic in procedures
      • Broken WHERE clauses in views
      • Updating large amounts of data
  • Add Lookup Table
    • Motivations
      • Introduce referential integrity for a column
      • Provide code lookup (move enum to the db)
      • Replace column constraint with set of expected values in lookup table
    • Potential Tradeoffs
      • Identifying the data to populate (especially for multiple apps)
      • Possible performance impact due to additional joins
    “Create a lookup table for an existing column”
  • Add Lookup Table (contd.)
    • 1. Identify the column
    • 2. Create Lookup Table
    • 3. Populate Data
    • 4. Introduce FK constraint
    • [Diagram: Address (Street, State <<FK>>, PostCode) references the lookup table State (State <<PK>>, Name)]
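The four steps on this slide could look roughly like this against SQLite, using the Address and State tables from the diagram (names lowercased here). Populating the lookup table from the values already present in address is an assumption; with several applications involved, the agreed list of values may have to come from elsewhere. SQLite cannot add a foreign key to an existing table, so step 4 rebuilds the table, where most other databases would use ALTER TABLE ... ADD CONSTRAINT.

```python
import sqlite3


def add_state_lookup(conn):
    # Step 1 (identify the column) is done: address.state.
    conn.executescript("""
        BEGIN;
        -- Step 2: create the lookup table.
        CREATE TABLE state (
            state TEXT PRIMARY KEY,
            name  TEXT
        );
        -- Step 3: populate it from the values currently in use.
        INSERT INTO state (state)
            SELECT DISTINCT state FROM address WHERE state IS NOT NULL;
        -- Step 4: introduce the foreign key by rebuilding the table.
        ALTER TABLE address RENAME TO address_old;
        CREATE TABLE address (
            street    TEXT,
            state     TEXT REFERENCES state(state),
            post_code TEXT
        );
        INSERT INTO address SELECT street, state, post_code FROM address_old;
        DROP TABLE address_old;
        COMMIT;
    """)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce FKs on this connection
    conn.execute(
        "CREATE TABLE address (street TEXT, state TEXT, post_code TEXT)")
    conn.execute("INSERT INTO address VALUES ('1 Main St', 'NSW', '2000')")
    conn.commit()
    add_state_lookup(conn)
    known = conn.execute("SELECT state FROM state").fetchall()
    try:
        # 'XX' is not in the lookup table, so the insert should be refused.
        conn.execute("INSERT INTO address VALUES ('2 High St', 'XX', '0000')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    return known, rejected
```

The name column is left empty here; filling it in is part of the "identifying the data to populate" tradeoff listed on the previous slide.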
  • Referential Integrity Refactorings
    • Changes that improve referential integrity of data
    • e.g. Add Foreign Key Constraint, Introduce Cascading Delete, Introduce Trigger for History
    • Issues to consider when implementing:
      • Fix broken CRUD logic in procedures
      • Data cleansing to make new constraints work
  • Introduce Cascading Delete
    • Motivations
      • Preserve referential integrity of the parent/child rows
      • Remove responsibility for child deletion in the application
    • Potential Tradeoffs
      • Possible deadlocks
      • Risk of accidental mass deletion when deleting root nodes
      • Duplicates functionality already provided by persistence frameworks such as Hibernate or TopLink
    “Delete the child record(s) when the parent is deleted”
  • Introduce Cascading Delete (contd.)
    • 1. Identify the column
    • 2. Choose cascading mechanism (triggers or using cascade clause during constraint creation)
    • [Diagram: Policy (PolicyId <<PK>>); Claim (ClaimId <<PK>>, PolicyId <<FK>>); trigger DeleteClaim {event = on delete}]
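For the Policy/Claim example, the cascade-clause option might look like the sketch below (the alternative on the slide is a DeleteClaim trigger). Table and column names follow the diagram in snake_case; the helper function names are illustrative only.

```python
import sqlite3


def create_policy_schema(conn):
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE policy (
            policy_id INTEGER PRIMARY KEY
        );
        CREATE TABLE claim (
            claim_id  INTEGER PRIMARY KEY,
            -- The cascade clause is declared when the constraint is created.
            policy_id INTEGER NOT NULL
                REFERENCES policy(policy_id) ON DELETE CASCADE
        );
    """)


def delete_policy(conn, policy_id):
    """Delete one policy and report how many claims went with it."""
    before = conn.execute("SELECT COUNT(*) FROM claim").fetchone()[0]
    conn.execute("DELETE FROM policy WHERE policy_id = ?", (policy_id,))
    after = conn.execute("SELECT COUNT(*) FROM claim").fetchone()[0]
    return before - after


def demo():
    conn = sqlite3.connect(":memory:")
    create_policy_schema(conn)
    conn.executemany("INSERT INTO policy VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO claim VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
    removed = delete_policy(conn, 1)
    remaining = conn.execute("SELECT claim_id FROM claim").fetchall()
    return removed, remaining
```

The application deletes only the parent row; the database removes the children. That is also why a careless delete of a root row can remove far more data than intended.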
  • Architectural Refactorings
    • Changes that improve performance and portability, and that define the architecture within the database
    • e.g. Encapsulate Table with View, Introduce Calculation Method, Replace Method(s) with View, Introduce Trigger for History
    • Issues to consider when implementing:
      • Performance vs Data redundancy
      • Keeping business logic in the application vs database
  • Introduce Index
    • Motivations
      • Increase performance of read queries
    • Potential Tradeoffs
      • Too many indexes degrade the performance of inserts, updates and deletes
      • Existing data containing duplicates might need cleansing when introducing unique indexes
    “Introduce a unique or non-unique index”
  • Introduce Index (contd.)
    • 1. Determine type of index – unique vs non-unique
    • 2. Eliminate duplicate rows when using unique index
    • 3. Add a new index
    • 4. Add more disk space for index maintenance
    • [Diagram: Customer (CustomerId <<PK>>, TFN <<AK>>, Name) with TFN <<index>>]
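Steps 2 and 3 for a unique index on Customer.TFN could be sketched as below. Keeping the row with the lowest customer_id for each duplicated TFN is purely an assumption for the example; in practice which duplicate survives, or whether rows are merged instead, is a data-cleansing decision. The index name is also illustrative.

```python
import sqlite3


def introduce_unique_tfn_index(conn):
    # Step 2: eliminate duplicates, keeping the lowest customer_id per TFN.
    conn.execute("""
        DELETE FROM customer
        WHERE tfn IS NOT NULL
          AND customer_id NOT IN (
              SELECT MIN(customer_id) FROM customer
              WHERE tfn IS NOT NULL
              GROUP BY tfn)
    """)
    # Step 3: add the new index; it now also acts as an alternate key.
    conn.execute("CREATE UNIQUE INDEX idx_customer_tfn ON customer (tfn)")
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer "
        "(customer_id INTEGER PRIMARY KEY, tfn TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "111", "Ann"), (2, "111", "Ann (duplicate)"), (3, "222", "Bob")])
    introduce_unique_tfn_index(conn)
    ids = [row[0] for row in conn.execute(
        "SELECT customer_id FROM customer ORDER BY customer_id")]
    try:
        # A second customer with an existing TFN should now be refused.
        conn.execute("INSERT INTO customer VALUES (4, '222', 'Bob again')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    return ids, rejected
```

Creating the index before the duplicates are removed would simply fail, which is why the cleansing step comes first.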
  • Method Refactorings
    • Changes that improve code representing stored procedures, functions and triggers
    • e.g. Rename Method, Reorder Parameters, Replace Literal with Table Lookup
    • Issues to consider when implementing:
      • Broken triggers, procedures, functions
      • Tool support
  • Refactoring Tools
    • Schema Migration – Rails Migrations, Sundog
    • Unit Testing – JUnit, DbUnit
    • Refactor Stored Procedures – SQL Refactor (SQL Server only)