Refactoring Database Perficient China Lancelot Zhu [email_address]
Agenda Evolutionary Database Development The Process of Database Refactoring Database Refactoring Strategies Database Refactoring Patterns
Evolutionary Database Development
Evolutionary Data Modeling The Agile Model-Driven Development (AMDD) life cycle
Database Regression Testing 1. Quickly add a test, basically just enough code so that your tests now fail. 2. Run your tests - often the complete test suite, although for the sake of speed you may decide to run only a subset - to ensure that the new test does in fact fail. 3. Update your functional code so that it passes the new test. 4. Run your tests again. If the tests fail, return to Step 3; otherwise, start over again.
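In the database world, step 1 can be as small as an assertion against the schema or the data. The block below is a minimal sketch of such a check, assuming an Oracle database and the person table used later in this deck; a real project would more likely express it in a testing framework such as utPLSQL, but the idea is the same: the test fails loudly until the schema satisfies it.
DECLARE
  v_count INTEGER;
BEGIN
  -- regression check: the person table must still expose a lastname column
  SELECT COUNT(*) INTO v_count
  FROM user_tab_columns
  WHERE table_name = 'PERSON' AND column_name = 'LASTNAME';
  IF v_count = 0 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Regression: column person.lastname is missing');
  END IF;
END;
/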
Configuration Management of Database Artifacts  Data definition language (DDL) scripts to create the database schema Data load/extract/migration scripts Data model files Object/relational mapping meta data Reference data Stored procedure and trigger definitions View definitions Referential integrity constraints Other database objects like sequences, indexes, and so on Test data Test data generation scripts Test scripts
Developer Sandboxes  A "sandbox" is a fully functioning environment in which a system may be built, tested, and/or run.
The Process of Database Refactoring
The two categories of database architecture Single-Application Database Environments Multi-Application Database Environments
Database Smells Multipurpose column Multipurpose table Redundant data Tables with too many columns Tables with too many rows "Smart" columns Fear of change
How Database Refactoring Fits In  Potential development activities on an evolutionary development project
Why DB Refactoring is Hard Databases are highly coupled to external programs.
The database refactoring process  Verify that a database refactoring is appropriate. Choose the most appropriate database refactoring. Deprecate the original database schema. Test before, during, and after. Modify the database schema. Migrate the source data. Modify external access program(s). Run regression tests. Version control your work. Announce the refactoring.
Database Refactoring Strategies
Database Refactoring Strategies Smaller changes are easier to apply. Uniquely identify individual refactorings. Implement a large change by many small ones. Have a database configuration table. Prefer triggers over views or batch synchronization. Choose a sufficient deprecation period. Simplify your database change control board (CCB) strategy. Simplify negotiations with other teams. Encapsulate database access. Be able to easily set up a database environment. Do not duplicate SQL. Put database assets under change control. Beware of politics.
Version your database Uniquely identify individual refactorings. Have a database configuration table.
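A minimal sketch of what these two strategies can look like in practice; the table and column names below are illustrative assumptions, not something the deck prescribes. Each refactoring script records its own unique identifier, so every sandbox and environment can report which schema version it is running.
-- append-only log of the refactorings applied to this database instance
CREATE TABLE database_configuration (
  schema_version NUMBER NOT NULL,
  description    VARCHAR2(200),
  applied_on     DATE DEFAULT SYSDATE NOT NULL,
  CONSTRAINT pk_database_configuration PRIMARY KEY (schema_version)
);

-- the last statement of every refactoring script
INSERT INTO database_configuration (schema_version, description)
VALUES (123, 'Rename Column: person.gender renamed to person.sex');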
Database Refactoring Patterns
Database Refactoring Categories
Structural: a change to the definition of one or more tables or views. Examples: Rename Column, Drop Table, Introduce Surrogate Key.
Data Quality: a change that improves the quality of the information contained within a database. Examples: Add Lookup Table, Consolidate Key Strategy, Make Column Non-Nullable.
Referential Integrity: a change that ensures that a referenced row exists within another table and/or that a row that is no longer needed is removed appropriately. Examples: Add Foreign Key Constraint, Introduce Soft Delete, Introduce Trigger For History.
Database Refactoring Categories (Continued)
Architectural: a change that improves the overall manner in which external programs interact with a database. Examples: Introduce Read-Only Table, Encapsulate Table With View, Introduce Index.
Method: a change to a method (a stored procedure, stored function, or trigger) that improves its quality; many code refactorings are applicable to database methods. Examples: Add Parameter, Rename Method, Extract Method.
Non-Refactoring Transformation: a change to your database schema that changes its semantics. Examples: Insert Data, Introduce New Column, Introduce New Table.
Drop Column (1) Remove a column from an existing table
Drop Column (2) Motivation: you are refactoring the database table design; the column is no longer used, e.g. after the external applications were refactored. Potential Tradeoffs: the column being dropped may contain valuable data; dropping a column from a table with many rows can be slow. Schema Update Mechanics: choose a removal strategy, drop the column, rework foreign keys.
Phase I: COMMENT ON COLUMN person.gender IS 'Drop date = May 11 2010';
Phase II: ALTER TABLE person DROP COLUMN gender;
Drop Column (3) Data-Migration Mechanics Preserve data Phase II (before drop column): CREATE TABLE person_gender AS SELECT id, gender FROM person; Access Program Update Mechanics Refactor code to use alternate data sources Slim down SELECT statement Refactor database inserts and updates
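If the dropped column later turns out to be needed after all, the preserved copy makes the refactoring reversible. A sketch of the restore path, assuming the person_gender table created above:
ALTER TABLE person ADD gender VARCHAR2(10);
-- copy the preserved values back by matching on the primary key
UPDATE person p
SET p.gender = (SELECT pg.gender FROM person_gender pg WHERE pg.id = p.id);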
Drop Table (1) Remove an existing table from the database
Drop Table (2) Motivation: the table is no longer required and/or used, or it has been replaced by another, similar data source. Potential Tradeoffs: you may need to preserve some or all of the data. Schema Update Mechanics: resolve data-integrity issues.
Phase I: COMMENT ON TABLE person IS 'Drop date = May 11 2010';
Phase II: DROP TABLE person;
Data-Migration Mechanics, Phase II (before dropping the table): CREATE TABLE person_backup AS SELECT * FROM person;
Access Program Update Mechanics: any external programs referencing this table must be refactored to access the alternative data source(s).
Rename Column (1) Rename an existing table column
Rename Column (2) Motivation: increase the readability of your database schema; enable database porting, e.g. resolve a reserved-keyword conflict. Potential Tradeoffs: the cost of refactoring the external applications. Schema Update Mechanics: introduce the new column, introduce a synchronization trigger, rename other columns.
Phase I:
ALTER TABLE person ADD sex VARCHAR2(10);
COMMENT ON COLUMN person.gender IS 'Renamed to sex, drop date = June 6 2010';
UPDATE person SET sex = gender;
Rename Column (3)
CREATE OR REPLACE TRIGGER SynchronizeSex
BEFORE INSERT OR UPDATE ON person
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
BEGIN
  IF INSERTING THEN
    IF :NEW.sex IS NULL THEN :NEW.sex := :NEW.gender; END IF;
    IF :NEW.gender IS NULL THEN :NEW.gender := :NEW.sex; END IF;
  END IF;
  IF UPDATING THEN
    IF NOT (:NEW.sex = :OLD.sex) THEN :NEW.gender := :NEW.sex; END IF;
    IF NOT (:NEW.gender = :OLD.gender) THEN :NEW.sex := :NEW.gender; END IF;
  END IF;
END;
/
Rename Column (4)
Phase II:
DROP TRIGGER SynchronizeSex;
ALTER TABLE person DROP COLUMN gender;
Data-Migration Mechanics: copy all the data from the original column into the new column.
Access Program Update Mechanics: external programs that reference this column must be updated to reference it by its new name; update any embedded SQL and/or mapping metadata, in this case the JPA entity.
Rename Table (1) Rename an existing table
Rename Table (2) Motivation: clarify the table's meaning and intent; conform to accepted database naming conventions. Potential Tradeoffs: the cost of refactoring the external applications that access the table versus the improved readability and/or consistency provided by the new name.
Schema Update Mechanics, Phase I:
CREATE TABLE people (
  id NUMBER NOT NULL,
  firstname VARCHAR2(30),
  lastname VARCHAR2(20),
  gender VARCHAR2(10),
  lastchange DATE,
  CONSTRAINT pk_people PRIMARY KEY (id)
);
COMMENT ON TABLE people IS 'Renaming of person, final date = May 11 2010';
COMMENT ON TABLE person IS 'Renamed to people, drop date = June 6 2010';
Rename Table (3)
CREATE OR REPLACE TRIGGER SynchronizePeople
BEFORE INSERT OR UPDATE OR DELETE ON person
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
BEGIN
  -- findAndUpdateIfNotFoundCreatePeople, createNewIntoPeople, and deleteFromPeople are
  -- placeholder procedures that propagate the change from person to people
  IF UPDATING THEN findAndUpdateIfNotFoundCreatePeople; END IF;
  IF INSERTING THEN createNewIntoPeople; END IF;
  IF DELETING THEN deleteFromPeople; END IF;
END;
/
CREATE OR REPLACE TRIGGER SynchronizePerson
BEFORE INSERT OR UPDATE OR DELETE ON people
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
BEGIN
  -- mirror image of SynchronizePeople: propagates changes from people back to person
  IF UPDATING THEN findAndUpdateIfNotFoundCreatePerson; END IF;
  IF INSERTING THEN createNewIntoPerson; END IF;
  IF DELETING THEN deleteFromPerson; END IF;
END;
/
Rename Table (4)
Phase II:
DROP TRIGGER SynchronizePeople;
DROP TRIGGER SynchronizePerson;
DROP TABLE person;
Data-Migration Mechanics: the data must first be copied: INSERT INTO people SELECT * FROM person;
Access Program Update Mechanics: external access programs must be refactored to work with the new table rather than the old one.
Add Lookup Table (1) Create a lookup table for an existing column
Add Lookup Table (2) Motivation Introduce referential integrity Provide code lookup Replace a column constraint Provide detailed descriptions  Potential Tradeoffs Need to be able to provide valid data to populate the lookup table  There will be a performance impact resulting from the addition of a foreign key constraint  Schema Update Mechanics Determine the table structure Introduce the table Determine lookup data Introduce referential constraint  CREATE TABLE state( State CHAR(2) NOT NULL,  Name CHAR(50),  CONSTRAINT pk_state PRIMARY KEY (state)  );
Add Lookup Table (3)
ALTER TABLE address ADD CONSTRAINT fk_address_state FOREIGN KEY (state) REFERENCES state DEFERRABLE;
Data-Migration Mechanics: ensure that the data values in the column have corresponding values in the lookup table; normalize the source values before loading them so that they fit the CHAR(2) code column.
UPDATE address SET state = 'CA' WHERE UPPER(state) IN ('CA', 'CALIFORNIA');
INSERT INTO state(state) SELECT DISTINCT UPPER(state) FROM address;
UPDATE state SET name = 'California' WHERE state = 'CA';
Access Program Update Mechanics: ensure that external programs now use the data values from the lookup table; some programs may choose to cache the data values, whereas others will access them as needed.
Introduce Column Constraint (1) Introduce a column constraint in an existing table
Introduce Column Constraint (2) Motivation: ensure that all applications interacting with your database persist valid data in the column. Potential Tradeoffs: individual applications may have their own unique version of a constraint for this column.
Schema Update Mechanics: ALTER TABLE person ADD CONSTRAINT ck_person_gender CHECK (gender IN ('MALE', 'FEMALE', 'UNKNOWN'));
Data-Migration Mechanics: make sure that the existing data conforms to the constraint being applied to the column: UPDATE person SET gender = 'UNKNOWN' WHERE gender IS NULL;
Access Program Update Mechanics: ensure that the access programs can handle any errors thrown by the database when the data being written to the column does not conform to the constraint.
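A minimal sketch of that error handling on the access-program side, assuming Oracle and PL/SQL; ORA-02290 is the generic "check constraint violated" error, and the row id 42 is purely illustrative:
DECLARE
  e_check_violated EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_check_violated, -2290);
BEGIN
  UPDATE person SET gender = 'M' WHERE id = 42;  -- value rejected by ck_person_gender
EXCEPTION
  WHEN e_check_violated THEN
    -- map the invalid value to one the constraint accepts, or report it to the caller
    UPDATE person SET gender = 'UNKNOWN' WHERE id = 42;
END;
/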
Introduce Default Value (1) Let the database provide a default value for an existing table column
Introduce Default Value (2) Motivation: you want a default value to be populated for the column whenever a new row is added to the table. Potential Tradeoffs: identifying a true default can be difficult; unintended side effects; confused context. Schema Update Mechanics: ALTER TABLE person MODIFY lastchange DEFAULT SYSDATE; Data-Migration Mechanics: existing rows may already have null values in this column, and they will not be automatically updated as a result of adding the default value: UPDATE person SET lastchange = SYSDATE WHERE lastchange IS NULL; Access Program Update Mechanics: invariants may be broken by the new value; code that applies default values programmatically can be removed; existing source code may assume a different default value.
Make Column Not-Nullable (1) Change an existing column such that it does not accept any null values
Make Column Not-Nullable (2) Motivation: every application updating this column is forced to provide a value for it; repetitious not-null checks can be removed from the applications. Potential Tradeoffs: some programs may currently assume that the column is nullable and therefore not provide a value.
Schema Update Mechanics: ALTER TABLE person MODIFY lastname NOT NULL;
Data-Migration Mechanics: you may need to clean the existing data if there are rows with a null value in the column: UPDATE person SET lastname = '???' WHERE lastname IS NULL;
Access Program Update Mechanics: refactor all the external programs to provide an appropriate value for this column whenever they modify a row within the table; they must also detect and then handle any new exceptions thrown by the database.
Add Foreign Key Constraint (1) Add a foreign key constraint to an existing table to enforce a relationship to another table
Add Foreign Key Constraint (2) Motivation: enforce data dependencies at the database level. Potential Tradeoffs: reduced performance within your database; you must be aware of the table dependencies in the database. Schema Update Mechanics: choose a constraint-checking strategy (immediate or deferred), create the foreign key constraint, and optionally introduce an index for the primary key of the referenced table.
ALTER TABLE address ADD CONSTRAINT fk_person_state FOREIGN KEY (state) REFERENCES state DEFERRABLE;
Add Foreign Key Constraint (3) Data-Migration Mechanics Ensure the referenced data exists Ensure that the foreign table contains all required rows  Ensure that source table's foreign key column contains valid values Introduce a default value for the foreign key column Access Program Update Mechanics Identify and then update any external programs that modify data in the table where the foreign key constraint was added (Similar/Different/Nonexistent RI code) All external programs must be updated to handle any exception(s) thrown by the database as the result of the new foreign key constraint
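Before enabling the constraint, the data-migration step above can locate the rows that would violate it. A hedged sketch, reusing the address and state tables from the Add Lookup Table example:
-- rows whose foreign key value has no matching row in the referenced table
SELECT a.id, a.state
FROM address a
WHERE a.state IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM state s WHERE s.state = a.state);
-- for each offender: add the missing code to state, correct the address row,
-- or fall back to a default value for the foreign key column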
Introduce Soft Delete (1) Introduce a flag to an existing table that indicates that a row has been deleted
Introduce Soft Delete (2) Motivation: preserve all application data, typically for historical purposes. Potential Tradeoffs: performance is potentially impacted. Schema Update Mechanics: introduce the identifying column, determine how to update the flag, develop deletion code, develop insertion code. ALTER TABLE person ADD is_deleted BOOLEAN; ALTER TABLE person MODIFY is_deleted DEFAULT FALSE; Data-Migration Mechanics: UPDATE person SET is_deleted = FALSE; Access Program Update Mechanics: change read queries to ensure that data read from the database has not been marked as deleted; all external programs must change physical deletes to updates.
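A sketch of the access-program side of this refactoring, using the is_deleted flag as defined on the slide (on database products without a Boolean column type, a CHAR(1) 'Y'/'N' flag is the usual substitute). The view name, column list, and row id are illustrative assumptions:
-- read queries go through a view that filters out soft-deleted rows
CREATE OR REPLACE VIEW current_person AS
  SELECT id, firstname, lastname, gender, lastchange
  FROM person
  WHERE is_deleted = FALSE;

-- physical deletes become updates of the flag
UPDATE person SET is_deleted = TRUE WHERE id = 42;  -- was: DELETE FROM person WHERE id = 42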
Introduce Index (1) Introduce a new index of either unique or nonunique type
Introduce Index (2) Motivation Increase query performance on your database reads Potential Tradeoffs Too many indexes on a table will degrade performance Remove the duplicates first before applying unique index Schema Update Mechanics Determine type of index  Add a new index  Provide more disk space CREATE UNIQUE INDEX unq_person_ssn ON person(ssn); Data-Migration Mechanics Check for duplicate values if introducing a unique index Duplicate values must be updated or use a nonunique index instead Access Program Update Mechanics Analyze dependencies to determine which external programs to update Change your queries to make use of this new index
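Before creating the unique index, the data-migration step can list the offending duplicates. A sketch, assuming the person.ssn column from the example above:
-- ssn values that occur more than once and would block the unique index
SELECT ssn, COUNT(*) AS occurrences
FROM person
GROUP BY ssn
HAVING COUNT(*) > 1;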
Introduce Read-Only Table (1) Create a read-only data store based on existing tables in the database
Introduce Read-Only Table (2) Motivation Improve query performance Summarize data for reporting Create redundant data Replace redundant reads Data security Improve database readability Potential Tradeoffs The users of the read-only table need to understand both the timeliness of the copied data as well as the volatility of the source data to determine whether the read-only table is acceptable Schema Update Mechanics Introduce the new table/materialized view Determine a population strategy
Introduce Read-Only Table (3)
Via materialized view:
CREATE MATERIALIZED VIEW person_mv
BUILD IMMEDIATE
REFRESH FORCE ON COMMIT
WITH PRIMARY KEY
AS
SELECT p.id, p.firstname, p.lastname, p.birthday,
       a.line1 || a.line2 || s.name || a.zipcode AS address
FROM person p, address a, state s
WHERE p.address_id = a.id
  AND a.state = s.state;

Via new table:
CREATE TABLE person_mv (
  id NUMBER NOT NULL,
  firstname VARCHAR2(30),
  lastname VARCHAR2(20),
  birthday DATE,
  address VARCHAR2(255),
  CONSTRAINT person_mv_id PRIMARY KEY (id)
);
COMMENT ON TABLE person_mv IS 'read-only table';
Introduce Read-Only Table (4) Data-Migration Mechanics Copy all the relevant source data into the read-only table  Apply your population strategy (real-time or periodic batch) Periodic refresh Materialized views Use trigger-based synchronization Use real-time application updates  INSERT INTO person_mv(id,firstname,lastname,birthday,address) SELECT p.id, p.firstname, p.lastname, p.birthday, a.line1 || a.line2 || s.name || a.zipcode FROM person p, address a, state s WHERE p.address_id = a.id  AND a.state = s.state; Access Program Update Mechanics Make sure that the application uses this for read-only purposes Must change all the places where you currently access the source tables and rework them to use this instead
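A sketch of the "periodic refresh" population strategy, assuming Oracle's DBMS_SCHEDULER and DBMS_MVIEW packages and the materialized-view variant of person_mv; the job name and schedule are illustrative:
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'refresh_person_mv',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_MVIEW.REFRESH(''person_mv'', ''C''); END;',  -- 'C' = complete refresh
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',  -- nightly, at 02:00
    enabled         => TRUE);
END;
/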
References http://www.agiledata.org/ Refactoring Databases: Evolutionary Database Design (Scott W. Ambler and Pramod J. Sadalage, Addison-Wesley, 2006)
Questions? Gossip? Rumor?
Thanks

Editor's Notes

  • Database Smells: Multipurpose column. If a column is being used for several purposes, it is likely that extra code exists to ensure that the source data is being used the "right way," often by checking the values of one or more other columns. An example is a column used to store either someone's birth date if he or she is a customer or the start date if that person is an employee. Worse yet, you are likely constrained in the functionality that you can now support; for example, how would you store the birth date of an employee? Multipurpose table. Similarly, when a table is being used to store several types of entities, there is likely a design flaw. An example is a generic Customer table that is used to store information about both people and corporations. The problem with this approach is that the data structures for people and corporations differ: people have a first, middle, and last name, for example, whereas a corporation simply has a legal name. A generic Customer table would have columns that are NULL for some kinds of customers but not others. Redundant data. Redundant data is a serious problem in operational databases because when data is stored in several places, the opportunity for inconsistency occurs. For example, it is quite common to discover that customer information is stored in many different places within your organization. In fact, many companies are unable to put together an accurate list of who their customers actually are. The problem is that in one table John Smith lives at 123 Main Street, and in another table at 456 Elm Street. In this case, this is actually one person who used to live at 123 Main Street but who moved last year; unfortunately, John did not submit two change of address forms to your company, one for each application that knows about him. Tables with too many columns. When a table has many columns, it is indicative that the table lacks cohesion: it is trying to store data from several entities. Perhaps your Customer table contains columns to store three different addresses (shipping, billing, seasonal) or several phone numbers (home, work, cell, and so on). You likely need to normalize this structure by adding Address and PhoneNumber tables. Tables with too many rows. Large tables are indicative of performance problems. For example, it is time-consuming to search a table with millions of rows. You may want to split the table vertically by moving some columns into another table, or split it horizontally by moving some rows into another table. Both strategies reduce the size of the table, potentially improving performance. "Smart" columns. A smart column is one in which different positions within the data represent different concepts. For example, if the first four digits of the client ID indicate the client's home branch, then client ID is a smart column because you can parse it to discover more granular information (for example, home branch ID). Another example includes a text column used to store XML data structures; clearly, you can parse the XML data structure for smaller data fields. Smart columns often need to be reorganized into their constituent data fields at some point so that the database can easily deal with them as separate elements. Fear of change. If you are afraid to change your database schema because you are afraid to break something (for example, the 50 applications that access it), that is the surest sign that you need to refactor your schema. Fear of change is a good indication that you have a serious technical risk on your hands, one that will only get worse over time.
  • Add Lookup Table: *Introduce referential integrity. You may want to introduce a referential integrity constraint on an existing Address.State to ensure the quality of the data. *Provide code lookup. Many times you want to provide a defined list of codes in your database instead of having an enumeration in every application. The lookup table is often cached in memory. *Replace a column constraint. When you introduced the column, you added a column constraint to ensure that a small number of correct code values persisted. But, as your application(s) evolved, you needed to introduce more code values, until you got to the point where it was easier to maintain the values in a lookup table instead of updating the column constraint. *Provide detailed descriptions. In addition to defining the allowable codes, you may also want to store descriptive information about the codes. For example, in the State table, you may want to relate the code CA to California. 1. Determine the table structure. You must identify the column(s) of the lookup table (State). 2. Introduce the table. Create State in the database via the CREATE TABLE command. 3. Determine lookup data. You have to determine what rows are going to be inserted into State. 4. Introduce referential constraint. To enforce referential integrity constraints from the code column in the source table(s) to State, you must apply the Add Foreign Key refactoring.
  • Introduce Default Value: *Identifying a true default can be difficult. When many applications share the same database, they may have different default values for the same column, often for good reasons. Or it may simply be that your business stakeholders cannot agree on a single value; you need to work closely with them to negotiate the correct value. *Unintended side effects. Some applications may assume that a null value within a column actually means something and will therefore exhibit different behavior now that columns in new rows, which formerly would have been null, are populated. *Confused context. When a column is not used by an application, the default value may introduce confusion over the column's usage with the application team. 1. Invariants are broken by the new value. For example, a class may assume that the value of a color column is red, green, or blue, but the default value has now been defined as yellow. 2. Code exists to apply default values. There may now be extraneous source code that checks for a null value and introduces the default value programmatically. This code should be removed. 3. Existing source code assumes a different default value. For example, existing code may look for the default value of none, which was set programmatically in the past, and if found it gives users the option to change the color. Now the default value is yellow, so this code will never be invoked.
  • Add Foreign Key Constraint: 1. Similar RI code. Some external programs will implement the RI business rule that will now be handled via the foreign key constraint within the database. This code should be removed. 2. Different RI code. Some external programs will include code that enforces different RI business rules than what you are about to implement. The implication is that you either need to reconsider adding this foreign key constraint, because there is no consensus within your organization regarding the business rule that it implements, or you need to rework the code to work based on this new version (from its point of view) of the business rule. 3. Nonexistent RI code. Some external programs will not even be aware of the RI business rule pertaining to these data tables.
  • Introduce Read-Only Table (motivation): *Improve query performance. Querying a given set of tables may be very slow because of the requisite joins; therefore, a prepopulated table may improve overall performance. *Summarize data for reporting. Many reports require summary data, which can be prepopulated into a read-only table and then used many times over. *Create redundant data. Many applications query data in real time from other databases. A read-only table containing this data in your local database reduces your dependency on these other database(s), providing a buffer for when they go down or are taken down for maintenance. *Replace redundant reads. Several external programs, or stored procedures for that matter, often implement the same retrieval query. These queries can be replaced by a common read-only table or a new view. *Data security. A read-only table enables end users to query the data but not update it. *Improve database readability. If you have a highly normalized database, it is usually difficult for users to navigate through all the tables to get to the required information. By introducing read-only tables that capture common, denormalized data structures, you make your database schema easier to understand because people can start by focusing just on the denormalized tables.
  • Introduce Read-Only Table (population strategies): *Periodic refresh. Use a scheduled job that refreshes your read-only table. The job may refresh all the data in the read-only table or it may just update the changes since the last refresh. Note that the amount of time taken to refresh the data should be less than the scheduled interval time of the refresh. This technique is particularly suited for data warehouse kinds of environments, where data is generally summarized and used the next day. Hence, stale data can be tolerated; also, this approach provides you with an easier way to synchronize the data. *Materialized views. Some database products provide a feature where a view is no longer just a query; instead, it is actually a table based on a query. The database keeps this materialized view current based on the options you choose when you create it. This technique enables you to use the database's built-in features to refresh the data in the materialized view, with the major downside being the complexity of the view SQL. When the view SQL gets more complicated, the database products tend not to support automated synchronization of the view. *Use trigger-based synchronization. Create triggers on the source tables so that source data changes are propagated to the read-only table. This technique enables you to custom code the data synchronization, which is desirable when you have complex data objects that need to be synchronized; however, you must write all of the triggers, which could be time consuming. *Use real-time application updates. You can change your application so that it updates the read-only table, making the data current. This can only work when you know all the applications that are writing data to your source database tables. This technique keeps the read-only table current at all times, so the application never works with stale data. The downside of the technique is that you must write your information twice, first to the original table and second to the denormalized read-only table; this could lead to duplication and hence bugs.