• Normalization
• Normal Forms
    • 1NF
    • 2NF
    • 3NF
• Codd’s Rules
Data Normalization
• The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise.
    • Achieve a design that is highly flexible
    • Reduce redundancy
    • Ensure that the design is free of certain update, insertion and deletion anomalies
Normalization
• 1NF: Flat file
• 2NF: Partial dependencies removed
• 3NF: Transitive dependencies removed
• BCNF: Every determinant is a candidate key
• 4NF: Non-trivial multi-valued dependencies removed
Invoice: Stereos To Go
Order No.: 10001
Date: 6/15/99
Account No.: 0000-000-0000-0
Customer: John Smith
Address: 2036-26 Street, Sacramento, CA 95819
Date Shipped: 6/18/99

Item Number   Product Code   Product Description/Manufacturer   Qty   Price
1             SAGX730        Pioneer Remote A/V Receiver        1     569.95
2             AT10           Cerwin Vega Loudspeakers           1     359.95
3             CDPC725        Sony Disc-Jockey CD Changer        1     399.95

Subtotal              1329.85
Shipping & Handling    100.00
Sales Tax              103.06
Total                 1532.91
Unnormalized Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,
 Item1, Item1_descrip, Item1_qty, Item1_price,
 Item2, Item2_descrip, Item2_qty, Item2_price,
 . . . ,
 Item7, Item7_descrip, Item7_qty, Item7_price)
• How would a program process the data to recreate the invoice?
Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,
 Item1, Item1_descrip, Item1_qty, Item1_price,
 Item2, Item2_descrip, Item2_qty, Item2_price,
 . . . ,
 Item7, Item7_descrip, Item7_qty, Item7_price)
• Repeating groups: the Item1 through Item7 attribute sets
• A flat file places all the data of a transaction into a single record.
• This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement.
Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,
 Item, Item_descrip, Item_qty, Item_price)
• Nominate a group of attributes to serve as the key (they must form a unique combination).
• Eliminate the repeating groups.
• Each row retains data for one item.
• If a person bought five items, we would have five tuples.
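The step of eliminating repeating groups can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper name `to_1nf` are assumptions, and only the attribute names come from the relation above.

```python
# Hypothetical unnormalized record: item attributes repeat as Item1..Item7.
unnormalized = {
    "Invoice_number": 10001,
    "Cust_account": "123456",
    "Cust_name": "John Smith",
    "Item1": "SAGX730", "Item1_descrip": "Pioneer Remote A/V Receiver",
    "Item1_qty": 1, "Item1_price": 569.95,
    "Item2": "AT10", "Item2_descrip": "Cerwin Vega Loudspeakers",
    "Item2_qty": 1, "Item2_price": 359.95,
}

MAX_ITEMS = 7  # the unnormalized relation reserves slots Item1..Item7


def to_1nf(record):
    """Return one flat row per item; invoice and customer data repeat in each row."""
    header = {k: v for k, v in record.items() if not k.startswith("Item")}
    rows = []
    for i in range(1, MAX_ITEMS + 1):
        prefix = f"Item{i}"
        if prefix not in record:  # unused slot, nothing to emit
            continue
        rows.append({
            **header,
            "Item": record[prefix],
            "Item_descrip": record[f"{prefix}_descrip"],
            "Item_qty": record[f"{prefix}_qty"],
            "Item_price": record[f"{prefix}_price"],
        })
    return rows


rows_1nf = to_1nf(unnormalized)
for row in rows_1nf:
    print(row["Invoice_number"], row["Item"], row["Item_qty"], row["Item_price"])
```

Each output row now carries exactly one item, at the cost of repeating the invoice and customer attributes in every row.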
1NF Flat File

Invoice number   Account number   Customer name   •••   Item      Item Description           Quantity   Price
10001            123456           John Smith      •••   SAGX730   Pioneer Remote A/V Rec     1          569.95
10001            123456           John Smith      •••   AT10      Cerwin Vega Loudspeakers   1          359.95
10001            123456           John Smith      •••   CDPC725   Sony Disc Jockey CD        1          399.95
10001            123456           John Smith      •••   S/H       Shipping                   1          100.00
10001            123456           John Smith      •••   Tax       Sales Tax                  1          103.06
From 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,
 Item, Item_descrip, Item_qty, Item_price)
• Functional dependencies and determinants
• Example: Item_descrip is functionally dependent on Item, so Item is the determinant of Item_descrip.
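A functional dependency can be tested against sample rows, as in the rough sketch below. The function name `holds` and invoice 10002 are made up for illustration. Sample data can only refute a dependency, never prove it, because a dependency is a statement about every valid state of the relation.

```python
def holds(rows, determinant, dependent):
    """True if each determinant value maps to exactly one dependent value in `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Sample 1NF rows; invoice 10002 is hypothetical data added for the example.
rows = [
    {"Invoice_number": 10001, "Item": "SAGX730",
     "Item_descrip": "Pioneer Remote A/V Receiver", "Item_qty": 1},
    {"Invoice_number": 10001, "Item": "AT10",
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 1},
    {"Invoice_number": 10002, "Item": "AT10",
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 2},
]

print(holds(rows, ["Item"], ["Item_descrip"]))              # Item -> Item_descrip
print(holds(rows, ["Item"], ["Item_qty"]))                  # Item alone -> Item_qty
print(holds(rows, ["Invoice_number", "Item"], ["Item_qty"]))
```

In this sample, Item determines Item_descrip, but Item alone does not determine Item_qty; the quantity needs the full combination of Invoice_number and Item.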
From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Item, Item_descrip, Item_qty, Item_price)
• Is this unique by itself?
• What happens if the item is purchased more than once?
From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_descrip, Item_qty, Item_price)
• Composite key (forms a unique combination): Invoice_number + Item
• Partial dependency: Item_descrip depends on Item alone, which is only part of the composite key
From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
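The decomposition into 2NF amounts to projecting the flat rows onto each group of attributes and dropping duplicate tuples. A minimal sketch, assuming a hypothetical `project` helper and a shortened attribute list; invoice 10002 and its customer are invented sample data.

```python
def project(rows, attributes):
    """Project rows onto `attributes`, dropping duplicate tuples (a relation is a set)."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# 1NF rows (abbreviated). Invoice 10002 is hypothetical.
flat = [
    {"Invoice_number": 10001, "Cust_account": "123456", "Cust_name": "John Smith",
     "Item": "SAGX730", "Item_descrip": "Pioneer Remote A/V Receiver",
     "Item_qty": 1, "Item_price": 569.95},
    {"Invoice_number": 10001, "Cust_account": "123456", "Cust_name": "John Smith",
     "Item": "AT10", "Item_descrip": "Cerwin Vega Loudspeakers",
     "Item_qty": 1, "Item_price": 359.95},
    {"Invoice_number": 10002, "Cust_account": "654321", "Cust_name": "Lisa Carr",
     "Item": "AT10", "Item_descrip": "Cerwin Vega Loudspeakers",
     "Item_qty": 2, "Item_price": 359.95},
]

invoices = project(flat, ["Invoice_number", "Cust_account", "Cust_name"])
invoice_items = project(flat, ["Invoice_number", "Item", "Item_qty", "Item_price"])
items = project(flat, ["Item", "Item_descrip"])

print(len(invoices), len(invoice_items), len(items))
```

The description of item AT10 is now stored once in `items` instead of once per purchase.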
From 2NF to 3NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
• Which attributes are dependent on others?
• Is there a problem?
Transitive Dependencies and Anomalies
• Insertion anomalies
    • To add a new row, all customer data (name, address, city, state, zip code, phone) and product data (description) must be consistent with previous entries.
• Deletion anomalies
    • By deleting a row, a customer or product may cease to exist.
• Modification anomalies
    • To modify a customer’s or product’s data in one row, the modification must be carried out in every row that repeats that data.
Insertion and Modification Anomalies
For example…

Insert a new Panasonic product:

Product_code   Manufacturer_name
DVD-A110       Panasonic
PV-4210        Panasonic
CT-32S35       PAN          (inconsistency)
PV-4250        Panasonic

Change all Panasonic products’ manufacturer name to “Panasonic USA”:

Product_code   Manufacturer_name
DVD-A110       Panasonic
PV-4210        PanaSonic
PV-4250        Pana Sonic
CT-32S35       PAN
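The modification anomaly can be reproduced with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table name `products` is invented, and the rows are the inconsistent spellings from the second table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (Product_code TEXT PRIMARY KEY, Manufacturer_name TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("DVD-A110", "Panasonic"), ("PV-4210", "PanaSonic"),
     ("PV-4250", "Pana Sonic"), ("CT-32S35", "PAN")],
)

# The rename only reaches rows whose spelling matches exactly.
updated = conn.execute(
    "UPDATE products SET Manufacturer_name = 'Panasonic USA' "
    "WHERE Manufacturer_name = 'Panasonic'"
).rowcount

names = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT Manufacturer_name FROM products")
)
print(updated, names)
```

Only one of the four rows is renamed, so the table ends up with four different spellings of the same manufacturer. Storing the name once in a separate manufacturers relation would make the rename a single-row update.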
Deletion Anomaly
For example…

4377182   John Smith   •••   Sacramento   CA   95831
4398711   Arnold S     •••   Davis        CA   95691
4578461   Gray Davis   •••   Sacramento   CA   95831
4873179   Lisa Carr    •••   Reno         NV   89557

By deleting customer Arnold S, we would also be deleting Davis, California.
Transitive Dependencies
• A condition where A, B, C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

Dependency diagram:
Invoice_number → Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code
Item → Item_descrip
Invoice_number + Item → Item_qty, Item_price
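One way to see a transitive dependency is to compute the closure of an attribute set, that is, everything it determines directly or through a chain. The sketch below uses the standard attribute-closure procedure; the function name `closure` and the encoding of dependencies as pairs of tuples are assumptions, and the two dependencies listed are the customer and zip code dependencies used in this example.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its whole left side is known
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Each dependency is written as (determinant, dependents).
fds = [
    (("Cust_account",), ("Cust_name", "Cust_addr", "Zip_code")),
    (("Zip_code",), ("Cust_city", "Cust_state")),
]

print(sorted(closure({"Cust_account"}, fds)))
print(sorted(closure({"Zip_code"}, fds)))
```

Cust_city and Cust_state appear in the closure of Cust_account only because Zip_code is reached first: they depend on Cust_account transitively, via Zip_code.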
Why Should City and State Be Separated from Customer Relation?
• City and state are dependent on zip code for their values and not on the customer’s identifier (i.e., key).
    Zip_code → City, State
• Otherwise, the customer relation contains a transitive dependency:
    Cust_account → Cust_addr, Zip_code → City, State
3NF
• Invoice Relation: (Invoice_number, Invoice_date, Date_delivered, Cust_account)
• Customer Relation: (Cust_account, Cust_name, Cust_addr, Zip_code)
• Zip_code Relation: (Zip_code, City, State)
• Invoice_items Relation: (Invoice_number, Item, Item_qty, Item_price)
• Items Relation: (Item, Item_descrip)
• Manufacturers Relation: (Manuf_code, Manuf_name)
Since the Items relation contains the manufacturer’s name in the description, a separate Manufacturers relation can be created.
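To see how a program could recreate the invoice from the 3NF relations, here is a minimal sketch using Python's built-in `sqlite3` module. The lowercase table names, the column types, the ISO date strings and the mapping of "Date Shipped" to Date_delivered are assumptions; the Manufacturers relation is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zip_codes (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT);
CREATE TABLE customers (Cust_account TEXT PRIMARY KEY, Cust_name TEXT, Cust_addr TEXT,
                        Zip_code TEXT REFERENCES zip_codes(Zip_code));
CREATE TABLE invoices (Invoice_number INTEGER PRIMARY KEY, Invoice_date TEXT,
                       Date_delivered TEXT,
                       Cust_account TEXT REFERENCES customers(Cust_account));
CREATE TABLE items (Item TEXT PRIMARY KEY, Item_descrip TEXT);
CREATE TABLE invoice_items (Invoice_number INTEGER REFERENCES invoices(Invoice_number),
                            Item TEXT REFERENCES items(Item),
                            Item_qty INTEGER, Item_price REAL,
                            PRIMARY KEY (Invoice_number, Item));

INSERT INTO zip_codes VALUES ('95819', 'Sacramento', 'CA');
INSERT INTO customers VALUES ('123456', 'John Smith', '2036-26 Street', '95819');
INSERT INTO invoices VALUES (10001, '1999-06-15', '1999-06-18', '123456');
INSERT INTO items VALUES ('SAGX730', 'Pioneer Remote A/V Receiver'),
                         ('AT10', 'Cerwin Vega Loudspeakers'),
                         ('CDPC725', 'Sony Disc-Jockey CD Changer');
INSERT INTO invoice_items VALUES (10001, 'SAGX730', 1, 569.95),
                                 (10001, 'AT10', 1, 359.95),
                                 (10001, 'CDPC725', 1, 399.95);
""")

# Joining the relations on their keys rebuilds the invoice lines.
lines = conn.execute("""
    SELECT i.Invoice_number, c.Cust_name, z.City, z.State,
           ii.Item, it.Item_descrip, ii.Item_qty, ii.Item_price
    FROM invoices i
    JOIN customers c      ON c.Cust_account = i.Cust_account
    JOIN zip_codes z      ON z.Zip_code = c.Zip_code
    JOIN invoice_items ii ON ii.Invoice_number = i.Invoice_number
    JOIN items it         ON it.Item = ii.Item
    WHERE i.Invoice_number = ?
    ORDER BY ii.Item
""", (10001,)).fetchall()

subtotal = round(sum(qty * price for *_, qty, price in lines), 2)
for line in lines:
    print(line)
print("Subtotal:", subtotal)
```

Every fact is stored once, and the invoice is produced by a join rather than read from a single wide record.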
First to Third Normal Form (1NF - 3NF)
• 1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multi-valued attributes and create a flat file).
• 2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies).
• 3NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies).
Codd’s Rules
• E. F. Codd presented these rules as a basis for determining whether a DBMS could be classified as relational.
Codd’s Rules
• Codd’s Rules can be divided into 5 functional areas:
    • Foundation Rules
    • Structural Rules
    • Integrity Rules
    • Data Manipulation Rules
    • Data Independence Rules
Foundation Rules
• Rule 0 - Foundation Rule
    • Any system claimed to be an RDBMS must be able to manage databases entirely through its relational capabilities.
    • All data definition and manipulation must be possible through relational operations.
Foundation Rules
• Rule 12 - Nonsubversion Rule
    • If an RDBMS has a low-level (record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level relational language.
    • All database access must be controlled through the DBMS so that the integrity of the database cannot be compromised without the knowledge of the user or the DBA.
    • This does not prohibit the use of record-at-a-time languages, e.g. PL/SQL.
Codd’s Rules
• Structural Rules (Rules 1 & 6)
    • The fundamental structural construct is the table.
    • Codd states that an RDBMS must support tables, domains, primary keys and foreign keys.
    • Each table should have a primary key.
Structural Rules
• Rule 1 - Information Rule
    • All information in a relational database is represented explicitly at the logical level in exactly one way: by values in a table.
    • ALL information, even the metadata held in the system catalogue, MUST be stored as relations (tables) and manipulated in the same way as data.
Structural Rules
• Rule 6 - View Updating
    • All views that are theoretically updatable are updatable by the system.
    • Not really implemented yet by any available system.
Codd’s Rules
• Integrity Rules (Rules 3 & 10)
    • Integrity should be maintained by the DBMS, not the application.
• Rule 3 - Systematic Treatment of Null Values
    • Null values are supported for the representation of missing and inapplicable information in a systematic way, independent of data type.
Integrity Rules
• Rule 10 - Integrity Independence
    • Integrity constraints specific to a particular relational database MUST be definable in the relational data sublanguage and storable in the database, NOT in the application program.
    • This gives the advantage of centralised control and enforcement.
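As a rough illustration of integrity independence, the sketch below declares constraints in the schema and lets the DBMS reject bad rows, using Python's built-in `sqlite3` module. The table layout, the particular constraints and the helper name `try_insert` are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE items (Item TEXT PRIMARY KEY, Item_descrip TEXT NOT NULL);
CREATE TABLE invoice_items (
    Invoice_number INTEGER NOT NULL,
    Item TEXT NOT NULL REFERENCES items(Item),
    Item_qty INTEGER NOT NULL CHECK (Item_qty > 0),
    Item_price REAL NOT NULL CHECK (Item_price >= 0),
    PRIMARY KEY (Invoice_number, Item)
);
INSERT INTO items VALUES ('AT10', 'Cerwin Vega Loudspeakers');
""")


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a declared constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO invoice_items VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert((10001, "AT10", 1, 359.95)),  # valid row
    try_insert((10001, "AT10", 1, 359.95)),  # duplicate composite key
    try_insert((10002, "AT10", 0, 359.95)),  # violates CHECK (Item_qty > 0)
    try_insert((10002, "NOPE", 1, 10.00)),   # unknown item: foreign key violation
]
print(results)
```

The application code contains no validation logic of its own; every rule lives in the table definitions, so any program that touches the database is held to the same constraints.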
Codd’s Rules
• Data Manipulation Rules (Rules 2, 4, 5 & 7)
    • Users should be able to manipulate the logical view of the data with no need for knowledge of how it is physically stored or accessed.
• Rule 2 - Guaranteed Access
    • Each and every datum in a relational database is guaranteed to be logically accessible by a combination of table name, primary key value and column name.
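Guaranteed access can be pictured as a lookup that takes exactly those three coordinates. The sketch below uses Python's built-in `sqlite3` module; the helper name `datum` and the `SCHEMA` whitelist are hypothetical, introduced because SQL identifiers cannot be passed as bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (Cust_account TEXT PRIMARY KEY, Cust_name TEXT, Zip_code TEXT)"
)
conn.execute("INSERT INTO customers VALUES ('123456', 'John Smith', '95819')")

# Known tables with their primary key column and permitted columns (hypothetical whitelist).
SCHEMA = {"customers": ("Cust_account", {"Cust_account", "Cust_name", "Zip_code"})}


def datum(table, key_value, column):
    """Fetch one value addressed by table name, primary key value and column name."""
    pk, columns = SCHEMA[table]  # KeyError for an unknown table
    if column not in columns:
        raise KeyError(column)
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE {pk} = ?", (key_value,)
    ).fetchone()
    return None if row is None else row[0]


print(datum("customers", "123456", "Cust_name"))
print(datum("customers", "123456", "Zip_code"))
```

No pointer, record position or storage path is needed: the three logical coordinates are enough to reach any single value.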
Data Manipulation Rules
• Rule 4 - Dynamic On-line Catalog Based on the Relational Model
    • The database description (metadata) is represented at the logical level in the same way as ordinary data, so that the same relational language can be used to interrogate the metadata as the regular data.
    • System data and other data are stored and manipulated in the same way.
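SQLite offers a small example of a catalog that is queried like ordinary data. In the sketch below, which uses Python's built-in `sqlite3` module, the `sqlite_master` table and the `pragma_table_info` table-valued function are SQLite features (the latter assumes SQLite 3.16 or later); the two sample tables are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (Item TEXT PRIMARY KEY, Item_descrip TEXT)")
conn.execute("CREATE TABLE zip_codes (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT)")

# The catalog is itself a table, read with an ordinary SELECT.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]

# Column metadata is also exposed as a relation that SELECT can query.
columns = [name for (name,) in conn.execute(
    "SELECT name FROM pragma_table_info('zip_codes') ORDER BY cid"
)]

print(tables)
print(columns)
```

The same query language serves both the data and its description, which is the point of the rule.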
Data Manipulation Rules
• Rule 5 - Comprehensive Data Sublanguage
    • An RDBMS may support many languages and modes of use, but there must be at least ONE language whose statements can express ALL of the following:
        • Data definition
        • View definition
        • Data manipulation (interactive and via program)
        • Integrity constraints
        • Authorization
        • Transaction boundaries (begin, commit and rollback)
    • The 1992 ISO standard for SQL provides all these functions.
Data Manipulation Rules
• Rule 7 - High-level Insert, Update & Delete
    • The capability of handling a base table or view as a single operand applies not only to data retrieval but also to insert, update and delete operations.
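The sketch below shows set-at-a-time insert, update and delete with Python's built-in `sqlite3` module. The `archived_items` table and the particular conditions are assumptions chosen only to make each statement affect several rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice_items (Invoice_number INTEGER, Item TEXT, Item_qty INTEGER,
                            Item_price REAL, PRIMARY KEY (Invoice_number, Item));
CREATE TABLE archived_items (Invoice_number INTEGER, Item TEXT, Item_qty INTEGER,
                             Item_price REAL);
""")
conn.executemany(
    "INSERT INTO invoice_items VALUES (?, ?, ?, ?)",
    [(10001, "SAGX730", 1, 569.95), (10001, "AT10", 1, 359.95),
     (10001, "CDPC725", 1, 399.95)],
)

# Each statement treats a whole set of rows as one operand: no row-by-row loop.
inserted = conn.execute(
    "INSERT INTO archived_items SELECT * FROM invoice_items WHERE Invoice_number = 10001"
).rowcount
updated = conn.execute(
    "UPDATE invoice_items SET Item_qty = Item_qty + 1 WHERE Item_price < 400"
).rowcount
deleted = conn.execute(
    "DELETE FROM invoice_items WHERE Invoice_number = 10001"
).rowcount

print(inserted, updated, deleted)
```

A record-at-a-time language would need an explicit loop for each of these operations; here the condition selects the rows and the DBMS applies the change to all of them.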
Codd’s Rules
• Data Independence Rules (Rules 8, 9 & 11)
    • These rules protect users and application developers from having to change their applications following any low-level reorganisation of the database.
Data Independence Rules
• Rule 8 - Physical Data Independence
    • Application programs and terminal activities remain logically unimpaired whenever any changes are made either to the storage organisation or to the access methods.
• Rule 9 - Logical Data Independence
    • Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
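Logical data independence can be illustrated by restructuring a base table and keeping the old shape available as a view. This sketch uses Python's built-in `sqlite3` module; the name `customers_base` and the query `APP_QUERY` are hypothetical, and only reads are covered because SQLite views are read-only unless INSTEAD OF triggers are defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (Cust_account TEXT PRIMARY KEY, Cust_name TEXT,
                        Cust_city TEXT, Cust_state TEXT, Zip_code TEXT);
INSERT INTO customers VALUES ('123456', 'John Smith', 'Sacramento', 'CA', '95819');
""")

# The application's query, written against the original table layout.
APP_QUERY = "SELECT Cust_name, Cust_city, Cust_state FROM customers WHERE Cust_account = ?"
before = conn.execute(APP_QUERY, ("123456",)).fetchone()

# Information-preserving change: split city and state into a zip code table,
# then expose the old layout under the old name as a view.
conn.executescript("""
CREATE TABLE zip_codes (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT);
INSERT INTO zip_codes SELECT DISTINCT Zip_code, Cust_city, Cust_state FROM customers;
CREATE TABLE customers_base (Cust_account TEXT PRIMARY KEY, Cust_name TEXT, Zip_code TEXT);
INSERT INTO customers_base SELECT Cust_account, Cust_name, Zip_code FROM customers;
DROP TABLE customers;
CREATE VIEW customers AS
    SELECT b.Cust_account, b.Cust_name, z.City AS Cust_city,
           z.State AS Cust_state, b.Zip_code
    FROM customers_base b JOIN zip_codes z ON z.Zip_code = b.Zip_code;
""")

after = conn.execute(APP_QUERY, ("123456",)).fetchone()
print(before, after)
```

The application query is unchanged and returns the same row before and after the base tables are reorganised.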
Data Independence Rules
• Rule 11 - Distribution Independence
    • The data manipulation sublanguage of an RDBMS must enable application programs and queries to remain logically unchanged whether and whenever data is physically centralised or distributed.
    • This means that an application program that accesses the DBMS on a single computer should also work, without modification, even if the data is moved from one computer to another in a network environment.
    • The user should see one centralised database whether the data is located on one or more computers.
    • This rule does not say that, to be fully relational, the DBMS must support distributed databases, but that if it does, the query must remain the same.
Summary
• Codd’s Rules can be divided into 5 functional areas:
    • Foundation Rules
    • Structural Rules
    • Integrity Rules
    • Data Manipulation Rules
    • Data Independence Rules