Design dbms

Database Management Systems

Published in: Education, Technology
  1. 1. Relational Database Design
  2. 2. Relational Database Design <ul><li>RDBMS design issues – Pitfalls and Normalization. </li></ul><ul><li>Overview of Normal Forms </li></ul><ul><li>Pitfalls in Relational Database Design </li></ul><ul><li>Functional Dependencies </li></ul><ul><li>Decomposition </li></ul><ul><li>Boyce-Codd Normal Form </li></ul><ul><li>Third Normal Form </li></ul><ul><li>Multivalued Dependencies and Fourth Normal Form </li></ul><ul><li>Overall Database Design Process </li></ul>
  3. 3. RDBMS Design issues <ul><li>So far we have assumed that attributes are grouped into a relation schema either by the database designer's common sense or by mapping a schema defined in the ER model. </li></ul><ul><li>Should the goodness of a design be evaluated on tables with few attributes or on tables with a large number of attributes? </li></ul><ul><li>Can we combine two tables (schemas) without causing problems? </li></ul><ul><li>We still need a formal measure of why one grouping of attributes into a relation schema may be better than another. </li></ul>
  4. 4. The Banking Schema <ul><li>branch = ( branch_name , branch_city , assets ) </li></ul><ul><li>customer = ( customer_id , customer_name , customer_street , customer_city ) </li></ul><ul><li>loan = ( loan_number , amount ) </li></ul><ul><li>account = ( account_number , balance ) </li></ul><ul><li>employee = ( employee_id , employee_name , telephone_number , start_date ) </li></ul><ul><li>dependent_name = ( employee_id , dname ) </li></ul><ul><li>account_branch = ( account_number , branch_name ) </li></ul><ul><li>loan_branch = ( loan_number , branch_name ) </li></ul><ul><li>borrower = ( customer_id , loan_number ) </li></ul><ul><li>depositor = ( customer_id , account_number ) </li></ul><ul><li>cust_banker = ( customer_id , employee_id , type ) </li></ul><ul><li>works_for = ( worker_employee_id , manager_employee_id ) </li></ul><ul><li>payment = ( loan_number , payment_number , payment_date , payment_amount ) </li></ul><ul><li>savings_account = ( account_number , interest_rate ) </li></ul><ul><li>checking_account = ( account_number , overdraft_amount ) </li></ul>
  5. 5. Combine Schemas? <ul><li>Suppose we combine borrower and loan to get </li></ul><ul><ul><li>bor_loan = ( customer_id , loan_number , amount ) </li></ul></ul><ul><li>Result is possible repetition of information (L-100 in example below) </li></ul>
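The repetition described on this slide can be reproduced with a few rows. The following is a minimal sketch in Python, not part of the original deck: the customer ids and loan amounts are invented sample data, and relations are modelled as plain sets of tuples.

```python
# Hypothetical sample rows; the ids and amounts are invented for illustration.
borrower = {("C-1", "L-100"), ("C-2", "L-100"), ("C-3", "L-200")}
loan = {("L-100", 10000), ("L-200", 5000)}

# Natural join of borrower and loan on loan_number yields bor_loan.
bor_loan = {
    (customer_id, loan_number, amount)
    for (customer_id, loan_number) in borrower
    for (loan_no, amount) in loan
    if loan_number == loan_no
}

# Count how many rows carry each loan's amount.
copies = {}
for _, loan_number, _ in bor_loan:
    copies[loan_number] = copies.get(loan_number, 0) + 1

print(sorted(bor_loan))
print(copies)
```

Because loan L-100 has two borrowers in this sample, its amount appears in two rows of bor_loan, which is the redundancy the slide warns about.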
  6. 6. A Combined Schema Without Repetition <ul><li>Consider combining loan_branch and loan </li></ul><ul><ul><li>loan_amt_br = ( loan_number , amount , branch_name ) </li></ul></ul><ul><li>No repetition (as suggested by example below) </li></ul>
  7. 7. What About Smaller Schemas? <ul><li>Suppose we had started with bor_loan. How would we know to split up ( decompose ) it into borrower and loan ? </li></ul><ul><li>Write a rule “ if there were a schema ( loan_number, amount ) , then loan_number would be a candidate key ” </li></ul><ul><li>Denote as a functional dependency : </li></ul><ul><li>loan_number → amount </li></ul><ul><li>In bor_loan , because loan_number is not a candidate key, the amount of a loan may have to be repeated. This indicates the need to decompose bor_loan . </li></ul><ul><li>Not all decompositions are good. Suppose we decompose employee into </li></ul><ul><li>employee1 = ( employee_id , employee_name ) </li></ul><ul><li>employee2 = ( employee_name , telephone_number , start_date ) </li></ul><ul><li>The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition. </li></ul>
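A functional dependency such as the one on this slide can be checked against a concrete table. The sketch below is an illustration under assumptions: `fd_holds` is a hypothetical helper name, rows are modelled as Python dicts, and the sample data is invented. Note that a sample instance can only refute a dependency; whether it holds in general is a statement about the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical bor_loan instance (invented sample data).
bor_loan = [
    {"customer_id": "C-1", "loan_number": "L-100", "amount": 10000},
    {"customer_id": "C-2", "loan_number": "L-100", "amount": 10000},
    {"customer_id": "C-3", "loan_number": "L-200", "amount": 5000},
]

amount_determined = fd_holds(bor_loan, ["loan_number"], ["amount"])
customer_determined = fd_holds(bor_loan, ["loan_number"], ["customer_id"])
print(amount_determined, customer_determined)
```

In this sample, loan_number determines amount but not customer_id, so loan_number alone cannot be a candidate key of bor_loan.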
  8. 8. A Lossy Decomposition
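The loss of information can be demonstrated by decomposing a small employee relation and joining the pieces back together. This is a minimal sketch with invented sample rows (two employees who happen to share a name); relations are modelled as sets of tuples.

```python
# Hypothetical employee rows: (employee_id, employee_name, telephone_number, start_date).
employee = {
    (1, "Kim", "555-0100", "2001-01-01"),
    (2, "Kim", "555-0199", "2005-06-15"),
}

# Project onto the two smaller schemas from the previous slide.
employee1 = {(eid, name) for (eid, name, _, _) in employee}
employee2 = {(name, tel, start) for (_, name, tel, start) in employee}

# Natural join of employee1 and employee2 on employee_name.
rejoined = {
    (eid, name, tel, start)
    for (eid, name) in employee1
    for (name2, tel, start) in employee2
    if name == name2
}

# Tuples in the join that were never in the original relation.
spurious = rejoined - employee
print(len(employee), len(rejoined), len(spurious))
```

The join returns every original tuple plus spurious ones, so we can no longer tell which telephone number belongs to which employee. The decomposition is lossy because employee_name, the only shared attribute, does not determine the attributes of either piece.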
  9. 9. Pitfalls in Relational Database Design <ul><li>Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to </li></ul><ul><ul><li>Repetition of Information. </li></ul></ul><ul><ul><li>Inability to represent certain information. </li></ul></ul><ul><li>Design Goals: </li></ul><ul><ul><li>Avoid redundant data </li></ul></ul><ul><ul><li>Ensure that relationships among attributes are represented </li></ul></ul><ul><ul><li>Facilitate the checking of updates for violation of database integrity constraints. </li></ul></ul>
  10. 10. RDBMS Design issues <ul><li>So far we have assumed that attributes are grouped into a relation schema either by the database designer's common sense or by mapping a schema defined in the ER model. </li></ul><ul><li>We still need a formal measure of why one grouping of attributes into a relation schema may be better than another. </li></ul><ul><li>Unsatisfactory relation schemas that do not meet certain conditions – the normal form tests – are decomposed into smaller relation schemas that meet the tests and hence possess the desirable properties. </li></ul><ul><li>Thus, the normalization procedure provides database designers with: </li></ul><ul><ul><li>A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes. </li></ul></ul><ul><ul><li>A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree. </li></ul></ul>
  11. 11. First Normal Form <ul><li>A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria. </li></ul><ul><li>These criteria are basically concerned with ensuring that the table is a faithful representation of a relation and that it is free of repeating groups. </li></ul><ul><li>Some definitions of 1NF, most notably that of Edgar F. Codd, make reference to the concept of atomicity. </li></ul><ul><li>Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS." </li></ul><ul><li>Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)." </li></ul><ul><li>That is, a field should not be divided into parts holding more than one kind of data such that what one part means to the DBMS depends on another part of the same field. </li></ul>
  12. 12. First Normal Form <ul><li>A domain is atomic if its elements are considered to be indivisible units </li></ul><ul><ul><li>Examples of non-atomic domains: </li></ul></ul><ul><ul><ul><li>Set of names, composite attributes </li></ul></ul></ul><ul><ul><ul><li>Identification numbers like CS101 that can be broken up into parts </li></ul></ul></ul><ul><li>A relational schema R is in first normal form if the domains of all attributes of R are atomic </li></ul><ul><li>Non-atomic values complicate storage and encourage redundant (repeated) storage of data </li></ul><ul><ul><li>Example: Set of accounts stored with each customer, and set of owners stored with each account </li></ul></ul><ul><ul><li>We assume all relations are in first normal form (and will revisit this assumption later!) </li></ul></ul>
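The redundancy in the example above (accounts stored with each customer and owners stored with each account) disappears once the set-valued attribute is flattened into one row per pair. A minimal sketch with invented ids, mirroring the depositor schema from the banking example:

```python
# Non-atomic design: each customer carries a set of account numbers (invented data).
customer_accounts = {
    "C-1": {"A-101", "A-102"},
    "C-2": {"A-101"},
}

# 1NF design: one (customer_id, account_number) row per ownership fact,
# as in depositor = (customer_id, account_number).
depositor = sorted(
    (customer_id, account_number)
    for customer_id, accounts in customer_accounts.items()
    for account_number in accounts
)

# The owners of an account are derived on demand instead of being stored twice.
owners_of_a101 = sorted(c for (c, a) in depositor if a == "A-101")
print(depositor)
print(owners_of_a101)
```

Each ownership fact is stored once, so there is no second copy (the owner set on the account side) to keep in sync.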
  13. 13. First Normal Form (Cont’d) <ul><li>Atomicity is actually a property of how the elements of the domain are used. </li></ul><ul><ul><li>Example : Strings would normally be considered indivisible </li></ul></ul><ul><ul><li>Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 </li></ul></ul><ul><ul><li>If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. </li></ul></ul><ul><ul><li>Doing so is a bad idea: leads to encoding of information in application program rather than in the database. </li></ul></ul>
  14. 14. Goal — Devise a Theory for the Following <ul><li>Decide whether a particular relation R is in “ good ” form. </li></ul><ul><li>In the case that a relation R is not in “ good ” form, decompose it into a set of relations { R 1 , R 2 , ..., R n } such that </li></ul><ul><ul><li>each relation is in good form </li></ul></ul><ul><ul><li>the decomposition is a lossless-join decomposition </li></ul></ul><ul><li>Our theory is based on: </li></ul><ul><ul><li>functional dependencies </li></ul></ul><ul><ul><li>multivalued dependencies </li></ul></ul>
  15. 15. Overview of Normal Forms <ul><li>1NF (First Normal Form) </li></ul><ul><li>To understand 2NF, 3NF, and BCNF, the concept of FDs (functional dependencies) is required </li></ul><ul><li>To understand 4NF and 5NF, the concept of MVDs (multivalued dependencies) is required </li></ul>
  16. 16. Normalization <ul><li>The basic objective of normalization is to reduce the various anomalies in the database. </li></ul><ul><li>Normalization can be looked upon as a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of: </li></ul><ul><ul><li>Minimizing redundancy </li></ul></ul><ul><ul><li>Minimizing the insertion, deletion, and update anomalies. </li></ul></ul><ul><li>Unsatisfactory relation schemas that do not meet certain conditions – the normal form tests – are decomposed into smaller relation schemas that meet the tests and hence possess the desirable properties. </li></ul><ul><li>Thus, the normalization procedure provides database designers with: </li></ul><ul><ul><li>A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes. </li></ul></ul><ul><ul><li>A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree. </li></ul></ul>
  17. 17. Normalization… <ul><li>The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized. </li></ul><ul><li>Normal forms, when considered in isolation from other factors, do not guarantee a good database design. </li></ul><ul><li>It is generally not sufficient to check separately that each relation schema in the database is, say, in BCNF or 3NF. </li></ul><ul><li>Rather, the process of normalization through decomposition must also confirm the existence of additional properties that the relation schemas, taken together, should possess: </li></ul><ul><ul><li>The lossless join property, which ensures that joining the decomposed relations does not generate spurious tuples. </li></ul></ul><ul><ul><li>The dependency preservation property, which ensures that each functional dependency is represented in some individual relation resulting from the decomposition. </li></ul></ul>
  18. 18. RDBMS design <ul><li>RDBMS design involves checking the current design against the normal form tests. </li></ul><ul><li>If the design is not in the desired normal form, then </li></ul><ul><ul><li>Decompose the relations (tables) into smaller ones </li></ul></ul><ul><ul><li>Ensure that the result satisfies the properties of decomposition. </li></ul></ul><ul><ul><li>Properties of decomposition </li></ul></ul><ul><ul><ul><li>Functional dependency preservation </li></ul></ul></ul><ul><ul><ul><ul><li>Identify the FDs in the given relation schema </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Apply Armstrong's axioms to find the set of all FDs (the closure) </li></ul></ul></ul></ul>
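Enumerating the full closure of a set of FDs is rarely done by hand. A common shortcut, swapped in here for the slide's "set of all FDs", is the attribute closure: the set of attributes determined by a given attribute set, computed by applying the given FDs repeatedly (this is sound by Armstrong's axioms). The sketch below is an illustration under assumptions: the function names are hypothetical, FDs are modelled as pairs of Python sets, and the FD lists are assumed from the banking example rather than stated in the deck. It also applies the standard test for a two-way decomposition: the join is lossless if the shared attributes determine all attributes of at least one of the two schemas.

```python
def attribute_closure(attrs, fds):
    """Attributes determined by attrs under fds, given as (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already contains lhs, it must also contain rhs.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def is_superkey(attrs, schema, fds):
    """attrs is a superkey of schema if its closure covers every attribute."""
    return schema <= attribute_closure(attrs, fds)

def is_lossless_binary(r1, r2, fds):
    """Two-schema test: the common attributes must determine all of r1 or all of r2."""
    common = attribute_closure(r1 & r2, fds)
    return r1 <= common or r2 <= common

# Assumed FD for the loan example: loan_number -> amount.
loan_fds = [({"loan_number"}, {"amount"})]
bor_loan = {"customer_id", "loan_number", "amount"}
borrower = {"customer_id", "loan_number"}
loan = {"loan_number", "amount"}

# Assumed FD for employee: employee_id determines the other attributes.
emp_fds = [({"employee_id"}, {"employee_name", "telephone_number", "start_date"})]
employee1 = {"employee_id", "employee_name"}
employee2 = {"employee_name", "telephone_number", "start_date"}

loan_key = is_superkey({"loan_number"}, bor_loan, loan_fds)
good_split = is_lossless_binary(borrower, loan, loan_fds)
bad_split = is_lossless_binary(employee1, employee2, emp_fds)
print(loan_key, good_split, bad_split)
```

Under these assumed FDs, loan_number is not a superkey of bor_loan, splitting bor_loan into borrower and loan is lossless, and splitting employee on employee_name is not.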
  19. 19. Overview <ul><li>To understand 2NF and 3NF, FDs are required </li></ul><ul><li>To understand BCNF, FDs and the closure of FDs are required </li></ul><ul><li>To understand 4NF and 5NF, MVDs are required </li></ul>
  20. 20. First Normal Form <ul><li>Domain is atomic if its elements are considered to be indivisible units </li></ul><ul><ul><li>Examples of non-atomic domains: </li></ul></ul><ul><ul><ul><li>Set of names, composite attributes </li></ul></ul></ul><ul><ul><ul><li>Identification numbers like CS101 that can be broken up into parts </li></ul></ul></ul><ul><li>A relational schema R is in first normal form if the domains of all attributes of R are atomic </li></ul><ul><li>Non-atomic values complicate storage and encourage redundant (repeated) storage of data </li></ul>
