UNIT IV
RELATIONAL DATABASE DESIGN
Armstrong’s Axioms
Design issues Decomposition
Normalization using Functional dependencies
Multivalued dependencies
Join dependencies
Domain-key normal form
Objectives
The Purpose of Normalization
Data redundancy and Update Anomalies
Functional Dependencies
The Process of Normalization
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Database Normalization
Database normalization is the process of removing redundant data
from your tables to improve storage efficiency, data integrity, and
scalability.
In the relational model, methods exist for quantifying how efficient
a database is. These classifications are called normal forms (or
NF), and there are algorithms for converting a given database
between them.
Normalization generally involves splitting existing tables into
multiple ones, which must be re-joined or linked each time a query
is issued.
History
Edgar F. Codd first proposed the process of normalization and what
came to be known as the 1st normal form in his paper “A
Relational Model of Data for Large Shared Data Banks”. Codd
stated:
“There is, in fact, a very simple elimination procedure which we
shall call normalization. Through decomposition nonsimple
domains are replaced by ‘domains whose elements are atomic
(nondecomposable) values.’”
Normal Form
Edgar F. Codd originally established three normal forms: 1NF, 2NF
and 3NF.
There are now others that are generally accepted, but 3NF is widely
considered to be sufficient for most applications. Most tables when
reaching 3NF are also in BCNF (Boyce-Codd Normal Form).
The Purpose of Normalization
Normalization is a technique for producing a set of relations with
desirable properties, given the data requirements of an enterprise.
The process of normalization is a formal method that identifies
relations based on their primary or candidate keys and the
functional dependencies among their attributes.
Update Anomalies (Abnormality, Irregularity)
Relations that have redundant data may have problems called
update anomalies, which are classified as:
Insertion anomalies
Deletion anomalies
Modification anomalies
Example of Update Anomalies
Insertion: to insert a new member of staff with branchNo B007 into the
StaffBranch relation;
Deletion: to delete the tuple that represents the last member of staff
located at branch B007;
Modification: to change the address of branch B003.
FUNCTIONAL DEPENDENCIES &
NORMALIZATION FOR RELATIONAL
DATABASES
FUNCTIONAL DEPENDENCIES:
A functional dependency is a constraint between two sets of attributes from
the database.
A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the possible
tuples that can form a relation state r of R.
The constraint is: for any two tuples t1 and t2 in r that have
t1[X] = t2[X], they must also have t1[Y] = t2[Y].
This means that the values of the Y component of a tuple in r depend on, or are
determined by, the values of the X component; alternatively,
the values of the X component of a tuple functionally determine the values
of the Y component.
Thus, X functionally determines Y in a relation schema R if and only
if, whenever two tuples of r(R) agree on their X-value, they must necessarily
agree on their Y-value.
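The constraint above can be tested mechanically on a single relation state. The following is a minimal sketch in Python; the function name fd_holds and the sample tuples are made up for illustration. Note that one state can only refute a dependency: a functional dependency is a property of the schema, so a passing check does not prove that it holds in general.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Row = Dict[str, Hashable]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs is satisfied by this relation state."""
    seen: Dict[Tuple, Tuple] = {}  # X-value -> Y-value first seen with it
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True


# Hypothetical relation state (illustrative values only).
r = [
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 1, "HOURS": 10},
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 2, "HOURS": 20},
    {"SSN": "222", "ENAME": "Wong", "PNUMBER": 1, "HOURS": 15},
]

print(fd_holds(r, ["SSN"], ["ENAME"]))             # True
print(fd_holds(r, ["SSN"], ["HOURS"]))             # False
print(fd_holds(r, ["SSN", "PNUMBER"], ["HOURS"]))  # True
```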
Functional Dependencies
• Functional dependency describes the relationship between attributes
in a relation.
• For example, if A and B are attributes of relation R, B is
functionally dependent on A (denoted A → B) if each value of A is
associated with exactly one value of B. (A and B may each consist of
one or more attributes.)
A → B: B is functionally dependent on A.
• Determinant: refers to the attribute or group of attributes
on the left-hand side of the arrow of a
functional dependency.
• Functional dependencies in the EMP-PROJ relation schema are:
FD1: {SSN, PNUMBER} → HOURS
FD2: SSN → ENAME
FD3: PNUMBER → {PNAME, PLOC}
FD1:
Specifies that a combination of SSN and PNUMBER values uniquely determines the
number of hours the employee works on the project per week.
FD2:
Specifies that the value of an employee's SSN uniquely determines the employee
name (ENAME).
FD3:
Specifies that the value of a project number (PNUMBER) uniquely determines the
project name (PNAME) and location.
Inference Rules
or
Armstrong's Inference Rules
The inference rules in a DBMS describe how a new functional
dependency can be derived from existing functional
dependencies.
Armstrong’s axioms
• The set of all functional dependencies that are implied by a given set
of functional dependencies X is called the closure of X, written X+. A set
of inference rules is needed to compute X+ from X.
1. Reflexivity: If B is a subset of A, then A → B
2. Augmentation: If A → B, then A,C → B,C
3. Transitivity: If A → B and B → C, then A → C
4. Self-determination: A → A
5. Decomposition: If A → B,C then A → B and A → C
6. Union: If A → B and A → C, then A → B,C
7. Composition: If A → B and C → D, then A,C → B,D
Rules 1 to 3 are Armstrong's axioms; rules 4 to 7 can be derived from them.
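In practice the full closure of a set of dependencies is rarely enumerated. A closely related and much cheaper computation is the closure of a set of attributes: all attributes that the given attributes functionally determine. The sketch below shows that attribute-closure technique, not the full closure described above; the helper names fd and closure and the sample dependencies are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build a dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attributes determined by `attrs` under `fds`."""
    result = set(attrs)          # reflexivity: attrs determine themselves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side
            # (this step applies transitivity and union).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Hypothetical dependencies: A -> B, B -> C, {C, D} -> E
F = [fd("A", "B"), fd("B", "C"), fd("C D", "E")]

print(sorted(closure({"A"}, F)))       # ['A', 'B', 'C']
print(sorted(closure({"A", "D"}, F)))  # ['A', 'B', 'C', 'D', 'E']
```

A dependency X → Y can be inferred from F exactly when Y is a subset of the closure of X computed this way.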
EQUIVALENCE OF SETS OF FUNCTIONAL DEPENDENCIES:
Two sets of functional dependencies E and F are equivalent if E+ = F+.
Equivalence means that every FD in E can be inferred from F, and
every FD in F can be inferred from E.
MINIMAL SETS OF FUNCTIONAL DEPENDENCIES:
A set of functional dependencies F is minimal if it satisfies the
following conditions.
1. Every dependency in F has a single attribute for its right-hand side.
2. We cannot replace any dependency X → A in F with a dependency
Y → A, where Y is a proper subset of X, and still have a set of
dependencies that is equivalent to F.
3. We cannot remove any dependency from F and still have a set of
dependencies that is equivalent to F.
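Equivalence can be tested without enumerating E+ and F+: it is enough to check that each dependency of one set can be inferred from the other, using attribute closure. The following Python sketch does this; the names covers and equivalent and the sample sets are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f: List[FD], g: List[FD]) -> bool:
    """True if every dependency in g can be inferred from f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def equivalent(e: List[FD], f: List[FD]) -> bool:
    return covers(e, f) and covers(f, e)


# Hypothetical sets of dependencies.
E = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
F = [fd("A", "B"), fd("B", "C")]
G = [fd("A", "B")]

print(equivalent(E, F))  # True: A -> C follows from the other two
print(equivalent(F, G))  # False: B -> C cannot be inferred from G
```

In this example E is not minimal, because A → C can be removed (condition 3) and the remaining set F is still equivalent to E.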
NORMALIZATION
• Normalization of data is a process of analyzing the given relation
schemas based on their functional dependencies and primary keys
to achieve the desirable properties of
1. Minimizing redundancy.
2. Minimizing the insertion, deletion, and update anomalies.
• Relation schemas that do not meet the normal form tests are
decomposed into smaller relation schemas
that meet the tests and hence possess the desirable properties.
• The process provides a series of normal form tests that can be
carried out on individual relation schemas,
so that the relational database can be normalized to any desired
degree.
Normalization
• Normalization is the process of efficiently organizing data in a
database with two goals in mind:
First goal: eliminate redundant data
– for example, storing the same data in more than one table
Second goal: ensure data dependencies make sense
– for example, only storing related data in a table
Benefits of Normalization
Less storage space
Quicker updates
Less data inconsistency
Clearer data relationships
Easier to add data
Flexible Structure
The Solution: Normal Forms
• Bad database design results in:
– redundancy: inefficient storage.
– anomalies: data inconsistency, difficulties in
maintenance
• 1NF, 2NF, 3NF, BCNF are some of the early
forms in the list that address this problem
• ADDITIONAL PROPERTIES:
The lossless join or nonadditive join property, which guarantees that
the spurious tuple generation problem does not occur with respect
to the relation schemas created after decomposition.
The dependency preservation property, which ensures that each
functional dependency is represented in some individual relation resulting
after decomposition.
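The lossless join property can be illustrated on one relation state: project the state onto the two decomposed schemas, join the projections back, and compare with the original. The Python sketch below does this for a made-up relation with attributes emp, dept and proj (all names and values are hypothetical). A single state can expose spurious tuples, but it cannot prove the property for every legal state.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Projection with duplicate elimination."""
    seen: Set[Tuple[str, ...]] = set()
    out: List[Row] = []
    for t in rows:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append({a: t[a] for a in attrs})
    return out


def natural_join(r1: List[Row], r2: List[Row]) -> List[Row]:
    if not r1 or not r2:
        return []
    common = [a for a in r1[0] if a in r2[0]]
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]


def as_set(rows: List[Row]) -> Set[Tuple]:
    return {tuple(sorted(t.items())) for t in rows}


def is_lossless(rows: List[Row], a1: Sequence[str], a2: Sequence[str]) -> bool:
    rejoined = natural_join(project(rows, a1), project(rows, a2))
    return as_set(rejoined) == as_set(rows)


rows = [
    {"emp": "Smith", "dept": "Research", "proj": "ProductX"},
    {"emp": "Wong", "dept": "Research", "proj": "ProductY"},
]

# Joining on dept pairs every employee with every project: spurious tuples.
print(is_lossless(rows, ["emp", "dept"], ["dept", "proj"]))  # False
# Joining on emp reproduces exactly the original tuples.
print(is_lossless(rows, ["emp", "dept"], ["emp", "proj"]))   # True
```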
• Denormalization:
It is the process of storing the join of higher normal form relations as
a base relation, which is in a lower normal form.
An attribute of relation schema R is called a prime attribute if it is a
member of some candidate key of R.
An attribute is called nonprime if it is not a prime attribute, that is,
if it is not a member of any candidate key.
Codd's 12 rules
1. The information rule
• The information rule simply requires all information in the database
to be represented in one and only one way, namely by values in
column positions within rows of tables.
2. The guaranteed access rule
• This rule is essentially a restatement of the fundamental requirement
for primary keys.
• It says that every individual scalar value in the database must be
logically addressable by specifying the name of the containing table,
the name of the containing column and the primary key value of the
containing row.
3. Systematic treatment of null values
• The DBMS is required to support a representation of "missing
information and inapplicable information" that is systematic, distinct
from all regular values (for example, "distinct from zero or any other
number," in the case of numeric values), and independent of data
type.
• It is also implied that such representations must be manipulated by
the DBMS in a systematic way.
4. Active online catalog based on the relational model
• The system is required to support an online, inline, relational catalog
that is accessible to authorized users by means of their regular query
language.
5. The comprehensive data sublanguage rule
• The system must support at least one relational language that
• (a) has a linear syntax,
• (b) can be used both interactively and within application programs,
• (c) supports data definition operations (including view definitions),
data manipulation operations (update as well as retrieval),
security and integrity constraints, and transaction management
operations (begin, commit, and rollback).
6. The view updating rule
• All views that are theoretically updatable must be updatable by the
system.
7. High-level insert, update, and delete
• The system must support set-at-a-time INSERT, UPDATE, and
DELETE operators.
8. Physical data independence
9. Logical data independence
10. Integrity independence
• Integrity constraints must be specified separately from application
programs and stored in the catalog. It must be possible to change such
constraints as and when appropriate without unnecessarily affecting
existing applications.
11. Distribution independence
Existing applications should continue to operate successfully
(a) when a distributed version of the DBMS is first introduced;
(b) when existing distributed data is redistributed around the system.
12. The nonsubversion rule
If the system provides a low-level (record-at-a-time) interface, then that
interface cannot be used to subvert the system (e.g., by bypassing a
relational security or integrity constraint).
Normal forms based on primary keys:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
First Normal Form:
• First normal form disallows multivalued attributes, composite
attributes, and their combinations.
• It states that the domain of an attribute must include only atomic
values and that the value of any attribute in a tuple must be a single
value from the domain of that attribute.
• First normal form disallows “relations within relations” or “relations
as attributes of tuples”.
• The only attribute values permitted by 1NF are single atomic values.
• The above table can be brought into 1NF by dividing the attribute into three
component attributes, location1, location2 and location3, which
makes the relation schema look like this.
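As a rough sketch of this step, the Python below flattens a multivalued attribute in two ways: into a fixed number of columns, as described above, and into one tuple per value, which is a different technique shown only for comparison. The row contents, the attribute name location and both function names are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def split_into_columns(row: Row, attr: str, n: int) -> Row:
    """Replace a multivalued attribute with n atomic columns attr1..attrn."""
    values = list(row[attr])
    if len(values) > n:
        raise ValueError(f"more than {n} values for {attr!r}")
    flat = {k: v for k, v in row.items() if k != attr}
    for i in range(n):
        # Unused columns are filled with None (a null value).
        flat[f"{attr}{i + 1}"] = values[i] if i < len(values) else None
    return flat


def split_into_rows(row: Row, attr: str) -> List[Row]:
    """Alternative: one tuple per value of the multivalued attribute."""
    rest = {k: v for k, v in row.items() if k != attr}
    return [{**rest, attr: v} for v in row[attr]]


# Hypothetical non-1NF row: 'location' holds a set of values.
dept = {"dname": "Research", "location": ["Bellaire", "Sugarland", "Houston"]}

print(split_into_columns(dept, "location", 3))
print(split_into_rows(dept, "location"))
```

The fixed-column form needs a known upper bound on the number of values and introduces nulls when a row has fewer; the one-tuple-per-value form avoids both but repeats the other attributes.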
SECOND NORMAL FORM:
The table must be in First Normal Form.
2NF is based on the concept of full functional dependency.
A functional dependency X → Y is a full functional dependency if
removal of any attribute A from X means that the dependency does not
hold any more,
i.e., for any attribute A ∈ X, (X − {A}) does not functionally determine Y.
A functional dependency X → Y is a partial dependency if some
attribute A ∈ X can be removed from X and the dependency still holds,
i.e., for some A ∈ X, (X − {A}) → Y.
EXAMPLE:
(I) {Ssn, Pnumber} → Hours is a full functional dependency.
(II) {Ssn, Pnumber} → Ename is partial because Ssn → Ename
holds.
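The full/partial test can be sketched with attribute closure: drop one left-hand attribute at a time and see whether the remainder still determines the right-hand side. The dependencies below are the EMP-PROJ ones given earlier in these notes; the function names and the attribute name Plocation are illustrative choices.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full(lhs: Set[str], rhs: Set[str], fds: List[FD]) -> bool:
    """True if lhs -> rhs holds and no attribute can be dropped from lhs."""
    if not rhs <= closure(lhs, fds):
        raise ValueError("the dependency does not hold at all")
    # Removing a single attribute is enough to test: if any smaller subset
    # determined rhs, so would a subset that is missing just one attribute.
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


F = [
    fd("Ssn Pnumber", "Hours"),
    fd("Ssn", "Ename"),
    fd("Pnumber", "Pname Plocation"),
]

print(is_full({"Ssn", "Pnumber"}, {"Hours"}, F))  # True: full dependency
print(is_full({"Ssn", "Pnumber"}, {"Ename"}, F))  # False: partial dependency
```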
Definition: A relation schema R is in 2NF if every nonprime attribute
A in R is fully functionally dependent on the primary key of R.
NOTE:
The test for 2NF involves testing for functional dependencies whose
left-hand side attributes are part of the primary key.
If the primary key contains a single attribute, the test need not be
applied at all.
THIRD NORMAL FORM
• Third normal form (3NF) is based on the concept of transitive
dependency.
• A functional dependency X → Y in a relation schema R is a
transitive dependency if there is a set of attributes Z that is neither a
candidate key nor a subset of any key of R, and both
X → Z and Z → Y hold.
• Definition: According to Codd's original definition, a relation schema
R is in 3NF if it satisfies 2NF and no nonprime attribute of R is
transitively dependent on the primary key.
The relation schema EMP_DEPT in the figure is in 2NF, since no partial
dependencies on a key exist.
However, EMP_DEPT is not in 3NF because of the transitive
dependency of DMGRSSN (and also DNAME) on SSN via
DNUMBER.
We can normalize EMP_DEPT by decomposing it into the two 3NF
relation schemas ED1 and ED2.
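The 3NF test for this example can be sketched as follows. The code assumes a cut-down EMP_DEPT containing only the attributes named above, with SSN as the key and the dependencies SSN → DNUMBER and DNUMBER → {DNAME, DMGRSSN}; the attribute lists chosen for ED1 and ED2 are likewise assumptions, since the figure is not reproduced here. A listed dependency is reported when its left side is not a superkey and it determines a nonprime attribute.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attrs: Set[str], fds: List[FD],
                        candidate_keys: List[Set[str]]) -> List[Tuple[List[str], str]]:
    """Return (determinant, attribute) pairs that break 3NF."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # determinant is a superkey: allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((sorted(lhs), a))
    return bad


# Assumed, simplified EMP_DEPT (other attributes omitted).
emp_dept = {"SSN", "DNUMBER", "DNAME", "DMGRSSN"}
F = [fd("SSN", "DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
print(third_nf_violations(emp_dept, F, [{"SSN"}]))
# [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]

# Assumed decomposition: ED1(SSN, DNUMBER), ED2(DNUMBER, DNAME, DMGRSSN).
ed1 = third_nf_violations({"SSN", "DNUMBER"}, [F[0]], [{"SSN"}])
ed2 = third_nf_violations({"DNUMBER", "DNAME", "DMGRSSN"}, [F[1]], [{"DNUMBER"}])
print(ed1, ed2)  # [] []
```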
BOYCE-CODD NORMAL FORM
It was proposed as a simpler form of 3NF, but it is stricter than 3NF.
Every relation in BCNF is also in 3NF; however, a relation in 3NF is not
necessarily in BCNF.
The formal definition of BCNF is: A relation schema R is in
BCNF if whenever a nontrivial functional dependency X → A
holds in R, then X is a superkey of R.
The difference between the definitions of BCNF and 3NF is that
the condition of 3NF which allows A to be prime is absent from
BCNF.
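The difference between the two definitions can be seen on a small hypothetical schema R(A, B, C) with {A, B} → C and C → B (this schema is an assumption chosen for illustration). The sketch below finds the candidate keys by brute force, lists the dependencies whose determinant is not a superkey, and then checks whether the attributes they determine are prime, which is exactly what 3NF tolerates and BCNF does not.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs: Set[str], fds: List[FD]) -> List[Set[str]]:
    """Brute-force search, smallest subsets first (fine for small schemas)."""
    keys: List[Set[str]] = []
    for n in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), n):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= attrs:
                keys.append(s)
    return keys


def bcnf_violations(attrs: Set[str], fds: List[FD]) -> List[FD]:
    """Nontrivial listed dependencies whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= attrs]


R = {"A", "B", "C"}
F = [fd("A B", "C"), fd("C", "B")]

keys = candidate_keys(R, F)
prime = set().union(*keys)
violations = bcnf_violations(R, F)

print(sorted(sorted(k) for k in keys))               # [['A', 'B'], ['A', 'C']]
print(violations)                                    # only C -> B
print(all(rhs <= prime for _, rhs in violations))    # True: still in 3NF
```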
• In our example, FD5 violates BCNF in LOTS1A because AREA is not a
superkey of LOTS1A.
• Note that FD5 satisfies 3NF in LOTS1A because COUNTY_NAME
is a prime attribute (condition b), but this condition does not exist in
the definition of BCNF.
• We can decompose LOTS1A into two BCNF relations, LOTS1AX and
LOTS1AY.
• This decomposition loses the functional dependency FD2 because
its attributes no longer coexist in the same relation after decomposition.
• The relation schema R shown in the figure illustrates the general case of such
a relation.
• Ideally, relational database design should strive to achieve BCNF or
3NF for every relation schema.
• Achieving the normalization status of just 1NF or 2NF is not
considered adequate, since they were developed historically as
stepping stones to 3NF and BCNF.
Functional Dependency
• Recall that if X uniquely determines Y, then Y is
functionally dependent on X.
• You may recall from math the terms domain and range.
The domain is the set of all possible values of X and
the range is the set of all possible values of Y.
• The relation is a function because each of the
elements of X maps to exactly one element of Y.
Definition of MVD
• A multivalued dependency is a full constraint
between two sets of attributes in a relation.
• In contrast to the functional dependency, the
multivalued dependency requires that certain tuples
be present in a relation. Therefore, a multivalued
dependency is also referred to as a tuple-generating
dependency. The multivalued dependency also plays
a role in 4NF normalization.
• full constraint
– A constraint which expresses something about all
attributes in a database (in contrast to an embedded
constraint). That a multivalued dependency is a full
constraint follows from its definition, where it says
something about the attributes R − β.
• tuple-generating dependency
– A dependency which explicitly requires certain tuples
to be present in the relation.
• Inference rules for functional and multivalued dependencies
IR1 (reflexive rule for FDs): if X ⊇ Y, then X → Y.
IR2 (augmentation rule for FDs): {X → Y} ⊨ XZ → YZ.
IR3 (transitive rule for FDs): {X → Y, Y → Z} ⊨ X → Z.
IR4 (complementation rule for MVDs): {X →→ Y} ⊨ {X →→ (R − (X ∪ Y))}.
IR5 (augmentation rule for MVDs): if X →→ Y and W ⊇ Z, then WX →→ YZ.
IR6 (transitive rule for MVDs): {X →→ Y, Y →→ Z} ⊨ X →→ (Z − Y).
REVIEW OF NFs
• 1NF: All values of the columns are atomic.
That is, they contain no repeating values.
• 2NF: it is in 1NF and every non-key column
is fully dependent upon the primary key.
REVIEW OF NFs Cont…
• 3NF: it is already in 2NF and every non-key column is
non-transitively dependent upon its primary key. In other
words, all non-key attributes are functionally dependent
only upon the primary key.
• BCNF: A relation is in BCNF if every determinant is a
candidate key. This is an improved form of third normal
form.
Determinant: an attribute on which some other attribute is
fully functionally dependent.
4th Normal Form
A Boyce-Codd normal form relation is in fourth normal
form if
(a) there is no multivalued dependency in the relation
or
(b) there are multivalued dependencies, but the
attributes that are multivalued dependent on a
specific attribute are dependent on each other.
4th Normal Form Cont…
This is best discussed through mathematical notation.
Assume the following relation
R(a:pk1, b:pk2, c:pk3)
Recall that a relation is in BCNF if all its determinants are
candidate keys; in other words, each determinant can be
used as a primary key.
Because relation R has only one determinant (a, b, c),
which is the composite primary key, and since the
primary key is a candidate key, R is in BCNF.
4th Normal Form Cont…
Now R may or may not be in fourth normal form.
1. If R contains no multivalued dependency, then R will be in fourth normal
form.
2. Assume R has the following two multivalued dependencies:
a →→ b and a →→ c
In this case R will be in fourth normal form if b and c are dependent on each
other.
However, if b and c are independent of each other, then R is not in fourth
normal form and the relation has to be projected into the following two
non-loss projections. These non-loss projections will be in fourth normal form.
4th Normal Form Cont…
Many-to-many relationships
Fourth Normal Form applies to situations
involving many-to-many relationships.
In relational databases, many-to-many
relationships are expressed through
cross-reference tables.
Note about FDs and MVDs
• Every functional dependency is an MVD
(if A1A2…An → B1B2…Bn, then A1A2…An →→ B1B2…Bn)
• FDs rule out certain tuples (i.e. if A → B then two
tuples will not have the same value for A and
different values for B)
• MVDs do not rule out tuples. They guarantee
that certain tuples must exist.
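The "certain tuples must exist" requirement can be checked on a relation state. In the sketch below, X →→ Y is taken to hold if, for every pair of tuples t1 and t2 that agree on X, the state also contains the tuple that takes its X and Y values from t1 and its remaining values from t2. The function name mvd_holds and the sample data are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def mvd_holds(rows: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """Check X ->> Y on one non-empty relation state."""
    attrs = list(rows[0])
    z = [a for a in attrs if a not in x and a not in y]  # the remaining attributes
    present = {tuple(t[a] for a in attrs) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Required tuple: X and Y values from t1, the rest from t2.
                t3 = tuple(t2[a] if a in z else t1[a] for a in attrs)
                if t3 not in present:
                    return False
    return True


# Hypothetical state: for a = 1, every b value is paired with every c value.
full = [{"a": 1, "b": b, "c": c} for b in ("x", "y") for c in ("p", "q")]
print(mvd_holds(full, ["a"], ["b"]))       # True

# Dropping one combination breaks the MVD: a required tuple is missing.
print(mvd_holds(full[:-1], ["a"], ["b"]))  # False

# A state that satisfies the FD a -> b also satisfies the MVD a ->> b.
fd_state = [{"a": 1, "b": "x", "c": "p"}, {"a": 1, "b": "x", "c": "q"}]
print(mvd_holds(fd_state, ["a"], ["b"]))   # True
```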
Formal Definitions
• Fourth Normal Form
– R is in 4NF if it is in BCNF and,
– for every “non-trivial” MVD A1A2…An →→ B1B2…Bn,
{A1A2…An} is a superkey
• An MVD A1A2…An →→ B1B2…Bn for a relation R is “non-trivial” if:
1. None of the Bs are among the As
2. Not all of the attributes of R are among the As and Bs
• An MVD is “trivial” if it contains all the variations of A1A2…An x
B1B2…Bn.
• A relation cannot be decomposed any further (under 4NF
rules) if it has only trivial MVDs
Example 1
Consider a case of class enrollment. Each student
can be enrolled in one or more classes and each
class can contain one or more students.
Clearly, there is a many-to-many relationship
between classes and students. This relationship
can be represented by a Student/Class
cross-reference table:
{StudentID, ClassID}
Example 1 Cont…
• The key for this table is the combination of
StudentID and ClassID. To avoid violation of 2NF,
all other information about each student and each
class is stored in separate Student and Class tables,
respectively.
• Note that each StudentID determines not a unique
ClassID, but a well-defined, finite set of values. This
kind of behavior is referred to as multi-valued
dependency of ClassID on StudentID.
Example 2
• Consider another example with two many-to-many relationships, between students
and classes and between classes and teachers.
[Diagram: Students *..* Classes, Classes *..* Teachers]
• Also, a many-to-many relationship between
students and teachers is implied.
Example 2 Cont…
• However, the business rules do not constrain this
relationship in any way: the combination of StudentID and
TeacherID does not contain any additional information
beyond the information implied by the student/class and
class/teacher relationships.
• Consequently, the student/class and class/teacher
relationships are independent of each other; these
relationships have no additional constraints. The following
table is, then, in violation of 4NF:
{StudentID, ClassID, TeacherID}
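A small sketch of why the three-column table is redundant: if the student/class and class/teacher relationships are independent, the combined table is just their join, and projecting it back gives the two cross-reference tables again. The identifiers and values below are made up for illustration.

```python
from typing import Set, Tuple

# Hypothetical cross-reference tables.
student_class: Set[Tuple[str, str]] = {("s1", "c1"), ("s2", "c1"), ("s1", "c2")}
class_teacher: Set[Tuple[str, str]] = {("c1", "t1"), ("c1", "t2"), ("c2", "t3")}

# The combined {StudentID, ClassID, TeacherID} table must pair every
# student of a class with every teacher of that class.
combined = {
    (s, c, t)
    for (s, c) in student_class
    for (c2, t) in class_teacher
    if c == c2
}

# Projecting the combined table recovers both original tables, so it
# carries no information beyond what the two smaller tables hold.
projected_sc = {(s, c) for (s, c, t) in combined}
projected_ct = {(c, t) for (s, c, t) in combined}

print(len(combined))                   # 5 rows of three columns
print(projected_sc == student_class)   # True
print(projected_ct == class_teacher)   # True
```

Adding one more teacher to class c1 would add one row to the class/teacher table but two rows to the combined table, one per enrolled student; storing the two cross-reference tables separately avoids that repetition.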
• FIFTH NORMAL FORM:
• It is also called project-join normal form.
• A relation schema R is in 5NF with respect to a
set F of functional, multivalued and join
dependencies if, for every nontrivial join
dependency JD(R1, R2, …, Rn) in F+, every Ri is a
superkey of R.