Introduction to Databases

By: Engineer Muhammad
Suleman Memon
M.E(Information Technology)
B.E(Computer System)

A database is a simple yet flexible and
powerful tool for storing and retrieving data.
Every company and every website has lots of data.
The more of that data you keep in a
database, the better.
Far from being a tool only useful to big
businesses, a database is a good fit even if you
just want a simple guest book or page hit counter.
The database you use will most likely be a
relational database.

 This is the industry standard design these
days.
 Relational databases use the principles of
set theory.
 Set theory is a field of mathematics that
describes how to deal with sets of data.
 Relational databases are quite intuitive and
easy to understand.

 All data is held in tables.
 A table has columns (along the top) and rows.
 You create the tables you need. You define
the table names.
 You define what the column names are in
each table.
 You define what type of data the columns
are...

 There are a number of different data types
available which represent the different types
of data you find in real life.
 There are analogous types in all databases
and programming languages. Each has
variations, but they're all fundamentally the
same.

They are:
• Numerical types, i.e. numbers. There are
fundamentally two types: integer and float.
Integers are whole numbers (e.g. 1, 2, 100,
999999). Floats are numbers with decimal
places (e.g. 1.1, 22.5, 3.1415927).
• String types, i.e. text. There are two kinds
here: fixed length and variable length. 'char'
is the fixed-length character type in MySQL and
holds up to 255 characters.
• 'varchar' is a variable-length field. Older
MySQL versions limit it to 255 characters;
MySQL 5.0.3 and later allow up to 65,535 bytes,
subject to the maximum row size.
• There are several 'text' types of varying
lengths in MySQL.

• Date and time types, for storing dates and
times.
• Binary data. This is arbitrary data: it could be
images, programs, absolutely anything.
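
As a rough illustration of these type families, the sketch below declares one column of each kind. It uses Python's built-in sqlite3 module rather than MySQL, so that it runs without a server; the table and column names are made up for the example. SQLite accepts the declared types but does not enforce the lengths, so treat the declarations as documentation of intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_id  INTEGER,       -- whole number
        price       REAL,          -- number with decimal places
        code        CHAR(8),       -- fixed-length text (length not enforced by SQLite)
        description VARCHAR(255),  -- variable-length text
        added_on    DATE,          -- date, stored here as ISO-8601 text
        photo       BLOB           -- arbitrary binary data
    )
""")
# Hypothetical sample row, one value per type family.
conn.execute(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)",
    (1, 22.5, "AB-00001", "Guest book widget", "2024-01-31", b"\x89PNG"),
)
row = conn.execute("SELECT * FROM products").fetchone()
value_types = [type(v).__name__ for v in row]
print(value_types)
```

The values come back as the analogous Python types, which is the sense in which databases and programming languages share the same fundamental types.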

 All Relational Databases use indexes.
 Similar to the index in a book, indexes provide
a quick way to find the exact data item you
want.
 Imagine you have a database of 100,000
customers, and you want to find just one.
• If you just read the 'customers' table from
start to finish until you find the one you're
searching for, you could end up having to read
all 100,000 records.

 This would be very slow.
• Most relational databases use a B-tree
index structure.
• Because each index node holds many keys, the
number of steps needed to find a data item
grows only logarithmically with the number
of rows; even a very large table typically
needs just a handful of index reads.
• Databases commonly have millions of rows, so
you can see the necessity for indexes!
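
To get a feel for how few index reads a B-tree needs, the sketch below counts tree levels for a given number of rows. The fan-out of 100 keys per node is an assumption picked for illustration; real values depend on the page size and key size of the database in use.

```python
def index_levels(row_count: int, fanout: int) -> int:
    """Levels a B-tree needs so that its leaves can address row_count keys."""
    levels, capacity = 1, fanout
    while capacity < row_count:
        capacity *= fanout  # each extra level multiplies the reachable keys
        levels += 1
    return levels

# Assumed fan-out of 100 keys per node.
for rows in (100_000, 1_000_000, 100_000_000):
    print(rows, index_levels(rows, fanout=100))
```

Under this assumption, one hundred thousand rows and one million rows both need three levels, and one hundred million rows need four.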

 Indexes are a large part of databases and
their design.
 Defining a column as the primary key
implicitly creates an index.
• If you have a primary key on a table, it has
an index.
 You can add a number of indexes to each
table you have.

 You'd use the create index command -
more later...
 Indexes are used automatically by the
database itself when you issue a query (ask
for data).
• It uses the index to find the data in the
table.
• For example, we want to get a customer's
details from the example 'customers' table
above...

 If we submit the following SQL query, the
database will use the index it created for
primary key column 'customer_id', and get
everything for customer 1:
select * from customers where
customer_id = 1;
• The database can use the index because the
query filters on 'customer_id': it looks in
the index and finds the location of
customer '1' directly.

• If there's no index on the column in the
query, the database will have to go through
the whole table! This is called a full table
scan.
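
The difference between an index lookup and a full table scan can be observed directly. The sketch below uses SQLite (through Python's sqlite3 module) and its EXPLAIN QUERY PLAN statement; the 'name' column, the sample rows and the index name idx_customers_name are assumptions made for the example, and other databases report their plans in different formats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer {i}") for i in range(1, 1001)],
)

def plan(sql: str) -> str:
    """Return the description of the first step of SQLite's query plan."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# The primary key has an implicit index, so this is a search.
by_key = plan("SELECT * FROM customers WHERE customer_id = 1")
# No index on 'name' yet, so this is a full table scan.
by_name = plan("SELECT * FROM customers WHERE name = 'customer 1'")

# After adding an index, the same query becomes a search.
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")
by_name_indexed = plan("SELECT * FROM customers WHERE name = 'customer 1'")

print(by_key, by_name, by_name_indexed, sep="\n")
```

SQLite labels index lookups as SEARCH and full table scans as SCAN; the exact wording after that keyword varies between SQLite versions.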

 These days, when you talk about databases
in the wild, you are primarily talking about
two types: analytical databases and
operational databases.
Analytic Databases
• Analytic databases (a.k.a. OLAP, Online
Analytical Processing) are primarily static,
read-only databases which store archived,
historical data used for analysis.

 For example, a company might store sales
records over the last ten years in an analytic
database and use that database to analyze
marketing strategies in relationship to
demographics.
 On the web, you will often see analytic
databases in the form of inventory catalogs
such as the one shown previously from
Amazon.com.
 An inventory catalog analytical database
usually holds descriptive information about all
available products in the inventory.

 Web pages are generated dynamically by
querying the list of available products in the
inventory against some search parameters.
 The dynamically-generated page will
display the information about each item
(such as title, author, ISBN) which is stored
in the database.

• Operational databases (a.k.a. OLTP, Online
Transaction Processing), on the other hand,
are used to manage more dynamic bits of
data.
 These types of databases allow you to do
more than simply view archived data.
 Operational databases allow you to modify
that data (add, change or delete data).
 These types of databases are usually used
to track real-time information.

 For example, a company might have an
operational database used to track
warehouse/stock quantities.
• As customers order products from an online
web store, an operational database can be
used to keep track of how many items have
been sold and when the company will need
to reorder stock.
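
The two workloads can be contrasted in a few lines. This is only a sketch with made-up 'stock' and 'sales' tables in a single SQLite database; in practice operational and analytic data often live in separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (product TEXT PRIMARY KEY, quantity INTEGER);
    CREATE TABLE sales (product TEXT, year INTEGER, units INTEGER);
    INSERT INTO stock VALUES ('widget', 10);
    INSERT INTO sales VALUES ('widget', 2022, 120), ('widget', 2023, 150),
                             ('gadget', 2023, 40);
""")

# Operational (OLTP) style: a small transaction that changes current data.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("UPDATE stock SET quantity = quantity - 3 WHERE product = 'widget'")
    conn.execute("INSERT INTO sales VALUES ('widget', 2024, 3)")
remaining = conn.execute(
    "SELECT quantity FROM stock WHERE product = 'widget'"
).fetchone()[0]

# Analytic (OLAP) style: a read-only aggregate over historical rows.
units_by_year = conn.execute(
    "SELECT year, SUM(units) FROM sales GROUP BY year ORDER BY year"
).fetchall()

print(remaining, units_by_year)
```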

 Besides differentiating databases according
to function, databases can also be
differentiated according to how they model
the data.
What is a data model?
 Well, essentially a data model is a
"description" of both a container for data
and a methodology for storing and
retrieving data from that container.
 Actually, there isn't really a data model
"thing".

 Data models are abstractions, oftentimes
mathematical algorithms and concepts.
 You cannot really touch a data model.
 But nevertheless, they are very useful.
 The analysis and design of data models has
been the cornerstone of the evolution of
databases.
 As models have advanced so has database
efficiency.
• Before the 1980s, the two most commonly
used database models were the hierarchical
and network models.

 As its name implies, the Hierarchical Database
Model defines hierarchically-arranged data.
 Perhaps the most intuitive way to visualize this
type of relationship is by visualizing an upside
down tree of data.
 In this tree, a single table acts as the "root" of
the database from which other tables "branch"
out.
• You will be instantly familiar with this
relationship because that is how Windows-based
directory management systems (like
Windows Explorer) work.

 Relationships in such a system are thought
of in terms of children and parents such
that a child may only have one parent but a
parent can have multiple children.
 Parents and children are tied together by
links called "pointers" (perhaps physical
addresses inside the file system).
 A parent will have a list of pointers to each
of their children.

 This child/parent rule assures that data is
systematically accessible.
 To get to a low-level table, you start at the root
and work your way down through the tree until
you reach your target.
 Of course, as you might imagine, one problem
with this system is that the user must know
how the tree is structured in order to find
anything!
• The hierarchical model, however, is much more
efficient than the flat-file model we discussed
earlier because there is not as much need for
redundant data.
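
The root-to-target navigation described above can be sketched with a small in-memory tree. The node names are hypothetical, and Python object references stand in for the "pointers" a hierarchical database would keep from parent to child.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)  # parent-to-child pointers

def find_path(node: Node, target: str) -> Optional[List[str]]:
    """Walk down from the given node; return the path to target, or None."""
    if node.name == target:
        return [node.name]
    for child in node.children:
        sub_path = find_path(child, target)
        if sub_path is not None:
            return [node.name] + sub_path
    return None

# Hypothetical hierarchy: every child has exactly one parent.
root = Node("company", [
    Node("sales", [Node("orders")]),
    Node("engineering", [Node("projects"), Node("machines")]),
])

path = find_path(root, "machines")
print(path)
```

Every lookup starts at the root, which is why a user of such a system has to know how the tree is laid out.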

 If a change in the data is necessary, the
change might only need to be processed
once. Consider the student flatfile database
example from our discussion of what
databases are:

Examples of hierarchical data represented
as relational tables
 An organization could store employee
information in a table that contains
attributes/columns such as employee
number, first name, last name, and
Department number.
 The organization provides each employee
with computer hardware as needed, but
computer equipment may only be used by
the employee to which it is assigned.
 The organization could store the computer
hardware information in a separate table
that includes each part's serial number,
type, and the employee that uses it.
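
A minimal sketch of these two tables, assuming SQLite and invented column names and sample rows. The foreign key on the hardware table plays the role of the child-to-parent link: each part belongs to exactly one employee, while an employee may have many parts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employee (
        emp_no     INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT,
        dept_no    INTEGER
    );
    CREATE TABLE computer_hardware (
        serial_no TEXT PRIMARY KEY,
        type      TEXT,
        emp_no    INTEGER NOT NULL REFERENCES employee (emp_no)
    );
    -- Hypothetical sample data
    INSERT INTO employee VALUES (100, 'Sally', 'Baker', 10);
    INSERT INTO computer_hardware VALUES ('SN-1', 'laptop', 100), ('SN-2', 'monitor', 100);
""")

parts = conn.execute(
    "SELECT serial_no FROM computer_hardware WHERE emp_no = 100 ORDER BY serial_no"
).fetchall()

# Hardware cannot be assigned to an employee who does not exist.
try:
    conn.execute("INSERT INTO computer_hardware VALUES ('SN-3', 'mouse', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(parts, orphan_rejected)
```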

 In many ways, the Network Database model
was designed to solve some of the more
serious problems with the Hierarchical
Database Model.
 Specifically, the Network model solves the
problem of data redundancy by representing
relationships in terms of sets rather than
hierarchy.
 The model had its origins in the Conference on
Data Systems Languages (CODASYL) which had
created the Data Base Task Group to explore
and design a method to replace the hierarchical
model.

 The network model is very similar to the
hierarchical model actually.
 In fact, the hierarchical model is a subset of
the network model.
 However, instead of using a single-parent
tree hierarchy, the network model uses set
theory to provide a tree-like hierarchy with
the exception that child tables were allowed
to have more than one parent.
• This allowed the network model to support
many-to-many relationships.

 Visually, a Network Database looks like a
hierarchical Database in that you can see it
as a type of tree.
 However, in the case of a Network
Database, the look is more like several trees
which share branches.
 Thus, children can have multiple parents
and parents can have multiple children.

• A relational database management system
(RDBMS) manages a database based on the
relational model developed by E.F. Codd.
 A relational database allows the definition
of data structures, storage and retrieval
operations and integrity constraints.
 In such a database the data and relations
between them are organised in tables. A
table is a collection of records and each
record in a table contains the same fields.

Properties of Relational Tables:
 Values Are Atomic
 Each Row is Unique
 Column Values Are of the Same Kind
 The Sequence of Columns is Insignificant
 The Sequence of Rows is Insignificant
 Each Column Has a Unique Name

 Certain fields may be designated as keys,
which means that searches for specific
values of that field will use indexing to
speed them up.
 Where fields in two different tables take
values from the same set, a join operation
can be performed to select related records
in the two tables by matching values in
those fields.
 Often, but not always, the fields will have
the same name in both tables.

 For example, an "orders" table might
contain (customer-ID, product-code) pairs
and a "products" table might contain
(product-code, price) pairs so to calculate a
given customer's bill you would sum the
prices of all products ordered by that
customer by joining on the product-code
fields of the two tables.
 This can be extended to joining multiple
tables on multiple fields.
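
The bill calculation described above might look like this in SQL, run here through Python's sqlite3 module. The product codes, prices and order rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (customer_id INTEGER, product_code TEXT);
    -- Hypothetical sample data
    INSERT INTO products VALUES ('P1', 2.50), ('P2', 10.00);
    INSERT INTO orders VALUES (1, 'P1'), (1, 'P2'), (1, 'P1'), (2, 'P2');
""")

# Join the two tables on product_code and sum the prices for one customer.
bill = conn.execute(
    """
    SELECT SUM(p.price)
    FROM orders AS o
    JOIN products AS p ON p.product_code = o.product_code
    WHERE o.customer_id = ?
    """,
    (1,),
).fetchone()[0]

print(bill)
```

The relationship between the tables is not stored anywhere; it exists only because the query matches values in the two product_code columns.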

• Because these relationships are only
specified at retrieval time, relational
databases are classed as dynamic database
management systems.
• The relational database model is based
on relational algebra.

 Object/relational database management
systems (ORDBMSs) add new object storage
capabilities to the relational systems at the
core of modern information systems.
 These new facilities integrate management
of traditional fielded data, complex objects
such as time-series and geospatial data and
diverse binary media such as audio, video,
images, and applets.

• By encapsulating methods with data
structures, an ORDBMS server can execute
complex analytical and data manipulation
operations to search and transform
multimedia and other complex objects.
• As an evolutionary technology, the
object/relational (OR) approach has
inherited the robust transaction- and
performance-management features of its
relational ancestor and the flexibility of its
object-oriented cousin.

• Database designers can work with familiar
tabular structures and data definition
languages (DDLs) while assimilating new
object-management possibilities.
• Query and procedural languages and call
interfaces in ORDBMSs are familiar: SQL3,
vendor procedural languages, and ODBC,
JDBC, and proprietary call interfaces are all
extensions of RDBMS languages and
interfaces.

• And the leading vendors are, of course,
quite well known: IBM, Informix, and
Oracle.

 Object DBMSs add database functionality to
object programming languages.
 They bring much more than persistent
storage of programming language objects.
 Object DBMSs extend the semantics of the
C++, Smalltalk and Java object
programming languages to provide full-
featured database programming capability,
while retaining native language
compatibility.

 A major benefit of this approach is the
unification of the application and database
development into a seamless data model
and language environment.
 As a result, applications require less code,
use more natural data modeling, and code
bases are easier to maintain.
 Object developers can write complete
database applications with a modest
amount of additional effort.

 According to Rao (1994), "The object-
oriented database (OODB) paradigm is the
combination of object-oriented
programming language (OOPL) systems and
persistent systems.
 The power of the OODB comes from the
seamless treatment of both persistent data,
as found in databases, and transient data,
as found in executing programs."

 In contrast to a relational DBMS where a
complex data structure must be flattened
out to fit into tables or joined together from
those tables to form the in-memory
structure, object DBMSs have no
performance overhead to store or retrieve a
web or hierarchy of interrelated objects.
 This one-to-one mapping of object
programming language objects to database
objects has two benefits over other storage
approaches:

 It provides higher performance management of
objects, and it enables better management of
the complex interrelationships between
objects.
 This makes object DBMSs better suited to
support applications such as financial portfolio
risk analysis systems, telecommunications
service applications, world wide web document
structures, design and manufacturing systems,
and hospital patient record systems, which
have complex relationships between data.

• In the semistructured data model, the information
that is normally associated with a schema is
contained within the data, which is sometimes
called "self-describing".
• In such a database there is no clear separation
between the data and the schema, and the
degree to which it is structured depends on the
application.
• In some forms of semistructured data there is
no separate schema; in others it exists but only
places loose constraints on the data.

 Semi-structured data is naturally modelled in
terms of graphs which contain labels which
give semantics to its underlying structure.
 Such databases subsume the modelling
power of recent extensions of flat relational
databases, to nested databases which allow
the nesting (or encapsulation) of entities, and
to object databases which, in addition, allow
cyclic references between objects.

 The associative model divides the real-world
things about which data is to be recorded into
two sorts:
 Entities are things that have discrete,
independent existence.
 An entity’s existence does not depend on any
other thing.
 Associations are things whose existence
depends on one or more other things, such
that if any of those things ceases to exist, then
the thing itself ceases to exist or becomes
meaningless.

An associative database comprises two data
structures:
1. A set of items, each of which has a unique
identifier, a name and a type.
2. A set of links, each of which has a unique
identifier, together with the unique identifiers
of three other things, that represent the
source, verb and target of a fact that is
recorded about the source in the database.
Each of the three things identified by the
source, verb and target may be either a link or
an item.
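
The two structures can be sketched as plain Python data classes. The sample items and links are invented, and the assumption that items and links share one identifier space is made only to keep the lookup simple. Note that link 11 uses another link (10) as its source, which is what the last sentence above allows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    item_id: int
    name: str
    item_type: str

@dataclass(frozen=True)
class Link:
    link_id: int
    source: int  # identifier of an item or of another link
    verb: int
    target: int

# Hypothetical data; item and link identifiers share one number space here.
items = {i.item_id: i for i in [
    Item(1, "Mary", "entity"),
    Item(2, "works for", "verb"),
    Item(3, "Acme", "entity"),
    Item(4, "since", "verb"),
    Item(5, "2020", "entity"),
]}
links = {l.link_id: l for l in [
    Link(10, source=1, verb=2, target=3),   # Mary works for Acme
    Link(11, source=10, verb=4, target=5),  # (that fact) since 2020
]}

def describe(ref: int) -> str:
    """Spell out an item, or recursively spell out the fact a link records."""
    if ref in items:
        return items[ref].name
    link = links[ref]
    return f"({describe(link.source)} {describe(link.verb)} {describe(link.target)})"

print(describe(11))
```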

• The best way to understand the rationale of
EAV (entity-attribute-value) design is to
understand row modeling (of which EAV is a
generalized form).
• Consider a supermarket database that must
manage thousands of products and brands,
many of which have a transitory existence.
• Here, it is intuitively obvious that product
names should not be hard-coded as names of
columns in tables. Instead, one stores product
descriptions in a Products table;
purchases/sales of individual items are
recorded in other tables as separate rows with
a product ID referencing this table.

 Conceptually an EAV design involves a
single table with three columns, an entity
(such as an olfactory receptor ID), an
attribute (such as species, which is actually
a pointer into the metadata table) and a
value for the attribute (e.g., rat). In EAV
design, one row stores a single fact.
 In a conventional table that has one column
per attribute, by contrast, one row stores a
set of facts. EAV design is appropriate when
the number of parameters that potentially
apply to an entity is vastly more than those
that actually apply to an individual entity.
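
A minimal EAV sketch, assuming SQLite and invented sample facts. For brevity the attribute is stored as plain text here rather than as a pointer into a metadata table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT);
    -- One row per fact (hypothetical data)
    INSERT INTO eav VALUES
        ('receptor-1', 'species', 'rat'),
        ('receptor-1', 'tissue',  'olfactory epithelium'),
        ('receptor-2', 'species', 'mouse');
""")

def facts(entity: str) -> dict:
    """Collect the single-fact rows of one entity into a conventional record."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ?", (entity,)
    ).fetchall()
    return dict(rows)

print(facts("receptor-1"))
print(facts("receptor-2"))
```

Each entity carries only the attributes that actually apply to it; no column has to exist for attributes that are never recorded.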

 The context data model combines features of
all the above models.
 It can be considered as a collection of object-
oriented, network and semistructured models
or as some kind of object database.
• In other words, this is a flexible model: you can
use any type of database structure depending
on the task. Such a data model has been
implemented in the DBMS ConteXt.
 The fundamental unit of information storage of
ConteXt is a CLASS.

• A Class contains METHODS and describes an
OBJECT.
• The Object contains FIELDS and a PROPERTY. A
field may be composite, in which case it
contains SubFields, and so on.
• The Property is a set of fields that belongs to a
particular Object (similar to AVL database). In
other words, fields are the permanent part of an
Object, but the Property is its variable part.
 The header of Class contains the definition of
the internal structure of the Object, which
includes the description of each field, such as
their type, length, attributes and name.

• The context data model has a set of predefined
types as well as user-defined types.
 The predefined types include not only
character strings, texts and digits but also
pointers (references) and aggregate types
(structures).
 A context model comprises three main data
types: REGULAR, VIRTUAL and REFERENCE.

 Database design is the process of
producing a detailed data model of a
database.
• This logical data model contains all the
logical and physical design choices
and physical storage parameters needed to
generate a design in a Data Definition
Language, which can then be used to create
a database.
 A fully attributed data model contains
detailed attributes for each entity.

 The term database design can be used to
describe many different parts of the design of
an overall database system.
 Principally, and most correctly, it can be
thought of as the logical design of the base
data structures used to store the data.
 In the relational model these are the tables
and views.

Conceptual schema:
 A conceptual schema or conceptual data model
is a map of concepts and their relationships.
 This describes the semantics of an
organization and represents a series of
assertions about its nature.
 Specifically, it describes the things of
significance to an organization (entity classes),
about which it is inclined to collect information,
and characteristics of (attributes) and
associations between pairs of those things of
significance (relationships).

 Because a conceptual schema represents
the semantics of an organization, and not a
database design, it may exist on various
levels of abstraction.
 Conceptual data models take a more
abstract perspective, identifying the
fundamental things, of which the things an
individual deals with are just examples.
 The model does allow for what is called
inheritance in object oriented terms.

• A data structure diagram (DSD) is a data
model or diagram used to describe
conceptual data models by providing
graphical notations which document entities
and their relationships, and the constraints
that bind them.

 Once the relationships and dependencies
amongst the various pieces of information have
been determined, it is possible to arrange the
data into a logical structure which can then be
mapped into the storage objects supported by
the database management system.
• Normalisation procedures and the
definition of integrity rules ensure that the stored
database will be non-redundant and properly
connected.
• Logical data structuring is based on the
identification of the entities, their attributes,
and the relationships between the entities.

Entity:
 Something about which an enterprise needs
to keep data.
Attributes:
 The properties of an entity.
Relationships:
 The connections between entities.

 An Entity may be physical
Example:
an Employee; a Part; a Machine
 Or conceptual
Example:
a Project; an Order; a Course.
 Each instance of an entity is different from
all others - one or more attributes will
typically form a 'primary key' attribute -
unique to a particular instance.

• Attributes are the properties of an entity.
• Data which describes or is 'owned' by an
entity.
• Attributes (data) equate to facts: specific
details about entities, details of interest.

 In the real world, objects do not exist in
isolation.
 Our understanding of real world objects is in
terms of their relationships with other objects;
for example, 'the earth circles the sun'; 'he is a
carpenter' ; etc.
 Any real world object which we are going to
include in a data model as an entity type must
have some relationship with at least one other
entity within the model (even if we are not
going to implement that relationship within our
database system).

One-to-one:
 Both tables can have only one record on
either side of the relationship.
 Each primary key value relates to only one
(or no) record in the related table.
 Most one-to-one relationships are forced
by business rules and don't flow naturally
from the data.
 In the absence of such a rule, you can
usually combine both tables into one table
without breaking any normalization rules.

One-to-One Relationships
Contd:
For example: at any one time a Factory has
one Manager and a Manager runs one Factory,
even though a Factory may have many Managers
during its lifetime and a Manager might be in
charge of different Factories during a career.

One-to-many:
 The primary key table contains only one
record that relates to none, one, or many
records in the related table.
 This relationship is similar to the one
between you and a parent.
 You have only one mother, but your mother
may have several children.

One-to-many Contd:
A formal description of the relationship shown
in the diagram above is:
• One Factory may make zero or more
Components.
• One Component is made in one (and only
one) Factory.

One-to-many Contd:
What this means in a database system is that:
 one record in a table called Factory may be
related to a number of records in a
Component table;
but
 a record in the Component table can only
be related to one record in the Factory
table.

One-to-Many Relationships summarised:
For any occurrence of A, there may
be 0, 1, or many, occurrences of B.
For any occurrence of B, there can
only be one occurrence of A.
From another perspective:
 If an 'A' record exists there may be zero or
more related 'B' records.
Any 'B' record can only be related to a single
'A' record.
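
A sketch of the Factory/Component relationship in SQLite; the column names and sample rows are assumptions. The foreign key sits on the "many" side (Component), and the query shows that a Factory may have zero, one or many related Components.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE factory (factory_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE component (
        component_id INTEGER PRIMARY KEY,
        name         TEXT,
        -- exactly one Factory per Component
        factory_id   INTEGER NOT NULL REFERENCES factory (factory_id)
    );
    -- Hypothetical sample data
    INSERT INTO factory VALUES (1, 'North'), (2, 'South');
    INSERT INTO component VALUES (10, 'gear', 1), (11, 'axle', 1);
""")

# LEFT JOIN keeps factories that make no components at all.
counts = conn.execute(
    """
    SELECT f.name, COUNT(c.component_id)
    FROM factory AS f
    LEFT JOIN component AS c ON c.factory_id = f.factory_id
    GROUP BY f.factory_id
    ORDER BY f.factory_id
    """
).fetchall()

print(counts)
```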

Many-to-many:
 Each record in both tables can relate to any
number of records (or no records) in the
other table.
• For instance, if you have several siblings, so
do your siblings (they also have many siblings).
• Many-to-many relationships require a third
table, known as an associative or linking
table, because relational systems can't
directly accommodate the relationship.

Many-to-many: Contd:

 Minimally, a many-many relationship will
require insertion of a 'link entity'.
 Further analysis may show that the link
entity has attributes of its own - often
qualifiers in respect of quantity or time.
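
A many-to-many relationship resolved through a link table might look like the sketch below. The orders/product scenario and all names are hypothetical; 'quantity' illustrates an attribute that belongs to the link entity itself rather than to either side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_id INTEGER PRIMARY KEY);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Link entity: one row per (order, product) pair
    CREATE TABLE order_line (
        order_id   INTEGER REFERENCES orders (order_id),
        product_id INTEGER REFERENCES product (product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    -- Hypothetical sample data
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO product VALUES (100, 'bolt'), (200, 'nut');
    INSERT INTO order_line VALUES (1, 100, 5), (1, 200, 5), (2, 100, 1);
""")

# One product appears on many orders...
orders_with_bolts = [r[0] for r in conn.execute(
    "SELECT order_id FROM order_line WHERE product_id = 100 ORDER BY order_id"
)]
# ...and one order contains many products.
products_on_order_1 = [r[0] for r in conn.execute(
    """
    SELECT p.name
    FROM order_line AS l
    JOIN product AS p ON p.product_id = l.product_id
    WHERE l.order_id = 1
    ORDER BY p.name
    """
)]

print(orders_with_bolts, products_on_order_1)
```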

 The physical design of the database specifies
the physical configuration of the database on
the storage media.
 This includes detailed specification of data
elements, data types, indexing options and
other parameters residing in the DBMS data
dictionary.
 It is the detailed design of a system that
includes modules & the database's hardware &
software specifications of the system.
 In the case of relational databases the storage
objects are tables which store data in rows and
columns.

• The purpose of normalization
• Data redundancy and update anomalies
• Functional Dependencies
• The Process of Normalization
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)

Normalization is a technique for producing a
set of relations with desirable properties, given
the data requirements of an enterprise.

The process of normalization is a formal method
that identifies relations based on their primary or
candidate keys and the functional dependencies
among their attributes.

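Relations that have redundant data may have
problems called update anomalies, which are
classified as:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies

For example, in the StaffBranch relation (Figure 1):
• Insertion: to insert a new member of staff with branchNo B007, the
branch address must be entered again and must match the existing rows.
• Deletion: deleting the tuple that represents the last member of staff
located at branch B007 also loses the details of that branch.
• Modification: to change the address of branch B003, every tuple for
staff at that branch must be updated.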
StaffBranch

staffNo  sName        position    salary  branchNo  bAddress
SL21     John White   Manager     30000   B005      22 Deer Rd, London
SG37     Ann Beech    Assistant   12000   B003      163 Main St, Glasgow
SG14     David Ford   Supervisor  18000   B003      163 Main St, Glasgow
SA9      Mary Howe    Assistant    9000   B007      16 Argyll St, Aberdeen
SG5      Susan Brand  Manager     24000   B003      163 Main St, Glasgow
SL41     Julie Lee    Assistant    9000   B005      22 Deer Rd, London

Figure 1 StaffBranch relation

Staff

staffNo  sName        position    salary  branchNo
SL21     John White   Manager     30000   B005
SG37     Ann Beech    Assistant   12000   B003
SG14     David Ford   Supervisor  18000   B003
SA9      Mary Howe    Assistant    9000   B007
SG5      Susan Brand  Manager     24000   B003
SL41     Julie Lee    Assistant    9000   B005

Branch

branchNo  bAddress
B005      22 Deer Rd, London
B007      16 Argyll St, Aberdeen
B003      163 Main St, Glasgow

Figure 2 Staff and Branch relations
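
The effect of the split can be checked with a short SQLite sketch. It loads the StaffBranch rows from Figure 1 (position and salary are left out for brevity), derives a separate branch table, and compares how many rows one change of address touches. The new address is a made-up value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_branch (staffNo TEXT, sName TEXT, branchNo TEXT, bAddress TEXT);
    INSERT INTO staff_branch VALUES
        ('SL21', 'John White',  'B005', '22 Deer Rd, London'),
        ('SG37', 'Ann Beech',   'B003', '163 Main St, Glasgow'),
        ('SG14', 'David Ford',  'B003', '163 Main St, Glasgow'),
        ('SA9',  'Mary Howe',   'B007', '16 Argyll St, Aberdeen'),
        ('SG5',  'Susan Brand', 'B003', '163 Main St, Glasgow'),
        ('SL41', 'Julie Lee',   'B005', '22 Deer Rd, London');
    -- Branch details stored once per branch, as in Figure 2
    CREATE TABLE branch AS SELECT DISTINCT branchNo, bAddress FROM staff_branch;
""")

new_address = "1 New St, Glasgow"  # hypothetical value

# Modification anomaly: the flat table needs one update per member of staff.
flat_rows = conn.execute(
    "UPDATE staff_branch SET bAddress = ? WHERE branchNo = 'B003'", (new_address,)
).rowcount
split_rows = conn.execute(
    "UPDATE branch SET bAddress = ? WHERE branchNo = 'B003'", (new_address,)
).rowcount

# Deletion anomaly: removing the last member of staff at B007 erases the
# branch from the flat table, but not from the separate branch table.
conn.execute("DELETE FROM staff_branch WHERE staffNo = 'SA9'")
b007_in_flat = conn.execute(
    "SELECT COUNT(*) FROM staff_branch WHERE branchNo = 'B007'"
).fetchone()[0]
b007_in_branch = conn.execute(
    "SELECT COUNT(*) FROM branch WHERE branchNo = 'B007'"
).fetchone()[0]

print(flat_rows, split_rows, b007_in_flat, b007_in_branch)
```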

Functional dependency describes the relationship between
attributes in a relation.
For example, if A and B are attributes of relation R, then B is
functionally dependent on A (denoted A → B) if each value of
A is associated with exactly one value of B. (A and B may each
consist of one or more attributes.)

A → B   (B is functionally dependent on A)

Determinant: the attribute or group of attributes on
the left-hand side of the arrow of a functional
dependency.

Trivial functional dependency means that the right-hand
side is a subset (not necessarily a proper subset) of the
left-hand side.
For example (see Figure 1):
staffNo, sName → sName
staffNo, sName → staffNo

They do not provide any additional information about possible integrity
constraints on the values held by these attributes.

We are normally more interested in nontrivial dependencies because they
represent integrity constraints for the relation.

Main characteristics of functional dependencies in normalization

• Have a one-to-one relationship between attribute(s) on
the left- and right- hand side of a dependency;

• hold for all time;

• are nontrivial.

Identifying the primary key
Functional dependency is a property of the meaning or
semantics of the attributes in a relation. When a functional
dependency is present, the dependency is specified as a
constraint between the attributes.
An important integrity constraint to consider first is the
identification of candidate keys, one of which is selected to
be the primary key for the relation using functional
dependency.

Inference Rules
A set of all functional dependencies that are implied by a given
set of functional dependencies X is called the closure of X, written
X+. A set of inference rules is needed to compute X+ from X.

Armstrong’s axioms

1. Reflexivity: If B is a subset of A, then A → B
2. Augmentation: If A → B, then A,C → B,C
3. Transitivity: If A → B and B → C, then A → C
4. Self-determination: A → A
5. Decomposition: If A → B,C then A → B and A → C
6. Union: If A → B and A → C, then A → B,C
7. Composition: If A → B and C → D, then A,C → B,D

Rules 1 to 3 are Armstrong's axioms proper; rules 4 to 7 can be derived from them.

Minimal Sets of Functional Dependencies
A set of functional dependencies X is minimal if it satisfies
the following conditions:
• Every dependency in X has a single attribute on its
right-hand side.

• We cannot replace any dependency A → B in X with
dependency C → B, where C is a proper subset of A, and
still have a set of dependencies that is equivalent to X.

• We cannot remove any dependency from X and still have a
set of dependencies that is equivalent to X.

Example of a Minimal Set of Functional
Dependencies
A set of functional dependencies for the StaffBranch relation
satisfies the three conditions for producing a minimal set.

staffNo → sName
staffNo → position
staffNo → salary
staffNo → branchNo
staffNo → bAddress
branchNo → bAddress
branchNo, position → salary
bAddress, position → salary
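
The inference rules are usually applied through an attribute-closure computation: start from a set of attributes and keep adding everything the dependencies let you derive. The sketch below is one straightforward way to write it, not an optimized algorithm; the dependencies are the StaffBranch ones listed above, and the function name 'closure' is just a label chosen for the example.

```python
def closure(attributes, dependencies):
    """Return every attribute functionally determined by the given attributes."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If we already hold the whole left-hand side, we gain the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Functional dependencies of the StaffBranch relation, as (left, right) pairs.
FDS = [
    ({"staffNo"}, {"sName"}),
    ({"staffNo"}, {"position"}),
    ({"staffNo"}, {"salary"}),
    ({"staffNo"}, {"branchNo"}),
    ({"staffNo"}, {"bAddress"}),
    ({"branchNo"}, {"bAddress"}),
    ({"branchNo", "position"}, {"salary"}),
    ({"bAddress", "position"}, {"salary"}),
]

ALL_ATTRIBUTES = {"staffNo", "sName", "position", "salary", "branchNo", "bAddress"}

staff_closure = closure({"staffNo"}, FDS)
branch_closure = closure({"branchNo"}, FDS)
print(sorted(staff_closure))
print(sorted(branch_closure))
```

Because the closure of staffNo contains every attribute of the relation, staffNo is a candidate key; branchNo on its own determines only the branch address.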

• Multivalued attributes (or repeating groups):
non-key attributes, or groups of non-key
attributes, whose values are not uniquely
identified by (that is, not functionally
dependent on, directly or indirectly) the
value of the primary key (or part of it).

STUDENT

Stud_ID  Name     Course_ID  Units
101      Lennon   MSI 250    3.00
101      Lennon   MSI 415    3.00
125      Johnson  MSI 331    3.00

• Partial dependency: when a non-key
attribute is determined by a part, but not
the whole, of a COMPOSITE primary key.

CUSTOMER

Cust_ID  Name   Order_ID
101      AT&T   1234
101      AT&T   156
125      Cisco  1250

Partial dependency: Name depends only on Cust_ID, one
part of the composite key (Cust_ID, Order_ID).

• Transitive dependency: when a non-key
attribute determines another non-key
attribute.

EMPLOYEE

Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sarah   Smith   2        Mktg

Transitive dependency: Dept_ID → Dept_Name.

• Normalization is often executed as a series of steps.
Each step corresponds to a specific normal form that has
known properties.

• As normalization proceeds, the relations become
progressively more restricted in format, and also less
vulnerable to update anomalies.

• For the relational data model, it is important to recognize
that it is only first normal form (1NF) that is critical in
creating relations. All the subsequent normal forms are
optional.

• Unnormalized – There are
multivalued attributes or repeating
groups
• 1 NF – No multivalued attributes or
repeating groups.
• 2 NF – 1 NF plus no partial
dependencies
• 3 NF – 2 NF plus no transitive
dependencies

• ISBN → Title
• ISBN → Publisher
• Publisher → Address
All attributes are directly or indirectly determined
by the primary key; therefore, the relation is
at least in 1NF.

BOOK

ISBN Title Publisher Address

• ISBN → Title
• ISBN → Publisher
• Publisher → Address
The relation is at least in 1NF. There is no COMPOSITE
primary key, therefore there can't be partial dependencies.
Therefore, the relation is at least in 2NF.


• ISBN → Title
• ISBN → Publisher
• Publisher → Address
Publisher is a non-key attribute, and it determines Address,
another non-key attribute. Therefore, there is a transitive
dependency, which means that the relation is NOT in 3NF.


• ISBN → Title
• ISBN → Publisher
• Publisher → Address
We know that the relation is at least in 2NF, and it is not
in 3NF. Therefore, we conclude that the relation is in 2NF.


• Option 2: Remove the entire repeating group
from the relation. Create another relation which
would contain all the attributes of the repeating
group, plus the primary key from the first
relation. In this new relation, the primary key
from the original relation and the determinant
of the repeating group will comprise a primary
key.

STUDENT

Stud_ID  Name
101      Lennon
125      Johnson

STUDENT_COURSE

Stud_ID  Course   Units
101      MSI 250  3
101      MSI 415  3
125      MSI 331  3

Composite primary key: (Stud_ID, Course)


• Goal: Remove Partial Dependencies
Composite Primary Key
Partial Dependencies

STUDENT

101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00

• Remove the attributes that depend on only part of
the primary key, rather than the whole key, from
the original relation. For each partial
dependency, create a new relation, with the
corresponding part of the primary key from the
original as the primary key.
STUDENT

101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00

STUDENT

101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00

STUDENT_COURSE

Stud_ID Course_ID
101 MSI 250
101 MSI 415
125 MSI 331

STUDENT COURSE

101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00

• Goal: Get rid of transitive
dependencies.
Transitive Dependency
EMPLOYEE


• Remove the attributes, which are dependent on
a non-key attribute, from the original relation.
For each transitive dependency, create a new
relation with the non-key attribute which is a
determinant in the transitive dependency as a
primary key, and the dependent non-key
attribute as a dependent.

EMPLOYEE

Emp_ID F_Name L_Name Dept_ID
111 Mary Jones 1
122 Sarah Smith 2

DEPARTMENT

Dept_ID Dept_Name
1 Acct
2 Mktg
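The EMPLOYEE and DEPARTMENT split above can be tried out with Python's built-in sqlite3 module. This is a small sketch rather than a prescribed schema: the lower-case table and column names and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dept_Name depends on Dept_ID, so it lives in its own relation.
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        f_name  TEXT NOT NULL,
        l_name  TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Acct'), (2, 'Mktg');
    INSERT INTO employee VALUES
        (111, 'Mary', 'Jones', 1),
        (122, 'Sarah', 'Smith', 2);
""")

# Joining on the determinant (dept_id) rebuilds the original wide rows.
rows = conn.execute("""
    SELECT e.emp_id, e.f_name, e.l_name, e.dept_id, d.dept_name
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
```

Each department name is now stored once, and the join recovers the information that the single EMPLOYEE relation held before the split.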

Unnormalized form (UNF)
A table that contains one or more repeating groups.

Repeating group = (propertyNo, pAddress,
rentStart, rentFinish, rent, ownerNo, oName)

clientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
                         PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
                         PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
                         PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure 3 ClientRental unnormalized table

First Normal Form is a relation in which the intersection of each
row and column contains one and only one value.
There are two approaches to removing repeating groups from
unnormalized tables:

1. Removes the repeating groups by entering appropriate
data in the empty columns of rows containing the
repeating data.

2. Removes the repeating group by placing the repeating
data, along with a copy of the original key attribute(s), in
a separate relation. A primary key is identified for the
new relation.

With the first approach, we remove the repeating group
(property rented details) by entering the appropriate client
data into each row.

The ClientRental relation is defined as follows:
ClientRental (clientNo, propertyNo, cName, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)

clientNo  propertyNo  cName          pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         John Kay       6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        John Kay       5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         Aline Stewart  6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        Aline Stewart  2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        Aline Stewart  5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure 4 1NF ClientRental relation with the first approach
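The first approach amounts to a "fill down" of the client columns. A minimal Python sketch follows, assuming blank cells are represented as None and using an abbreviated set of columns; the function name fill_down is hypothetical.

```python
# Abbreviated ClientRental rows: clientNo, cName, propertyNo, rentStart, rentFinish
# (pAddress, rent, ownerNo and oName are omitted for brevity).
# None marks a cell left blank in the unnormalized table.
unf_rows = [
    ("CR76", "John Kay",      "PG4",  "1-Jul-00",  "31-Aug-01"),
    (None,   None,            "PG16", "1-Sep-02",  "1-Sep-02"),
    ("CR56", "Aline Stewart", "PG4",  "1-Sep-99",  "10-Jun-00"),
    (None,   None,            "PG36", "10-Oct-00", "1-Dec-01"),
    (None,   None,            "PG16", "1-Nov-02",  "1-Aug-03"),
]

def fill_down(rows, n_client_cols=2):
    """Copy the client columns into every row of that client's repeating group."""
    flat = []
    current_client = None
    for row in rows:
        if row[0] is not None:
            current_client = row[:n_client_cols]
        if current_client is None:
            raise ValueError("first row must carry the client columns")
        flat.append(current_client + row[n_client_cols:])
    return flat

flat_rows = fill_down(unf_rows)
```

After the fill, every cell holds exactly one value and each row is identified by the pair (clientNo, propertyNo), at the cost of repeating the client data.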

With the second approach, we remove the repeating group
(property rented details) by placing the repeating data along with
a copy of the original key attribute (clientNo) in a separate relation.

Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)

Client

clientNo  cName
CR76      John Kay
CR56      Aline Stewart

PropertyRentalOwner

clientNo  propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure 5 1NF ClientRental relation with the second approach

Full functional dependency indicates that if A and B are
attributes of a relation, B is fully functionally
dependent on A if B is functionally dependent on A,
but not on any proper subset of A.

A functional dependency A → B is partially dependent if
there are some attributes that can be removed from A and
the dependency still holds.
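The two definitions above can be tested against sample rows. The sketch below uses a few columns of the 1NF ClientRental data; fd_holds and is_partial are hypothetical helper names. Sample data can only show that a dependency is not contradicted: the dependencies themselves come from the meaning of the attributes, not from any particular set of rows.

```python
rows = [
    {"clientNo": "CR76", "propertyNo": "PG4",  "cName": "John Kay",      "rentStart": "1-Jul-00"},
    {"clientNo": "CR76", "propertyNo": "PG16", "cName": "John Kay",      "rentStart": "1-Sep-02"},
    {"clientNo": "CR56", "propertyNo": "PG4",  "cName": "Aline Stewart", "rentStart": "1-Sep-99"},
    {"clientNo": "CR56", "propertyNo": "PG36", "cName": "Aline Stewart", "rentStart": "10-Oct-00"},
    {"clientNo": "CR56", "propertyNo": "PG16", "cName": "Aline Stewart", "rentStart": "1-Nov-02"},
]

def fd_holds(rows, lhs, rhs):
    """True if rows with equal lhs values always have equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_partial(rows, lhs, rhs):
    """True if lhs -> rhs holds and still holds after dropping one attribute of lhs."""
    if len(lhs) < 2 or not fd_holds(rows, lhs, rhs):
        return False
    # Dropping a single attribute is enough: if any smaller subset works,
    # so does every larger subset that contains it.
    return any(
        fd_holds(rows, [a for a in lhs if a != dropped], rhs)
        for dropped in lhs
    )

key = ["clientNo", "propertyNo"]
cname_is_partial = is_partial(rows, key, ["cName"])
rentstart_is_partial = is_partial(rows, key, ["rentStart"])
```

In this sample cName depends on clientNo alone, so its dependency on the composite key is partial, whereas rentStart needs both clientNo and propertyNo and is therefore fully functionally dependent on the key.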

Second normal form (2NF) is a relation that is in first
normal form and every non-primary-key attribute is
fully functionally dependent on the primary key.

The normalization of 1NF relations to 2NF involves the
removal of partial dependencies. If a partial
dependency exists, we remove the functionally
dependent attributes from the relation by placing them
in a new relation along with a copy of their determinant.

The ClientRental relation has the following functional
dependencies:

fd1 clientNo, propertyNo → rentStart, rentFinish (Primary Key)
fd2 clientNo → cName (Partial dependency)
fd3 propertyNo → pAddress, rent, ownerNo, oName (Partial dependency)
fd4 ownerNo → oName (Transitive dependency)
fd5 clientNo, rentStart → propertyNo, pAddress,
    rentFinish, rent, ownerNo, oName (Candidate key)
fd6 propertyNo, rentStart → clientNo, cName, rentFinish (Candidate key)

After removing the partial dependencies, we create three
new relations called Client, Rental, and PropertyOwner:

Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)

Client

clientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental

clientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03

PropertyOwner

propertyNo  pAddress                rent  ownerNo  oName
PG4         6 Lawrence St, Glasgow  350   CO40     Tina Murphy
PG16        5 Novar Dr, Glasgow     450   CO93     Tony Shaw
PG36        2 Manor Rd, Glasgow     370   CO93     Tony Shaw

Figure 6 2NF ClientRental relation

Transitive dependency
A condition where A, B, and C are attributes of a relation such that
if A → B and B → C, then C is transitively dependent on A via B
(provided that A is not functionally dependent on B or C).

Third normal form (3NF)
A relation that is in first and second normal form, and in which
no non-primary-key attribute is transitively dependent on the
primary key.

The normalization of 2NF relations to 3NF involves the
removal of transitive dependencies by placing the
attribute(s) in a new relation along with a copy of the
determinant.

The functional dependencies for the Client, Rental and
PropertyOwner relations are as follows:

Client
fd2 clientNo → cName (Primary Key)

Rental
fd1 clientNo, propertyNo → rentStart, rentFinish (Primary Key)
fd5 clientNo, rentStart → propertyNo, rentFinish (Candidate key)
fd6 propertyNo, rentStart → clientNo, rentFinish (Candidate key)

PropertyOwner
fd3 propertyNo → pAddress, rent, ownerNo, oName (Primary Key)
fd4 ownerNo → oName (Transitive Dependency)

The resulting 3NF relations have the forms:

Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)

Client

clientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental

clientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03

PropertyOwner

propertyNo  pAddress                rent  ownerNo
PG4         6 Lawrence St, Glasgow  350   CO40
PG16        5 Novar Dr, Glasgow     450   CO93
PG36        2 Manor Rd, Glasgow     370   CO93

Owner

ownerNo  oName
CO40     Tina Murphy
CO93     Tony Shaw

Figure 7 3NF ClientRental relation

Boyce-Codd normal form (BCNF)
A relation is in BCNF if, and only if, every determinant is a
candidate key.
The difference between 3NF and BCNF is that for a functional
dependency A → B, 3NF allows this dependency in a relation
if B is a primary-key attribute and A is not a candidate key,
whereas BCNF insists that for this dependency to remain in a
relation, A must be a candidate key.

fd1 clientNo, interviewDate → interviewTime, staffNo, roomNo (Primary Key)
fd2 staffNo, interviewDate, interviewTime → clientNo (Candidate key)
fd3 roomNo, interviewDate, interviewTime → clientNo, staffNo (Candidate key)
fd4 staffNo, interviewDate → roomNo (not a candidate key)

As a consequence, the ClientInterview relation may suffer from update anomalies.
For example, two tuples have to be updated if the roomNo needs to be changed for
staffNo SG5 on 13-May-02.
ClientInterview
ClientNo interviewDate interviewTime staffNo roomNo
CR76 13-May-02 10.30 SG5 G101
CR76 13-May-02 12.00 SG5 G101
CR74 13-May-02 12.00 SG37 G102
CR56 1-Jul-02 10.30 SG5 G102

Figure 8 ClientInterview relation

To transform the ClientInterview relation to BCNF, we must remove the violating
functional dependency by creating two new relations called Interview and StaffRoom,
as shown below:

Interview (clientNo, interviewDate, interviewTime, staffNo)
StaffRoom (staffNo, interviewDate, roomNo)

Interview
ClientNo interviewDate interviewTime staffNo
CR76 13-May-02 10.30 SG5
CR76 13-May-02 12.00 SG5
CR74 13-May-02 12.00 SG37
CR56 1-Jul-02 10.30 SG5

StaffRoom
staffNo interviewDate roomNo
SG5 13-May-02 G101
SG37 13-May-02 G102
SG5 1-Jul-02 G102

Figure 9 BCNF Interview and StaffRoom relations
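The update anomaly and its removal can be observed with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the lower-case table and column names are illustrative, 'G103' is a hypothetical new room, and client_interview is declared without a primary key so that the rows can be loaded exactly as they appear in Figure 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The original relation, loaded as shown in Figure 8.
    CREATE TABLE client_interview (
        client_no TEXT, interview_date TEXT, interview_time TEXT,
        staff_no TEXT, room_no TEXT
    );
    INSERT INTO client_interview VALUES
        ('CR76', '13-May-02', '10.30', 'SG5',  'G101'),
        ('CR76', '13-May-02', '12.00', 'SG5',  'G101'),
        ('CR74', '13-May-02', '12.00', 'SG37', 'G102'),
        ('CR56', '1-Jul-02',  '10.30', 'SG5',  'G102');

    -- The BCNF relation that stores fd4 (staffNo, interviewDate -> roomNo) once.
    CREATE TABLE staff_room (
        staff_no TEXT, interview_date TEXT, room_no TEXT,
        PRIMARY KEY (staff_no, interview_date)
    );
    INSERT INTO staff_room VALUES
        ('SG5',  '13-May-02', 'G101'),
        ('SG37', '13-May-02', 'G102'),
        ('SG5',  '1-Jul-02',  'G102');
""")

# Move SG5's interviews on 13-May-02 to another room in both designs.
where = "staff_no = 'SG5' AND interview_date = '13-May-02'"
rows_touched_before = conn.execute(
    f"UPDATE client_interview SET room_no = 'G103' WHERE {where}"
).rowcount
rows_touched_after = conn.execute(
    f"UPDATE staff_room SET room_no = 'G103' WHERE {where}"
).rowcount
```

The same change touches two tuples in the original relation but only one in StaffRoom, which is the anomaly described above.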

Multi-valued dependency (MVD)
Represents a dependency between attributes (for example, A,
B and C) in a relation, such that for each value of A there is a
set of values for B and a set of values for C. However, the sets of
values for B and C are independent of each other.
A multi-valued dependency can be further defined as being
trivial or nontrivial. An MVD A →→ B in relation R is
defined as being trivial if
• B is a subset of A
or
• A ∪ B = R
An MVD is defined as being nontrivial if neither of the above
two conditions is satisfied.

Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains
no nontrivial multi-valued dependencies.
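For a relation with exactly three attributes A, B and C, the multi-valued dependency of B on A can be checked directly from the definition: for every value of A, the rows must contain every combination of that value's B set and C set. The sketch below assumes that three-attribute case; the function name mvd_holds and the branch, staff and owner values are made up for illustration and do not come from the slides.

```python
from itertools import product

def mvd_holds(rows, a, b, c):
    """Check the MVD of b on a in a relation whose attributes are exactly a, b, c."""
    groups = {}
    for row in rows:
        groups.setdefault(row[a], set()).add((row[b], row[c]))
    for pairs in groups.values():
        b_vals = {p[0] for p in pairs}
        c_vals = {p[1] for p in pairs}
        # Independence of B and C means every combination must be present.
        if pairs != set(product(b_vals, c_vals)):
            return False
    return True

# Hypothetical data: staff and owners are both tied to a branch,
# but are independent of each other.
branch_rows = [
    {"branch": "B1", "staff": "S1", "owner": "O1"},
    {"branch": "B1", "staff": "S1", "owner": "O2"},
    {"branch": "B1", "staff": "S2", "owner": "O1"},
    {"branch": "B1", "staff": "S2", "owner": "O2"},
]

full_holds = mvd_holds(branch_rows, "branch", "staff", "owner")
partial_holds = mvd_holds(branch_rows[:3], "branch", "staff", "owner")
```

When the check succeeds for a nontrivial MVD like this one, the relation is not in 4NF; splitting it into (branch, staff) and (branch, owner) stores each fact once.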

Fifth normal form (5NF)
A relation that has no join dependency other than those
implied by its candidate keys.
Lossless-join dependency
A property of decomposition, which ensures that no spurious
tuples are generated when relations are reunited through a
natural join operation.

Join dependency
Describes a type of dependency. For example, for a relation R
with subsets of the attributes of R denoted as A, B, …, Z, a
relation R satisfies a join dependency if, and only if, every legal
value of R is equal to the join of its projections on A, B, …, Z.
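The lossless-join property can be demonstrated by projecting a relation and joining the projections back together. The sketch below uses three attributes of the PropertyOwner data shown earlier; project, natural_join and as_set are hypothetical helpers written for this illustration.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate rows removed."""
    result = []
    for row in rows:
        picked = {a: row[a] for a in attrs}
        if picked not in result:
            result.append(picked)
    return result

def natural_join(left, right):
    """Natural join of two lists of dict rows on their common attributes."""
    joined = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in joined:
                    joined.append(merged)
    return joined

def as_set(rows):
    """Order-independent form of a relation, for comparisons."""
    return {frozenset(row.items()) for row in rows}

property_rows = [
    {"propertyNo": "PG4",  "rent": 350, "ownerNo": "CO40"},
    {"propertyNo": "PG16", "rent": 450, "ownerNo": "CO93"},
    {"propertyNo": "PG36", "rent": 370, "ownerNo": "CO93"},
]

# Splitting on the key (propertyNo) is lossless.
lossless = natural_join(
    project(property_rows, ["propertyNo", "rent"]),
    project(property_rows, ["propertyNo", "ownerNo"]),
)

# Splitting on ownerNo, which determines neither propertyNo nor rent, is lossy.
lossy = natural_join(
    project(property_rows, ["propertyNo", "ownerNo"]),
    project(property_rows, ["ownerNo", "rent"]),
)
spurious = as_set(lossy) - as_set(property_rows)
```

The first join returns exactly the original rows. The second returns two extra tuples, pairing PG16 with a rent of 370 and PG36 with a rent of 450, which are the spurious tuples the definition rules out.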

 Atomicity requires that database
modifications must follow an "all or nothing"
rule.
 Each transaction is said to be atomic. If one
part of the transaction fails, the entire
transaction fails and the database state is left
unchanged.
 To be compliant with the 'A', a system must
guarantee the atomicity in each and every
situation, including power failures / errors /
crashes.
 This guarantees that 'an incomplete
transaction' cannot exist.

 The consistency property ensures that any
transaction the database performs will take it
from one consistent state to another.
 Consistency states that only consistent (valid
according to all the rules defined) data will be
written to the database.
 Quite simply, whatever rows will be affected
by the transaction will remain consistent with
each and every rule that is applied to them
(including but not limited to: constraints,
cascades, triggers).

 While this is extremely simple and clear, it's
worth noting that this consistency
requirement applies to everything changed by
the transaction, without any limit (including
triggers firing other triggers launching
cascades that eventually fire other triggers
etc.) at all.

 Isolation refers to the requirement that no
transaction should be able to interfere with
another transaction
 In other words, it should not be possible that
two transactions that affect the same rows
run concurrently, as the outcome would be
unpredictable and the system thus made
unreliable.

 In effect the only strict way to respect the
isolation property is to use a serial model
where no two transactions can occur on the
same data at the same time and where the
result is predictable (i.e. transaction B will
happen after transaction A in every single
possible case).

 Durability means that once a transaction has
been committed, it will remain so.
 In other words, every committed transaction
is protected against power loss/crash/errors
and cannot be lost by the system and can
thus be guaranteed to be completed.
 In a relational database, for instance, once a
group of SQL statements execute, the results
need to be stored permanently.
 If the database crashes right after a group of
SQL statements execute, it should be possible
to restore the database state to the point
after the last transaction committed.

 The transaction subtracts 10 from A and adds
10 to B.
 If it succeeds, it would be valid, because the
data continues to satisfy the constraint.
 However, assume that after removing 10 from
A, the transaction is unable to modify B.
 If the database retains A's new value,
atomicity and the constraint would both be
violated. Atomicity requires that both parts of
this transaction complete or neither.
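The all-or-nothing behaviour of this transfer can be sketched with Python's built-in sqlite3 module. The table layout, the starting balances of 60 and 40, and the simulated failure are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('A', 60), ('B', 40)")  # assumed starting values
conn.commit()

def transfer(conn, amount, fail_midway=False):
    """Move amount from A to B as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
            if fail_midway:
                raise RuntimeError("simulated failure before B is updated")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
    except RuntimeError:
        pass  # the rollback has already restored A's old value

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, 10, fail_midway=True)
after_failure = balances(conn)
transfer(conn, 10)
after_success = balances(conn)
```

After the failed attempt both balances are unchanged, so A's new value is not retained; after the successful attempt both halves of the transfer are visible together.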

 Consistency is a very general term that
demands the data meets all validation rules.
 Also, it may be implied that both A and B
must be integers.
 A valid range for A and B may also be
implied. All validation rules must be checked
to ensure consistency.
 Assume that a transaction attempts to
subtract 10 from A without altering B.
 Because consistency is checked after each
transaction, it is known that A + B = 100
before the transaction begins.

 If the transaction removes 10 from A
successfully, atomicity will be achieved.
 However, a validation check will show that A
+ B = 90.
 That is not consistent according to the rules
of the database.
 The entire transaction must be cancelled and
the affected rows rolled back to their pre-
transaction state.
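The consistency check described above can be sketched in plain Python: apply the changes to a copy of the data, validate every rule, and keep the copy only if all rules pass. The function name run_transaction, the starting values and the non-negative range rule are assumptions made for illustration.

```python
def run_transaction(state, changes, rules):
    """Apply changes to a copy of state; keep it only if every rule holds."""
    candidate = dict(state)
    for name, delta in changes.items():
        candidate[name] += delta
    for rule in rules:
        if not rule(candidate):
            return state, False  # rolled back: the pre-transaction state survives
    return candidate, True

rules = [
    lambda s: s["A"] + s["B"] == 100,                        # the stated constraint
    lambda s: all(isinstance(v, int) for v in s.values()),   # implied: integers only
    lambda s: all(v >= 0 for v in s.values()),               # assumed valid range
]

start = {"A": 60, "B": 40}  # assumed starting values

# Subtracting 10 from A without altering B leaves A + B = 90, so it is rejected.
state_1, ok_1 = run_transaction(start, {"A": -10}, rules)

# Subtracting 10 from A and adding 10 to B keeps A + B = 100, so it is accepted.
state_2, ok_2 = run_transaction(start, {"A": -10, "B": 10}, rules)
```

The first transaction is cancelled and the data returns to its pre-transaction state; the second satisfies every rule and is kept.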
