1. Introduction to DBMS
By Dr. Kamal Gulati
For more Notes check at www.mybigdataanalytics.in
Table of Contents
1. INTRODUCTION
1.1. DBMS Definitions
1.1.1. Database
1.1.2. DBMS
1.1.3. Database system
1.2. Components of database
1.2.1. Database administrator (DBA)
1.2.2. Database designer
1.2.3. End users
1.3. Advantages of DBMS
1.4. Disadvantages of the File Processing System
2. DATA MODELS
2.1. Categories of data models
2.2. Schemas and instances
2.3. DBMS architecture
2.4. Data independence
2.4.1. Logical data independence
2.4.2. Physical data independence
2.5. Classification of database management systems
2.5.1. Relational data model
2.5.2. Network data model
2.5.3. Hierarchical data model
2.5.4. Object oriented data model
2.6. Database languages and interfaces
2.6.1. DBMS languages
2.6.2. DBMS interfaces
2.7. Database system environment
2.7.1. Data manager
2.7.2. DDL compiler
2.7.3. Run-time database processor
2.7.4. Query compiler
2.7.5. Pre-compiler
2.8. Entity Relationship Model
2.8.1. Entities and attributes
2.8.2. Entity types, entity sets, keys and value sets
2.8.3. Relationship types, sets and instances
2.8.4. Notations for ER diagrams
2.8.5. Generalization
2.8.6. Aggregation
3. RELATIONAL MODEL
3.1. Characteristics of a relation
3.2. Operations of the relational model
3.3. Relational algebra operations
3.4. Set theoretic operations
3.4.1. Union
3.4.2. Intersection
3.4.3. Set difference
3.4.4. Join operation
3.4.5. Division operation
3.4.6. Aggregate functions
3.4.7. COUNT
3.4.8. Grouping
3.4.9. Recursive closure operation
3.4.10. Outer join
3.5. Tuple relational calculus
3.5.1. Expressions and formulas in tuple calculus
3.5.2. Existential and universal quantifiers
3.5.3. Rules for the definition of a formula
3.6. Transforming the universal and existential quantifiers
3.6.1. Domain relational calculus
4. Database Design
4.1. Schema Refinement
4.1.1. Guidelines for relation schemas
4.2. Functional Dependencies
4.2.1. Inference rules for Functional Dependencies
4.2.2. Axioms to check if an FD holds
4.2.3. An Algorithm to Compute Attribute Closure X+ with respect to F
4.3. NORMALIZATION
4.3.1. Basics of normal forms
4.4. Inclusion dependency
5. TRANSACTION MANAGEMENT
5.1. Transaction Concept
5.2. Transaction states
5.3. Implementation of atomicity and durability
5.4. Concurrent Execution
5.5. Schedules
5.6. Serializability
5.6.1. Conflict Serializability
5.6.2. View Serializability
5.7. Recoverability
5.7.1. Recoverable schedules
5.7.2. Cascadeless schedules
5.8. Testing for Serializability
5.9. Precedence graphs
6. Concurrency control
6.1. Lock based protocols
6.1.1. Locks
6.1.2. Granting of locks
6.1.3. Avoiding starvation of transactions by granting locks
6.2. Two phase locking protocol
6.3. Graph based protocol
6.4. Time-stamp based protocol
6.5. Validation based protocol
6.6. Recovery system
6.6.1. Failure Classification
6.6.2. Log based recovery
6.7. Deferred Database Modification
6.8. Immediate Database Modification
7. Centralized and Distributed Databases
7.1. Distributed Database System
7.2. Some advantages of the DDBMS
7.3. Some additional properties
7.4. Physical hardware level
7.5. Client Server Architecture
7.6. Data fragmentation
7.6.1. Horizontal fragmentation
7.6.2. Vertical fragmentation
7.6.3. Mixed fragmentation
7.7. Data Replication
7.8. Deadlock handling
7.8.1. Deadlock prevention
7.8.2. Deadlock detection and recovery
8. SQL (Structured Query Language)
8.1. DDL Statements
8.1.1. Implicit commits
8.1.2. Data dictionary
8.2. DML
8.3. Language Structure
8.4. Basic SQL Queries
8.4.1. SQL data statements
8.4.2. SQL-Transaction Statements
8.4.3. SQL-Schema Statements
8.5. Union, Intersect and Except
8.5.1. ALL
8.6. Cursors
8.6.1. Explicit Cursors
8.6.2. Implicit Cursors
8.7. Triggers
8.7.1. Creating Triggers
8.8. Dynamic SQL
9. QBE
10. Query Processing and Optimization
10.1. Query Processing
10.2. Query Optimizing
10.3. Indexes
10.4. Selectivities
10.5. Uniformity
10.6. Disjunctive Clauses
10.7. Join Selectivities
10.8. Views
11. OODBMS
11.1. Characteristics of Object-Oriented Databases
11.2. Advantages of OODBMS
11.3. Disadvantages of OODBMS
12. ORACLE
12.1. Storage
12.2. Database Schema
12.3. Memory architecture
12.3.1. Library cache
12.3.2. Data dictionary cache
12.3.3. Program Global Area
12.4. Configuration
13. Objective Questions
1. INTRODUCTION
A Database Management System (DBMS) is a set of computer programs that
controls the creation, maintenance, and use of the database of an
organization and its end users. It allows organizations to place control of
organization-wide database development in the hands of database
administrators (DBAs) and other specialists. DBMSs may use any of a variety of
database models, such as the network model or the relational model. In large
systems, a DBMS allows users and other software to store and retrieve data in a
structured way. It helps to specify the logical organization of a database and
to access and use the information within the database. It provides facilities for
controlling data access, enforcing data integrity, managing concurrency
control, and restoring the database.
The first DBMSs appeared during the 1960s, at a time when projects of
momentous scale were being contemplated, planned and engineered.
Never before had such large datasets been assembled in this new technology.
Problems on the floor were identified and solutions were researched and
developed, often in real time.
The DBMS became necessary because the data was far more volatile than had
earlier been planned, and because there were still major limiting factors in the
costs associated with data storage media. Data grew as a collection, and it also
needed to be managed at a detailed transaction-by-transaction level. In the
1980s, all the major vendors of hardware systems large enough to support the
evolving needs of the computerized record-keeping systems of larger
organizations bundled some form of DBMS with their system solution.
The first DBMSs were thus very much vendor-specific. IBM as usual led
the field, but there was a growing number of competitors and clones whose
database solutions offered varying entry points into the bandwagon of
computerized record-keeping systems.
1.1. DBMS Definitions
Some of the technical terms of DBMS are defined below:
1.1.1. Database
A database is a logically coherent collection of data with some inherent meaning,
representing some aspect of the real world, and which is designed, built and
populated with data for a specific purpose. Example: consider names, telephone
numbers, and addresses.
You could record this data in an indexed address book. To maintain a database we
generally use software such as dBASE IV, MS Access or Excel.
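As a minimal sketch of such a database (using Python's built-in sqlite3 module rather than dBASE IV or Access; the table and column names are illustrative):

```python
import sqlite3

# An in-memory database acting as a small indexed address book
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE address_book (
    name    TEXT PRIMARY KEY,   -- plays the role of the index in a paper address book
    phone   TEXT,
    address TEXT
)""")
conn.execute("INSERT INTO address_book VALUES ('Asha', '555-0101', '12 Park Lane')")
conn.execute("INSERT INTO address_book VALUES ('Ravi', '555-0202', '7 Hill Road')")

# Look up a record by name, as you would in an address book
row = conn.execute("SELECT phone FROM address_book WHERE name = 'Asha'").fetchone()
print(row[0])  # 555-0101
```

The point is only that the data is coherent and purpose-built: each row represents one real-world contact.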
1.1.2. DBMS
It is a collection of programs that enables users to create and maintain a
database. In other words, it is general-purpose software that provides users
with the processes of defining, constructing and manipulating the database for
various applications.
1.1.3. Database system
The database and the DBMS software together are called a database system.
1.2. Components of a database
1.2.1. Database administrator (DBA)
In many organizations where many persons use the same resources, there is a
need for a chief administrator to manage these resources.
In a database environment, the primary resource is the database itself, and the
secondary resources are the DBMS and the related software. To manage these
resources, we need the database administrator.
The DBA is responsible for authorizing access to the database and for acquiring
software and hardware resources as needed.
1.2.2. Database designer
They are responsible for identifying the data to be stored in the database and for
choosing appropriate structures to represent and store this data. It is also the
responsibility of the database designer to communicate with the database users and
to understand their requirements.
1.2.3. End users
These are the persons whose jobs require access to the database for
querying, updating and generating reports. The databases generally exist for
their use.
There are several categories of end users:
A. Casual end users: occasionally access the database, but need
different information each time.
B. Parametric end users: make up a sizable portion of database end users.
Their main job function involves constantly querying and updating the
database, using standard types of queries and updates called canned
transactions that have been carefully programmed and tested. Bank
tellers checking account balances and processing withdrawals and
deposits are typical examples.
C. Sophisticated end users: include engineers, scientists, and business
analysts who thoroughly familiarize themselves with the facilities of the DBMS
so as to implement their applications to meet complex requirements.
D. Stand-alone end users: maintain personal databases by using
ready-made software that provides easy-to-use menu-based or graphical
interfaces. Example: tax packages that store a variety of personal financial
data for tax purposes.
E. System analysts and application programmers: system analysts
determine the requirements of end users, especially parametric end
users, and develop specifications for the canned transactions that meet these
requirements. Application programmers implement these specifications as
programs, then test, debug, document and maintain these canned
transactions. Such programmers are known as software engineers.
1.3. Advantages of DBMS
1. Controlling redundancy
2. Restricting unauthorized access
3. Providing persistent storage for program objects and data structures
4. Database interfacing
5. Providing multiple user interface
6. Representing complex relationships among data
7. Enforcing integrity constraints
8. Providing backup and recovery
1.4. Disadvantages of the File Processing System
1. Data redundancy and inconsistency.
2. Difficulty in accessing data.
3. Data isolation.
4. Data integrity problems.
5. Concurrent access is not possible.
6. Security problems.
2. DATA MODELS
A data model is a set of concepts that can be used to describe the structure of a
database.
By the structure of the database we mean the data types, relationships and
constraints that should hold for the data. Most data models also include a set of
basic operations for specifying retrievals and modifications on the data.
2.1. Categories of data models
A. High level or conceptual data model: that describe how the user will
use the database. High-level data model uses concepts such as entities,
attributes and relationship.
B. Entity: represents real world objects such as employee or project that is
stored in the database.
C. Attribute: represents some property of interest that further describes the
entity such as employee name or salary.
D. Relationship: it represents the relationship between two or more entity.
Low level or physical data model: that describe how the data is stored in
the computer.
E. Representational or implementation data model: it hides some of the
details of data storage but can be implemented on a computer system in a
direct way.
2.2. Schemas and instances
The description of the database is called the database schema. The database
schema is specified during database design and is not expected to change
frequently. A displayed schema is called a schema diagram.
The actual data in the database may change frequently; changes occur all the
time, for example each time we add a new student or enter a new grade for a
student. The data in the database at a particular moment of time is called the
database state, instance or snapshot.
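The distinction can be made concrete with a small sketch (Python's sqlite3; the student table is illustrative): the CREATE TABLE statement belongs to the schema, while the set of rows at any moment is the database state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the description of the database, fixed at design time
conn.execute("CREATE TABLE student (name TEXT, grade TEXT)")

# State (instance/snapshot) at one moment
conn.execute("INSERT INTO student VALUES ('Mina', 'A')")
state1 = conn.execute("SELECT * FROM student").fetchall()

# The state changes every time we add a student or enter a grade...
conn.execute("INSERT INTO student VALUES ('Arun', 'B')")
state2 = conn.execute("SELECT * FROM student").fetchall()

# ...but the schema itself has not changed
print(len(state1), len(state2))  # 1 2
```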
2.3. DBMS architecture
Three important characteristics of the database approach are:
1. Insulation of programs and data
2. Support of multiple user views
3. Use of a catalog to store the database schema
The architecture of the database system is called the three-schema architecture,
consisting of:
1. Internal schema
2. Conceptual schema
3. External schema
1. Internal schema: it describes the physical storage structure of the
database. The internal schema uses a physical data model and describes
the complete details of data storage and access paths for the database.
2. Conceptual schema: it describes the structure of the whole database for a
community of users. The conceptual schema hides the details of the physical
storage structure. A high-level data model or an implementation data model
can be used at this level.
3. External schema: it describes the part of the database that a particular
user group is interested in and hides the rest of the database from that
user group.
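In SQL terms, an external schema is often realized as a view. A brief sketch (sqlite3; the employee table and column names are made up for illustration): the view exposes only the part of the database one user group needs and hides the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the whole employee table, including salary
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES ('Lata', 'Sales', 50000)")

# External schema for a user group that must not see salaries
conn.execute("CREATE VIEW emp_public AS SELECT name, dept FROM employee")

cur = conn.execute("SELECT * FROM emp_public")
cols = [d[0] for d in cur.description]
print(cols)  # ['name', 'dept'] -- 'salary' is hidden from this user group
```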
2.4. Data independence
The three-schema architecture can be used to explain the concept of data
independence, which is defined as the capacity to change the schema at one
level of the database system without having to change the schema at the next
higher level.
There are two types of data independence:
2.4.1. Logical data independence
This is the capacity to change the conceptual schema without having to change
external schema or application programs. We can change the conceptual
schema to expand the database or to reduce the database.
2.4.2. Physical data independence
This is the capacity to change the internal schema without having to change the
conceptual or external schemas. Changes to the internal schema may be needed
because some physical files have to be reorganized, for example by creating
additional access structures to improve the performance of retrieval or updates.
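For example (an sqlite3 sketch): adding an index is a change at the internal-schema level only; the query written at the higher level is unchanged and still returns the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, f"cust{i % 10}") for i in range(100)])

query = "SELECT COUNT(*) FROM orders WHERE customer = 'cust3'"
before = conn.execute(query).fetchone()[0]

# Internal-schema change: an additional access structure to speed up retrieval
conn.execute("CREATE INDEX idx_customer ON orders(customer)")

after = conn.execute(query).fetchone()[0]
print(before == after)  # True -- the query text and its result are unaffected
```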
2.5. Classification of database management system
We can categorize the DBMS as follows:
1. Relational data model
2. Network data model
3. Hierarchal data model
4. Object oriented data model
2.5.1. Relational data model
The relational data model represents a database as a collection of tables, where
each table can be stored as a separate file. Most relational databases have a
high-level query language and support a limited form of user views.
2.5.2. Network data model
Represents data as record types and also represents a limited type of 1:N
relationship, called a set type.
2.5.3. Hierarchical data model
It represents data as a hierarchical tree structure. Each hierarchy represents a
number of related records. There is no standard language for the hierarchical model.
2.5.4. Object oriented data model
It defines a database in terms of objects, their properties and their operations.
Objects with the same structure and behavior belong to a class, and classes are
organized into a hierarchy or acyclic graph.
2.6. Database languages and interfaces
2.6.1. DBMS languages
The first step is to specify conceptual and internal schemas for the database and
any mappings between the two. In many DBMSs, where no strict separation of levels
is maintained, one language, called the data definition language (DDL), is used by
the DBA and database designers to define both schemas.
The DBMS has a DDL compiler whose function is to process DDL statements
in order to identify the descriptions of the schema constructs and to store the
schema description in the DBMS catalog. Where a clear separation is maintained
between the
• Conceptual schema
• Internal schema
A. the DDL is used to specify the conceptual schema only, and
B. the SDL (storage definition language) is used to specify the internal
schema only.
The mapping between the two levels may be specified in either of these languages.
In some DBMSs a VDL (view definition language) is used to specify user views
and their mappings to the conceptual schema, but in most DBMSs the DDL is used to
specify both the conceptual and external schemas. Once the database schema is
created and the database is filled with data, users must be able to manipulate the
database. Manipulations include:
• Retrieval
• Insertion
• Deletion
• Modification
For that purpose the DBMS provides a DML (data manipulation language).
2.6.1.1. DML (data manipulation language)
There are two main types of DML:
1. High-level or nonprocedural DML (e.g. SQL)
2. Low-level or procedural DML
1. High-level or nonprocedural DML: can be used to specify complex
database operations. Many DBMSs allow high-level DML statements either
to be entered interactively from a terminal or to be embedded in a general-purpose
programming language. DML statements must be identified within
the program so that they can be extracted by a pre-compiler and
processed by the DBMS. A high-level DML such as SQL can specify and
retrieve many records in a single DML statement, and such DMLs are hence
called set-at-a-time or set-oriented DMLs.
2. Low-level or procedural DML: must be embedded in a general-purpose
programming language. This type of DML typically retrieves individual
records or objects from the database and processes each separately.
Hence it needs to use programming-language constructs, such as loops, to
retrieve and process each record from a set of records. Low-level DMLs are also
called record-at-a-time DMLs because of this property. Whenever DML
commands, whether high or low level, are embedded in a general-purpose
programming language, that language is called the host language and the
DML is called the data sublanguage. On the other hand, a high-level DML
used in a stand-alone interactive manner is called a query language.
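The contrast above can be sketched as follows (sqlite3; the table is illustrative): the single UPDATE statement is set-at-a-time, while the cursor loop in the host language (here Python) plays the role of record-at-a-time processing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("a", 100.0), ("b", 200.0), ("c", 300.0)])

# Set-at-a-time (high-level, nonprocedural): one statement touches many records
conn.execute("UPDATE emp SET salary = salary * 1.10")

# Record-at-a-time (low-level, procedural): the host language loops over a
# cursor and processes each record separately
total = 0.0
for (salary,) in conn.execute("SELECT salary FROM emp"):
    total += salary          # one record processed per loop iteration
print(round(total, 2))  # 660.0
```

Here Python is the host language and SQL is the data sublanguage embedded in it.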
2.6.2. DBMS interfaces
User friendly interfaces provided by a DBMS may include the following:
Menu-based interfaces: these interfaces present the user with a list of options,
called menus, which lead the user through the formulation of a request. The
query is composed step by step by picking options from the menus displayed
by the system.
Forms-based interfaces: a forms-based interface displays a form to each user.
Users can fill out all of the form entries to insert new data, or they fill in only
certain entries. Forms are usually designed and programmed for parametric end users.
Graphical user interfaces: a GUI displays a schema to the user in diagrammatic
form. The user can then specify a query by manipulating the diagram. Most GUIs use
a pointing device, such as a mouse, to pick certain parts of the displayed schema.
Natural language interfaces: a natural language interface refers to the words in its
schema, as well as to a set of standard words, to interpret the request. If the
interpretation is successful, the interface generates a high-level query
corresponding to the natural language request and submits it to the DBMS for
processing.
Interfaces for parametric users: parametric users, such as bank tellers, often
have a small set of operations that they must perform repeatedly. Systems analysts
and programmers design and implement a special interface for parametric users,
typically providing keys by which each command runs automatically.
Interfaces for the DBA: the DBA staff uses these interfaces for commands that
create accounts, set system parameters, grant account authorization, change a
schema and reorganize the storage structures of a database.
2.7. Database system environment
The database and the DBMS catalog are usually stored on disk. Access to
the disk is controlled primarily by the operating system, which schedules disk
input/output.
2.7.1. Data manager
This module of the DBMS controls:
A. Access to the DBMS information stored on the disk.
B. Use of basic OS services for carrying out low-level data transfers
between the disk and the computer's main storage.
C. Handling of buffers in main memory.
2.7.2. DDL compiler
It processes schema definitions specified in the DDL and stores descriptions of
the schemas in the DBMS catalog. The DBMS catalog includes the following
information:
• Names of the files
• Data items
• Storage details of each file
• Mapping information
2.7.3. Run-time database processor
It handles database accesses. It receives retrieval or update operations and
carries them out on the database. Access to the disk goes through the stored
data manager.
2.7.4. Query compiler
Handles high-level queries that are entered interactively and generates calls
to the run-time database processor for executing the compiled query.
2.7.5. Pre-compiler
Extracts DML commands from an application program written in a host language.
The commands are then sent to the DML compiler for compilation into object code.
2.8. Entity Relationship Model
Two terms play a major role in designing a successful database application:
• Database application
• Application program
Database application: refers to a particular database (e.g. a bank database)
and the associated programs that implement the queries and updates.
Example: programs that implement database updates corresponding to
customers making deposits and withdrawals. These programs provide
user-friendly graphical user interfaces (GUIs) utilizing forms.
2.8.1. Entities and attributes
Entities: the basic object that the ER model represents is an entity. An entity
may be an object with a physical existence (a particular person, car, house or
employee) or an object with a conceptual existence (a company, a job or a
university course).
Attribute: a particular property that describes the entity.
Ex: the entity employee may be described by the employee's name, age,
address, salary and job.
Composite attribute: a composite attribute can be divided into subparts which
represent more basic attributes with independent meanings.
Simple or atomic attribute: attributes that are not divisible are called simple or
atomic attributes.
Single valued attributes: most attributes have a single value for a particular
entity; such attributes are called single-valued attributes. Ex: age is a
single-valued attribute of a person.
Multi valued attributes: attributes which may have more than one value, such
as the color attribute of a car. One car may have a single color while another
may have several. Such attributes are called multi-valued attributes.
Derived and stored attributes: in some cases two attribute values are related,
e.g. the age and birth date of a person. The value of age can be determined
from the current date and the value of the person's birth date. The age attribute
is called a derived attribute and the birth date is called a stored attribute.
Null values: in some cases a particular entity may not have an applicable or
known value for an attribute. Ex: apartment number.
Complex attribute: we represent composite attributes with parentheses (),
separating the components by commas, and multi-valued attributes with braces
{ }. Attributes that nest these constructs are called complex attributes.
{Address_phone({Phone(Area_code, Phone_number)})}
2.8.2. Entity types, entity sets, keys and value sets
Entity types: an entity type defines a collection (or set) of entities that have
the same attributes. Each entity type in the database is described by its name
and attributes.
Entity sets: the collection of all entities of a particular entity type in the database
at any point in time is called an entity set.
Key attributes: an entity type usually has an attribute whose values are
distinct for each individual entity in the collection. Such an attribute is called a
key attribute.
Value sets (domains of attributes): each simple attribute of an entity type is
associated with a value set (or domain of values), which specifies the set of
values that may be assigned to that attribute for each individual entity.
Ex: for the employee entity type,
age may be specified to lie in the range 16 to 70.
2.8.3. Relationship types, sets and instances
An association among entities is called a relationship. A relationship type R
among n entity types E1, E2, ..., En defines a set of associations. In other
words, the relationship type R is a set of relationship instances.
Degree of relationship type: the degree of a relationship type is the number of
participating entity types.
Ex: the WORKS_FOR relationship is of degree two.
Degree two: binary relationship
Degree three: ternary relationship
Role name: each entity type that participates in a relationship type plays a
particular role in the relationship. The role name specifies the role that a
particular entity from the entity type plays in each relationship instance and
helps to explain what the relationship means.
Recursive relationship: role names are not important where all the participating
entity types are distinct, since each entity type name can be used as the role
name. In some cases, however, the same entity type participates more than
once in a relationship type, in different roles. In such cases the role name
becomes essential for distinguishing the meaning of each participation. Such
relationship types are called recursive relationships.
Ex: employee and supervisor entities are both members of the same employee
entity type.
Weak entity type: entity types that do not have a key attribute are called weak
entity types. A weak entity type is sometimes called a child entity type.
Regular/strong entity type: entity types that have a key attribute are called
regular or strong entity types. The identifying entity type is also sometimes
called the parent entity type or dominant entity type.
2.8.4. Notations for ER diagram
[Table of ER diagram symbols and their meanings]
2.8.5. Generalization
We can think of the reverse process of abstraction, in which we suppress the
differences among several entity types, identify their common features and
generalize them into a single superclass.
2.8.6. Aggregation
Aggregation is an abstraction concept for building composite objects from their
component objects. There are cases where this concept can be used, all
related to the EER model:
• Where we aggregate attribute values of an object to form the whole object.
• When we represent an aggregation relationship as an ordinary relationship.
• Combining objects that are related by a particular relationship instance.
3. RELATIONAL MODEL
The relational model represents the database as a collection of relations. A
relation can be thought of as a table of values, where each row in the table
represents a collection of related data values.
In the relational model, each row in the table corresponds to an entity or
relationship. In relational model terminology, a row is called a tuple, a column
is called an attribute, and the table is called a relation.
The data type describing the kinds of values that can appear in each column is
called a domain.
Domain:
The domain D is a set of atomic values. Atomic means that each value in the
domain is indivisible.
Ex: USA_phone_number: the set of valid 10-digit phone numbers.
Relation schemas:
A relation schema R is denoted R(A1, A2, A3, ..., An), where
R is the relation name and
Ai is an attribute, for i = 1, 2, 3, ..., n.
Ex: STUDENT(name, SSN, home phone, address, office phone, age)
3.1. Characteristics of relations
1. Ordering of tuples in a relation: a relation is defined as a set of tuples.
Tuples in a relation do not have any particular order.
2. Ordering of values within a tuple: an n-tuple is an ordered list of n values,
so the ordering of values within a tuple is significant.
3. Values in the tuples: each value in a tuple is an atomic value, i.e. it is not
divisible into components. In the relational model, composite and
multi-valued attributes are not allowed.
4. Interpretation of a relation: the relation schema can be interpreted as a
declaration, or as a type of assertion.
Relational constraints: here we study the restrictions that apply to a database
schema. These include:
Domain constraints: specify that the value of each attribute must be an atomic
value from the domain of that attribute.
Key constraints: a relation is defined as a set of tuples, and all elements of a
set are distinct. Hence all tuples in a relation must be distinct: no two tuples can
have the same combination of values for all their attributes.
Entity integrity constraints: no primary key value can be null, because it is
used to identify the individual tuples in a relation.
Referential integrity constraints: specified between two relations and used to
maintain consistency among tuples of the two relations. It is based on the
foreign key concept.
3.2. Operations of the relational model
Operations on the relational model can be categorized into retrievals and
updates. There are three basic update operations on relations.
Insert operation: provides a list of attribute values for a new tuple t that is to
be inserted into a relation R.
Delete operation: used to delete a tuple from a relation, provided the tuple
being deleted is not referenced by a foreign key from other tuples in the
database. We use a condition to select the tuple to delete.
Ex: DELETE FROM employee
WHERE ssn = 985676;
Update operation: used to change the values of one or more attributes in a
tuple of a relation R.
Ex: UPDATE employee
SET age = 25
WHERE ssn = 576787;
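The three update operations can be exercised end to end with Python's sqlite3 module. This is a small sketch; the table layout and SSN values mirror the examples above but are otherwise illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn INTEGER PRIMARY KEY, ename TEXT, age INTEGER)")

# Insert operation: supply a list of attribute values for the new tuple.
conn.execute("INSERT INTO employee (ssn, ename, age) VALUES (?, ?, ?)", (985676, "Kiran", 30))
conn.execute("INSERT INTO employee (ssn, ename, age) VALUES (?, ?, ?)", (576787, "Meena", 24))

# Update operation: change attribute values of tuples selected by a condition.
conn.execute("UPDATE employee SET age = 25 WHERE ssn = 576787")

# Delete operation: remove tuples selected by a condition.
conn.execute("DELETE FROM employee WHERE ssn = 985676")

rows = list(conn.execute("SELECT ssn, age FROM employee"))
print(rows)  # [(576787, 25)]
```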
3.3. Relational algebra operations
1. Select operation: selects the subset of the tuples of a relation that satisfy a
selection condition, i.e. it selects some of the rows of a relation.
2. Project operation: selects some of the columns (a set of attributes) of a
relation.
3. Rename operation: renames either the relation name or the attribute
names, or both.
RENAME (old table name) TO (new table name)
3.4. Set theoretic operation
Several set theoretic operations are used to merge the elements of two sets in
various ways. These operations are as follows.
3.4.1. Union
The result of this operation, denoted by R ∪ S, is a relation that includes all
tuples that are either in R, or in S, or in both R and S. Duplicate tuples are
eliminated.
R ∪ S = S ∪ R {commutative operation}
SELECT salesman_id, name
FROM sales_master
WHERE city = 'Mumbai'
UNION
SELECT client_id, name
FROM client_master
WHERE city = 'Mumbai';
3.4.1.1. Restrictions on using a union operation are as follows
1. The number of columns in all the queries should be the same.
2. The data types of the corresponding columns in each query must be the same.
3. Union cannot be used in a subquery.
4. Aggregate functions cannot be used with a union clause.
3.4.2. Intersection
The result of this operation, denoted by R ∩ S, is a relation that includes all
tuples that are in both R and S.
SELECT salesman_id, name
FROM sales_master
WHERE city = 'Mumbai'
INTERSECT
SELECT client_id, name
FROM client_master
WHERE city = 'Mumbai';
3.4.3. Set difference
The result of this operation, denoted by R − S, is a relation that includes all
tuples that are in R but not in S.
SELECT product_no FROM product_master
MINUS
SELECT product_no FROM sales_order;
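The three set operations can be tried directly in SQLite via Python. Note that SQLite spells set difference EXCEPT, while MINUS is Oracle's name for the same operator; the product data here is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_master (product_no TEXT);
CREATE TABLE sales_order    (product_no TEXT);
INSERT INTO product_master VALUES ('P1'), ('P2'), ('P3');
INSERT INTO sales_order    VALUES ('P2');
""")

union = sorted(r[0] for r in conn.execute(
    "SELECT product_no FROM product_master UNION SELECT product_no FROM sales_order"))
common = sorted(r[0] for r in conn.execute(
    "SELECT product_no FROM product_master INTERSECT SELECT product_no FROM sales_order"))
# EXCEPT keeps tuples of the first query that are absent from the second.
unsold = sorted(r[0] for r in conn.execute(
    "SELECT product_no FROM product_master EXCEPT SELECT product_no FROM sales_order"))
print(union, common, unsold)  # ['P1', 'P2', 'P3'] ['P2'] ['P1', 'P3']
```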
3.4.4. Join operation
Denoted by ⋈, the join is used to combine related tuples from two relations into
single tuples. This operation is very important because it allows us to process
relationships among relations.
R ⋈ (join condition) S
There are several categories of join operations.
1. Cartesian product (cross product or cross join): the main difference
between the Cartesian product and the join is that in a join, only
combinations of tuples satisfying the join condition appear in the result.
2. Equi join: a join in which the only comparison operator used is = is called
an equi join. In its result, one attribute of each pair of attributes with
identical values is superfluous; removing such superfluous attributes
yields the natural join R * S.
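The difference between the Cartesian product and an equi join can be seen by running both. A minimal sketch; the employee/department tables are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee   (ename TEXT, dnumber INTEGER);
CREATE TABLE department (dnumber INTEGER, dname TEXT);
INSERT INTO employee   VALUES ('Asha', 1), ('Ravi', 2);
INSERT INTO department VALUES (1, 'Research'), (2, 'Sales');
""")

# Cartesian product: every combination of tuples (2 x 2 = 4 rows).
cross = list(conn.execute("SELECT * FROM employee, department"))

# Equi join: only combinations satisfying the join condition survive.
equi = list(conn.execute(
    "SELECT ename, dname FROM employee e JOIN department d "
    "ON e.dnumber = d.dnumber"))
```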
3.4.5. Division operation
The division operation is used for a special kind of query that sometimes
occurs in database applications.
3.4.6. Aggregate function
These functions apply to collections of values from the database and include:
SUM, AVERAGE, MAX, MIN
3.4.7. COUNT
This function is used to count tuples or attribute values.
3.4.8. Grouping
This is used to group the tuples of a relation by the values of some of its
attributes, typically before applying an aggregate function to each group.
SELECT company, SUM(amount) FROM sales
GROUP BY company
HAVING SUM(amount) > 10000;
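The grouping query above can be run as-is in SQLite; the sales figures below are invented for the sketch. GROUP BY forms one group per company, and HAVING filters whole groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (company TEXT, amount INTEGER);
INSERT INTO sales VALUES ('Acme', 7000), ('Acme', 8000), ('Zen', 4000);
""")
# Only groups whose SUM(amount) exceeds 10000 survive the HAVING clause.
rows = list(conn.execute(
    "SELECT company, SUM(amount) FROM sales "
    "GROUP BY company HAVING SUM(amount) > 10000"))
print(rows)  # [('Acme', 15000)]
```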
3.4.9. Recursive closure operation
This operation is applied to a recursive relationship.
3.4.10. Outer join
The natural join is denoted by R * S, where R and S are relations.
Only tuples from R that have a matching tuple in S appear in the result;
unmatched tuples, and tuples with null values in the join attributes, are
eliminated.
A set of operations called outer joins can be used when we want to keep all the
tuples in R, in S, or in both relations, whether or not they have matching tuples.
Left outer join: R ⟕ S
keeps every tuple of R; if no match is found in S, the S attributes are filled with
null values.
Right outer join: R ⟖ S
keeps every tuple of S.
Full outer join: R ⟗ S
{if no match is found, null values are set in the tuple}
Outer union: used to take the union of the tuples of two relations that are not
union compatible but are partially compatible, i.e. only some of their attributes
are union compatible; these attributes must include a key for both relations.
Student (name, SSN, department, advisor)
Faculty (name, SSN, department, rank)
Result (name, SSN, department, advisor, rank)
All the tuples of both relations appear in the result.
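A left outer join is easy to observe in SQLite: unmatched tuples of the left relation are padded with nulls (None in Python). The two toy relations r and s are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE r (k INTEGER, a TEXT);
CREATE TABLE s (k INTEGER, b TEXT);
INSERT INTO r VALUES (1, 'x'), (2, 'y');
INSERT INTO s VALUES (1, 'p');
""")
# Every tuple of r is kept; where no matching tuple exists in s,
# the s attributes appear as NULL.
rows = list(conn.execute(
    "SELECT r.k, r.a, s.b FROM r LEFT OUTER JOIN s ON r.k = s.k"))
```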
3.5. Tuple relational calculus
Relational calculus is a formal query language in which we write one declarative
expression to specify a retrieval request; hence there is no description of how to
evaluate the query.
Tuple relational calculus is based on specifying a number of tuple variables.
Each variable may take as its value any individual tuple from a given relation. A
simple tuple relational calculus query is of the form
{t | COND(t)}
The result is the set of all tuples t that satisfy COND(t).
Ex: find all employees whose salary > 50,000.
{t | employee(t) AND t.salary > 50000}
The notation t.salary shows how an attribute name is qualified with a tuple
variable name.
{t.fname, t.lname | employee(t) AND t.salary > 50000}
SELECT t.fname, t.lname
FROM employee AS t
WHERE t.salary > 50000;
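The SQL rendering of the calculus query above can be executed directly; the employee rows here are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (fname TEXT, lname TEXT, salary INTEGER);
INSERT INTO employee VALUES ('Maya', 'Rao', 65000), ('Dev', 'Nair', 40000);
""")
# {t.fname, t.lname | employee(t) AND t.salary > 50000} expressed in SQL:
rows = list(conn.execute(
    "SELECT t.fname, t.lname FROM employee AS t WHERE t.salary > 50000"))
print(rows)  # [('Maya', 'Rao')]
```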
3.5.1. Expressions and formulas in tuple calculus
A general expression of the tuple relational calculus is of the form
{t1.A1, t2.A2, ..., tn.An | COND(t1, t2, ..., tn)}
where
t1, t2, ..., tn are tuple variables,
each Ai is an attribute of the relation on which ti ranges, and
COND is a condition or formula.
Formula:
A formula is made up of predicate calculus atoms, which can be any of the
following:
1. An atom of the form R(ti),
where R is a relation name and ti is a tuple variable.
R(ti) identifies the range of the tuple variable ti as the relation whose name
is R.
2. An atom of the form ti.A op tj.B,
where op is a comparison operator in the set {=, <, ≤, >, ≥, ≠},
ti and tj are tuple variables,
A is an attribute of the relation on which ti ranges, and
B is an attribute of the relation on which tj ranges.
3. An atom of the form ti.A op c or c op tj.B,
where op is a comparison operator as above,
ti and tj are tuple variables,
A and B are attributes of the relations on which ti and tj range, and
c is a constant value.
A formula is made up of one or more atoms connected via the logical operators
AND, OR and NOT, and is defined as follows:
1. Every atom is a formula.
2. If F1 and F2 are formulas, then so are (F1 AND F2), (F1 OR F2), NOT(F1)
and NOT(F2).
3. The truth values of these formulas are derived from their component
formulas F1 and F2 as follows:
a. (F1 AND F2) is true if both F1 and F2 are true.
b. (F1 OR F2) is false if both F1 and F2 are false; otherwise it is true.
c. NOT(F1) is true if F1 is false; it is false if F1 is true.
d. NOT(F2) is true if F2 is false; it is false if F2 is true.
3.5.2. Existential and universal quantifiers
Two special symbols called quantifiers can appear in formulas; these are:
1. The universal quantifier (∀)
2. The existential quantifier (∃)
First we need to define the concepts of free and bound tuple variables in
formulas.
Bound: a tuple variable t is bound if it is quantified, meaning that it appears in
an (∃t) or (∀t) clause.
Free: otherwise it is free.
We can define a tuple variable in a formula as free or bound according to the
following rules:
1. An occurrence of a tuple variable in a formula F that is an atom is free in
F.
2. An occurrence of a tuple variable t is free or bound in a formula made up of
logical connectives ((F1 AND F2), (F1 OR F2), NOT(F1), NOT(F2)) depending on
whether it is free or bound in F1 and F2; a tuple variable may be free in F1 and
bound in F2, or vice-versa.
3. All free occurrences of a tuple variable t in F are bound in a formula F′ of the
form
F′ = (∃t)(F) or
F′ = (∀t)(F).
The tuple variable is bound to the quantifier specified in F′.
Ex:
F1 = d.dname = 'Research'
F2 = (∃t)(d.dnumber = t.DNO)
F3 = (∀d)(d.mgrssn = '12345677')
Tuple variable d is free in both F1 and F2, whereas it is bound to the universal
quantifier in F3.
t is bound to the existential quantifier in F2.
3.5.3. Rules for the definition of a formula
1. If F is a formula, then so is (∃t)(F),
where t is a tuple variable.
The formula (∃t)(F) is true
if the formula F evaluates to true for some (at least one) tuple assigned to free
occurrences of t in F; otherwise (∃t)(F) is false.
2. If F is a formula, then so is (∀t)(F),
where t is a tuple variable.
The formula (∀t)(F) is true if F evaluates to true for every tuple (in the
universe) assigned to free occurrences of t in F;
otherwise (∀t)(F) is false.
Note: ∃ is called the existential quantifier because (∃t)(F) is true if there
exists some tuple that makes F true, whereas ∀ is called the universal
quantifier because (∀t)(F) is true only if every possible tuple makes F true.
3.6. Transforming the universal and existential quantifiers
We can use transformations from mathematical logic that relate the universal
and existential quantifiers: it is possible to transform a universal quantifier into
an existential quantifier, and vice-versa. For example,
(∀t)(F) ≡ NOT (∃t)(NOT (F)).
3.6.1. Domain relational calculus
There is another type of relational calculus called the domain relational calculus,
or simply domain calculus. The QBE language is related to domain calculus; the
formal specification of domain calculus was proposed after the development of
the QBE language.
Domain calculus differs from tuple calculus in the type of variables used in
formulas: the variables range over single values from the domains of attributes
rather than over tuples.
An expression of the domain calculus is of the form
{x1, x2, ..., xn | COND(x1, x2, ..., xn, xn+1, ..., xn+m)}
where
x1, x2, ..., xn+m are domain variables that range over domains of attributes, and
COND is the condition or formula of the domain relational calculus.
A formula is made up of atoms. An atom can be one of the following:
1. An atom of the form R(x1, x2, ..., xj),
where R is the name of a relation of degree j
and each xi, 1 ≤ i ≤ j, is a domain variable.
2. An atom of the form xi op xj,
where op is a comparison operator in the set {=, <, ≤, >, ≥, ≠}.
3. An atom of the form xi op c or c op xj,
where op is a comparison operator as above,
xi and xj are domain variables, and
c is a constant value.
4. DATABASE DESIGN
Conceptual database design gives us a set of relational schemas and integrity
constraints (ICs) that can be regarded as a good starting point for the final
database design. This initial design must be refined by taking the ICs into
account more fully than is possible with the ER model constructs alone, and
also by considering performance criteria and typical workloads.
We concentrate on an important class of constraints called functional
dependencies. Other kinds of ICs, for example multi-valued dependencies and
join dependencies, also provide useful information. They can sometimes reveal
redundancies that cannot be detected using functional dependencies alone.
4.1. Schema Refinement
Redundant storage of information is the root cause of many schema problems.
Although decomposition can eliminate redundancy, it can lead to problems of
its own and should be used with caution.
4.1.1. Guidelines for relation schemas
1. Semantics of the attributes: every attribute in a relation must belong to
that relation; as we know, a relation is a collection of attributes that together
have a meaning. Semantics refers to how the attribute values in a tuple relate
to one another.
Example: (ename, ssn, bdate, address, dnumber)
Each attribute gives information about employees.
2. Redundant information in the tuples:
For the best use of storage space, we disallow redundant information in
relations; redundancy gives rise to anomalies:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies
3. Reducing null values in tuples:
Nulls can waste space at the storage level and may create problems with
understanding the meaning of an attribute, because a null value can have
multiple interpretations:
• The attribute does not apply to this tuple.
• The attribute value is unknown for this tuple.
• The value is known but has not been recorded yet.
4. Spurious tuples:
Spurious tuples are tuples that represent wrong information; in worked
examples they are often marked with asterisks (*).
Example of a decomposition that can generate spurious tuples:
Emp_loc (ename, plocation)
Emp_proj (ssn, pno, hours, pname, plocation)
4.2. Functional Dependencies
A functional dependency, denoted by X → Y, between two sets of attributes
X and Y specifies that, for any two tuples t1 and t2 in r,
if t1[X] = t2[X]
then we must also have t1[Y] = t2[Y].
This means that the value of the Y component of a tuple depends on, or is
determined by, the value of the X component; the reverse does not necessarily
hold.
X is called the left-hand side of the FD.
Y is called the right-hand side of the FD.
X functionally determines Y in a relation R if and only if, whenever two
tuples of r(R) agree on their X values, they also agree on their Y values.
1. If X is a candidate key, then X → Y holds for any attribute set Y of R,
because the key constraint implies that no two tuples will have the same
value of X.
2. If X → Y in R, this does not say whether or not Y → X in R.
4.2.1. Inference rules for Functional Dependencies
The set of functional dependencies specified on a relational schema R is
denoted by F. It is impractical to list all possible functional dependencies that
may hold; the set of all dependencies implied by F is called the closure of F
and is denoted by F+.
F = {ssn → {ename, bdate, address, dnumber},
dnumber → {dname, dmgrssn}}
Some dependencies in F+:
ssn → {dname, dmgrssn},
ssn → ssn,
dnumber → dname
Dependencies in F+ are said to be inferred from F. To infer them in a
systematic way, we use inference rules. F ⊨ X → Y is used to denote that the
functional dependency X → Y is inferred from the set of functional
dependencies F.
4.2.2. Axioms to check if an FD holds
(Armstrong's inference rules: reflexivity, augmentation and transitivity.)
4.2.3. An Algorithm to Compute Attribute Closure X+ with respect to F
Let X be a subset of the attributes of a relation R and F be the set of functional
dependencies that hold for R.
1. Create a hypergraph in which the nodes are the attributes of the relation
in question.
2. Create hyperedges for all functional dependencies in F.
3. Mark all attributes belonging to X.
4. Recursively continue marking unmarked attributes of the hypergraph that
can be reached by a hyperedge all of whose incoming nodes are marked.
Result: X+ is the set of attributes that have been marked by this process.
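The marking process above can be sketched without the hypergraph machinery: keep adding the right-hand side of any FD whose left-hand side is already marked. The FD set below is the running example from the section.

```python
def closure(attrs, fds):
    """Compute X+ : repeatedly mark the right-hand side of every FD
    whose left-hand side is already fully marked."""
    marked = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= marked and not rhs <= marked:
                marked |= rhs
                changed = True
    return marked

# FDs from the running example: ssn -> {ename, bdate, address, dnumber},
# dnumber -> {dname, dmgrssn}
fds = [({"ssn"}, {"ename", "bdate", "address", "dnumber"}),
       ({"dnumber"}, {"dname", "dmgrssn"})]
print(sorted(closure({"ssn"}, fds)))
# ssn determines every attribute, including dname and dmgrssn via dnumber
```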
4.2.3.1. Hypergraph for F
[Figure: hypergraph for the example set F]
4.3. NORMALIZATION
4.3.1. Basics of normal forms
A set of functional dependencies is specified for each relation, and the
normalization process proceeds in a top-down fashion, decomposing relations
as necessary. Initially Codd (1972) proposed 1NF, 2NF and 3NF. A stronger
definition of 3NF is Boyce-Codd normal form, proposed by Boyce and Codd.
All these normal forms are based on the FDs of a relation. Later, 4NF and
5NF were proposed, based on the concepts of multi-valued dependencies and
join dependencies.
4.3.1.1. 1NF (first normal form)
It was defined to disallow multi-valued and composite attributes and their
combinations. It states that the domain of an attribute must include only atomic
values, and that the value of any attribute in a tuple must be a single value
from the domain of that attribute.
4.3.1.2. 2NF (second normal form)
Second normal form is based on the concept of full functional dependency. An
FD X → Y is a full functional dependency if removal of any attribute A from X
means that the dependency no longer holds, i.e. for any A ∈ X,
(X − {A}) does not determine Y. X → Y is a partial dependency if some
attribute can be removed from X and the dependency still holds. A relation is
in 2NF if every nonprime attribute is fully functionally dependent on every key.
4.3.1.3. 3NF (third normal form)
It is based on the concept of transitive dependency. An FD X → Y in a relation
schema R is a transitive dependency if there is a set of attributes Z that is
neither a candidate key nor a subset of any key of R, and both
X → Z and
Z → Y
hold.
4.3.1.4. Boyce-Codd normal form (BCNF)
Boyce-Codd normal form (or BCNF) is a normal form used in database
normalization. It is a slightly stronger version of the third normal form (3NF). A
table is in Boyce-Codd normal form if and only if, for every one of its non-trivial
functional dependencies X → Y, X is a superkey - that is, X is either a candidate
key or a superset thereof.
Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF
table which does not have multiple overlapping candidate keys is guaranteed to
be in BCNF. Depending on what its functional dependencies are, a 3NF table
with two or more overlapping candidate keys may or may not be in BCNF.
An example of a 3NF table that does not meet BCNF is:
Today's Court Bookings
Court Start Time End Time Rate Type
1 09:30 10:30 SAVER
1 11:00 12:00 SAVER
1 14:00 15:30 STANDARD
2 10:00 11:30 PREMIUM-B
2 11:30 13:30 PREMIUM-B
2 15:00 16:30 PREMIUM-A
• Each row in the table represents a court booking at a tennis club that has
one hard court (Court 1) and one grass court (Court 2)
• A booking is defined by its Court and the period for which the Court is
reserved
• Additionally, each booking has a Rate Type associated with it. There are
four distinct rate types:
• SAVER, for Court 1 bookings made by members
• STANDARD, for Court 1 bookings made by non-members
• PREMIUM-A, for Court 2 bookings made by members
• PREMIUM-B, for Court 2 bookings made by non-members
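The BCNF test on this table can be mechanized with an attribute-closure check: every nontrivial FD X → Y must have X as a superkey. The attribute names are shortened and the FD set is my reading of the bookings example (Rate Type alone determines the Court, but is not a key).

```python
def closure(attrs, fds):
    """X+ : repeatedly add the right-hand side of every FD whose
    left-hand side is already contained in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = {"court", "start", "end", "rate_type"}
# FDs of the bookings table: a court plus a time determines the whole
# booking; the rate type alone determines the court.
fds = [({"court", "start"}, {"end", "rate_type"}),
       ({"court", "end"}, {"start", "rate_type"}),
       ({"rate_type"}, {"court"})]

# BCNF: for every nontrivial FD X -> Y, closure(X) must equal R.
violations = [(lhs, rhs) for lhs, rhs in fds
              if not rhs <= lhs and closure(lhs, fds) != R]
```

The only violation found is rate_type → court, which is why the table is in 3NF but not BCNF.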
4.3.1.5. Algorithm for relational database
For a database, a universal relation schema R = {A1, A2, ..., An} includes all
the attributes of the database. The universal relation assumption states that
every attribute name is unique. A set of functional dependencies that should
hold on the attributes of R is specified by the database designers. Using these
functional dependencies, the algorithms decompose the universal relation
schema R into a set of relation schemas D = {R1, R2, ..., Rm}.
D is the relational database schema (D is called a decomposition of R).
We must make sure that each attribute in R appears in at least one relation
schema Ri in the decomposition, so that no attributes are lost:
R = R1 ∪ R2 ∪ R3 ∪ ... ∪ Rm
This is called the attribute preservation condition of the decomposition.
4.3.1.6. Decomposition and dependency preservation
Each functional dependency X → Y specified in F should either appear directly
in one of the relation schemas Ri of the decomposition D or be inferable from
the dependencies that appear in some Ri. This is the dependency preservation
condition.
We want to preserve the dependencies because each dependency in F
represents a constraint on the database; if a dependency is not preserved,
checking it requires joining two or more relations.
Suppose that a relation R is given together with a set of functional
dependencies F, and let F+ be the closure of F. We then ask whether the
decomposition D = {R1, R2, ..., Rm} of R is dependency preserving with
respect to F.
4.3.1.7. Decomposition and lossless (non-additive) joins
Another property a decomposition D should possess is the lossless-join or
non-additive-join property, which ensures that no spurious tuples are generated
when a natural join operation is applied to the relations in the decomposition.
The condition of no spurious tuples should hold on every legal relation state,
i.e. every relation state that satisfies the functional dependencies in F.
A decomposition D = {R1, R2, ..., Rm} of R has the lossless (non-additive)
join property with respect to the set of dependencies F on R if, for every
relation state r of R that satisfies F, the natural join * of the projections of r
onto all the relations in D yields r itself.
The word loss in lossless refers to loss of information, not loss of tuples. If a
decomposition does not have the lossless-join property, we may get additional
spurious tuples.
4.3.1.8. Multi-valued dependencies and fourth normal forms
Multi-valued dependencies are a consequence of first normal form (1NF),
which disallows an attribute in a tuple from having a set of values. When an
entity has a multi-valued attribute, we must repeat every value of one of the
attributes with every value of the other attribute to keep the relation state
consistent. This constraint is specified by a multi-valued dependency.
For example, an employee may work on several projects and have several
dependents, but projects and dependents are independent of each other. To
keep the relation consistent, we must have a separate tuple to represent every
combination of an employee's dependent and an employee's project. This
constraint is specified as a multi-valued dependency.
4.3.1.8.1. Inference rules for functional and multi-valued dependencies
We develop inference rules that include both FDs and MVDs, so that both
types of constraints can be considered together.
Inference rules IR1 through IR8 form a complete set for inferring FDs and
MVDs from a given set of dependencies, where
R = {A1, A2, ..., An} and X, Y, Z, W are subsets of R.
4.3.1.8.2. Fourth normal form
A relation schema R is in 4NF with respect to a set of dependencies F (that
includes FDs and MVDs) if, for every nontrivial MVD X →→ Y in F+, X is a
superkey for R.
4.3.1.9. Lossless-join decomposition
4.3.1.10. Join decomposition and fifth normal form
A join dependency (JD), denoted by JD(R1, R2, ..., Rn) and specified on
relation schema R, specifies a constraint on the states r of R.
The constraint states that every legal state r of R should have a lossless-join
decomposition into R1, R2, ..., Rn.
A join dependency JD(R1, R2, ..., Rn) specified on relation schema R is a
trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.
Such a dependency is called trivial because it has the lossless-join property
for every relation state r of R and hence does not specify any constraint on R.
4.3.1.10.1. Fifth normal forms (Project join normal form)
A relation schema R is in 5NF, or project-join normal form (PJNF), with respect
to a set F of functional, multi-valued and join dependencies if, for every
nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (i.e. implied by F), every
Ri is a superkey of R.
Example:
4.4. Inclusive dependency
Inclusion dependencies were defined in order to formalize certain
inter-relational constraints.
Example:
Foreign key constraints cannot be specified as FDs or MVDs because they
relate attributes across relations. They can, however, be specified as inclusion
dependencies, which are also used to represent constraints between two
relations.
An inclusion dependency
R.X < S.Y between two sets of attributes,
X of relation R
and Y of relation S,
states that the set of values appearing in X of R must be a subset of the set of
values appearing in Y of S.
X of R and Y of S must have the same number of attributes.
Example:
If X = {A1, A2, ..., An}
and
Y = {B1, B2, ..., Bn},
then for 1 ≤ i ≤ n,
Ai corresponds to Bi.
Inference rules for inclusion dependencies:
1. IDIR1 (reflexive rule): R.X < R.X
2. IDIR2 (attribute correspondence):
If
R.X < S.Y,
where
X = {A1, A2, ..., An}
and
Y = {B1, B2, ..., Bn}
and Ai corresponds to Bi, then
R.Ai < S.Bi for 1 ≤ i ≤ n.
3. IDIR3 (transitive rule):
If R.X < S.Y
and
S.Y < T.Z,
then
R.X < T.Z.
All referential integrity constraints can be represented as inclusion
dependencies.
5. TRANSACTION MANAGEMENT
5.1. Transaction Concept
A transaction is a unit of program execution that accesses and possibly updates
various data items. A transaction usually results from the execution of a user
program written in a high-level language, a data manipulation language, or
another programming language.
Example: SQL, COBOL, C, Pascal.
It is delimited by statements or system calls of the form begin transaction and
end transaction.
The transaction consists of all the operations between the begin and the end. To
ensure the integrity of data, we require that the database maintain the following
properties.
1. Atomicity: either all operations of the transaction are reflected properly in
the database, or none are.
2. Consistency: execution of a transaction in isolation (i.e. with no other
transaction executing concurrently) preserves the consistency of the
database.
3. Isolation: even though multiple transactions may execute concurrently,
each transaction is unaware of the others. For any pair of transactions
Ti and Tj, it appears to Ti that either Tj finished execution before Ti
started, or Tj started execution after Ti finished.
4. Durability: after a transaction completes successfully, the changes it
has made to the database persist, even if there is a system failure.
These properties are called the ACID properties.
Access to the database is accomplished by the following two operations.
1. READ(X): transfers the data item X from the database to a local buffer
belonging to the transaction that executes the read operation.
2. WRITE(X): transfers the data item X from the local buffer of the
transaction back to the database.
Example: a transaction Ti that transfers $50 from account A to account B.
Ti:
READ(A)
A:=A-50;
WRITE(A)
READ(B)
B:=B+50;
WRITE(B)
Let the initial values of A and B be $1000 and $2000. Suppose a system failure
occurs after WRITE(A) and before WRITE(B). Then the account information is
A = $950
B = $2000
and $50 has been lost: the sum A + B is no longer preserved, which is why
atomicity is required.
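The failure scenario above can be simulated directly. This is only an illustrative sketch of the bookkeeping, with a crash injected between the two writes:

```python
# Simulate Ti transferring $50 from A to B, with a failure injected
# after WRITE(A) and before WRITE(B).

class Crash(Exception):
    pass

def run_transfer(db, crash_after_write_a=False):
    buf = {}
    buf["A"] = db["A"]            # READ(A)
    buf["A"] -= 50                # A := A - 50
    db["A"] = buf["A"]            # WRITE(A)
    if crash_after_write_a:
        raise Crash("system failure before WRITE(B)")
    buf["B"] = db["B"]            # READ(B)
    buf["B"] += 50                # B := B + 50
    db["B"] = buf["B"]            # WRITE(B)

db = {"A": 1000, "B": 2000}
try:
    run_transfer(db, crash_after_write_a=True)
except Crash:
    pass
print(db)                         # {'A': 950, 'B': 2000}
print(db["A"] + db["B"])          # 2950, not 3000: atomicity violated
```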
5.2. Transaction state
Compensating transaction: the way to undo the effect of a committed transaction
is to execute a compensating transaction.
We establish a simple abstract transaction model: a transaction must be in one
of the following states.
Active: the initial state; the transaction stays in this state while executing.
Partially committed: after the final statement has been executed.
Failed: after the discovery that normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database has
been restored to its state prior to the start of the transaction.
Committed: after successful completion.
A transaction enters the failed state after the system determines that the
transaction can no longer proceed with its normal execution, for example
because of a hardware or logical error. Such a transaction must be rolled back
and then enters the aborted state. At this point the system has two options:
1. Restart the transaction: appropriate when the abort was caused by a
hardware or software error.
2. Kill the transaction: appropriate for an internal logical error that can be
corrected only by rewriting the application program, or that arose from
bad input.
5.3. Implementation of atomicity and durability
The recovery-management component of a database system implements the
support for atomicity and durability.
Shadow-database scheme: a transaction that wants to update the database first
creates a complete copy of the database. All updates are done on the new copy,
leaving the original copy, called the shadow copy, untouched.
If at any time the transaction has to be aborted, the new copy is deleted; the old
copy of the database is unaffected. If the transaction completes, the operating
system is asked to write all the pages of the new copy to disk.
In the UNIX operating system the flush command is used for this purpose. After
the flush has completed, db_pointer is updated to point to the current copy of the
database.
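A minimal sketch of the shadow-database idea, assuming the whole database fits in one JSON file; the file names and layout are illustrative, not a real DBMS implementation:

```python
# Shadow-database sketch: all updates go to a fresh copy of the database
# file; db_pointer is atomically repointed only on commit.
import json, os, tempfile

def update_db(pointer, updates, commit=True):
    with open(pointer) as p:
        shadow = p.read().strip()             # current (shadow) copy
    with open(shadow) as f:
        db = json.load(f)
    db.update(updates)
    new_copy = shadow + ".new"
    with open(new_copy, "w") as f:
        json.dump(db, f)
        f.flush(); os.fsync(f.fileno())       # the "flush": force to disk
    if commit:
        tmp = pointer + ".tmp"
        with open(tmp, "w") as f:
            f.write(new_copy)
        os.replace(tmp, pointer)              # atomically repoint db_pointer
    else:
        os.remove(new_copy)                   # abort: old copy unaffected

d = tempfile.mkdtemp()
data, ptr = os.path.join(d, "db0.json"), os.path.join(d, "db_pointer")
with open(data, "w") as f: json.dump({"A": 1000}, f)
with open(ptr, "w") as f: f.write(data)
update_db(ptr, {"A": 950}, commit=False)      # aborted update is discarded
update_db(ptr, {"A": 900})                    # committed update survives
current = open(ptr).read().strip()
print(json.load(open(current)))               # {'A': 900}
```

The key point is that the pointer switch (`os.replace`) is the single atomic step: before it, the shadow copy is still the database; after it, the new copy is.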
5.4. Concurrent Execution
A database system must control the interaction among concurrent transactions
to ensure the consistency of the database. In this section we focus on the
concept of concurrent execution.
Example:
Consider a set of transactions that access and update bank accounts.
Let T1 and T2 be two transactions.
T1:
READ(A)
A:=A-50;
WRITE(A)
READ(B)
B:=B+50;
WRITE(B)
T2:
READ(A)
TEMP:=A*0.1;
A:=A-TEMP;
WRITE(A)
READ(B)
B:=B+TEMP;
WRITE(B)
Let the initial values of A and B be $1000 and $2000.
Case 1: if T1 is followed by T2, then
A = $855
B = $2145
Case 2: if T2 is followed by T1, then
A = $850
B = $2150
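The two serial orders can be computed directly to confirm the figures above. The read/write pairs are collapsed into plain arithmetic for brevity:

```python
# The two serial executions of T1 (transfer $50 from A to B) and
# T2 (move 10% of A to B), starting from A = 1000, B = 2000.

def t1(a, b):
    a -= 50; b += 50
    return a, b

def t2(a, b):
    temp = a * 0.1
    a -= temp; b += temp
    return a, b

a, b = t1(1000, 2000)          # Case 1: T1 then T2
a, b = t2(a, b)
print(a, b)                    # 855.0 2145.0

a, b = t2(1000, 2000)          # Case 2: T2 then T1
a, b = t1(a, b)
print(a, b)                    # 850.0 2150.0
```

Note that both serial orders preserve A + B = $3000; they differ only in how the 10% is computed, which is why either serial result is considered consistent.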
5.5. Schedules
Execution sequences that show the order in which the instructions of
transactions are executed are called schedules. A schedule in which the
instructions belonging to each single transaction appear together is called a
serial schedule. If two transactions run concurrently, the CPU switches between
the two transactions, or is shared among all the transactions.
In the serial schedule T1 followed by T2, the final values are A = $855 and
B = $2145, and consistency is preserved. Some concurrent schedules, however,
leave the database in an inconsistent state.
Consider an interleaved schedule whose final values are A = $900 and
B = $2150. Here the accounts have gained $50: A + B = $3050 instead of $3000.
5.6. Serializability
The database system must control the execution of concurrent transactions to
ensure that the database remains consistent. We must therefore first understand
which schedules will ensure consistency and which will not.
A transaction generally performs two operations on a data item Q:
I. read operations
II. write operations
A transaction performs its sequence of operations on the copy of Q that resides
in the local buffer of the transaction. Here we discuss two forms of schedule
equivalence:
I. conflict serializability
II. view serializability
5.6.1. Conflict Serializability
Consider a schedule S involving two transactions Ti and Tj, with consecutive
instructions Ii and Ij respectively (i ≠ j).
1. If Ii and Ij refer to different data items, then we can swap Ii and Ij
without affecting the result of any instruction in the schedule.
2. If Ii and Ij refer to the same data item Q, then the order of the two steps
may matter. Since each instruction is either a read or a write, there are
four cases:
a. Ii = read(Q), Ij = read(Q): the order does not matter, because the
same value of Q is read by both Ti and Tj.
b. Ii = read(Q), Ij = write(Q): the order matters.
c. Ii = write(Q), Ij = read(Q): the order matters.
d. Ii = write(Q), Ij = write(Q): since both instructions are writes, their
order does not directly affect Ti or Tj, but the value obtained by the
next read(Q) instruction of S is affected.
We say that Ii and Ij conflict if they are operations by different transactions on
the same data item, and at least one of them is a write operation.
A serial schedule is one in which all the instructions of each transaction execute
together.
If a schedule S can be transformed into a schedule S' by a series of swaps of
non-conflicting instructions, we say that S and S' are conflict equivalent.
The concept of conflict equivalence leads to the concept of conflict serializability:
we say that a schedule S is conflict serializable if it is conflict equivalent to a
serial schedule.
Less stringent definitions of schedule equivalence exist, but such analysis is
hard to implement and computationally expensive. We consider one such
definition next.
5.6.2. View Serializability
View serializability is similar to conflict serializability and is likewise based only
on the read and write operations of transactions. Consider two schedules S and
S' in which the same set of transactions participates. The schedules S and S'
are said to be view equivalent if they satisfy the following three conditions:
1. For each data item Q, if transaction Ti reads the initial value of Q in
schedule S, then Ti must also read the initial value of Q in schedule S'.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S
and the value read was produced by transaction Tj, then in schedule S'
Ti must also read the value of Q produced by Tj.
3. For each data item Q, the transaction that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in
schedule S'.
A schedule S is view serializable if it is view equivalent to a serial schedule.
5.7. Recoverability
So far we have discussed which schedules ensure the consistency of the
database and which do not, assuming that no transaction fails. We now address
the effect of transaction failures during concurrent execution.
If a transaction Ti fails, for whatever reason, we need to undo the effects of Ti to
ensure the atomicity property. In a system that allows concurrent execution, any
transaction Tj that is dependent on Ti (i.e. Tj has read a data item written by Ti)
must also be aborted.
That is why we need to place some restrictions on the schedules permitted.
5.7.1. Recoverable schedules
Most database systems require that all schedules be recoverable. A recoverable
schedule is one where, for each pair of transactions Ti and Tj such that Tj reads
a data item previously written by Ti, the commit operation of Ti appears before
the commit operation of Tj.
5.7.2. Cascadeless schedules
Consider an example in which T10 writes a value that is read by T11, and T11
writes a value that is read by T12. Suppose T10 fails; then T10 must be rolled
back. Since T11 is dependent on T10, and T12 on T11, all the dependent
transactions must be rolled back as well.
The phenomenon in which a single transaction failure leads to a series of
transaction rollbacks is called cascading rollback. It is desirable that cascading
rollbacks not occur in a schedule. Schedules with this property are called
cascadeless schedules: for every pair of transactions Ti and Tj such that Tj
reads a data item previously written by Ti, the commit operation of Ti must
appear before the read operation of Tj.
It is then easy to see that every cascadeless schedule is also recoverable.
5.8. Testing for Serializability
Since every schedule should be serializable, we must first understand how to
determine whether a given schedule S is serializable.
Let S be a schedule. We construct a directed graph, called the precedence
graph, from S:
G = (V, E)
where V is the set of vertices
and E is the set of edges.
Vertices: all the transactions participating in the schedule.
Edges: an edge Ti → Tj is added if one of the following conditions holds:
1. Ti executes write(Q) before Tj executes read(Q)
2. Ti executes read(Q) before Tj executes write(Q)
3. Ti executes write(Q) before Tj executes write(Q)
If there is an edge T1 → T2 in the precedence graph for a schedule S, then in
any equivalent serial schedule all the instructions of T1 must execute before the
first instruction of T2. A schedule S is conflict serializable if and only if its
precedence graph contains no cycle.
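The precedence-graph test can be sketched in a few lines. The schedule representation below (a list of `(transaction, operation, item)` triples) is an assumption made for illustration:

```python
# Build the precedence graph of a schedule and test conflict
# serializability by checking the graph for a cycle.
from collections import defaultdict

def precedence_graph(schedule):
    """schedule: list of (txn, op, item), op in {'r', 'w'}, execution order."""
    edges = defaultdict(set)
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            if ti != tj and q_i == q_j and 'w' in (op_i, op_j):
                edges[ti].add(tj)            # conflict: add edge Ti -> Tj
    return edges

def has_cycle(edges):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    def dfs(u):                              # DFS: a back edge means a cycle
        color[u] = GRAY
        for v in edges[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True
        color[u] = BLACK
        return False
    return any(color[u] == WHITE and dfs(u) for u in list(edges))

# Interleaving where each transaction reads A before the other writes it:
s = [("T1", "r", "A"), ("T2", "r", "A"), ("T2", "w", "A"), ("T1", "w", "A")]
print(has_cycle(precedence_graph(s)))        # True: not conflict serializable

serial = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A")]
print(has_cycle(precedence_graph(serial)))   # False: conflict serializable
```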
5.9. Precedence graph
A schedule can fail the precedence-graph test for conflict serializability and yet
be view serializable; for example, an edge such as T4 → T3 may arise only from
useless writes, i.e. writes whose values are never read.
To test view serializability, we therefore develop a scheme for deciding whether
an edge needs to be inserted in the precedence graph.
Consider a schedule S in which Tj reads a value written by Ti, giving the edge
Ti → Tj, and in which some transaction Tk also executes write(Q). Then in any
schedule S' that is view equivalent to S, Tk must appear either before Ti
(Tk → Ti) or after Tj (Tj → Tk); it cannot appear between Ti and Tj.
To test view serializability, we need to extend the precedence graph to include
labeled edges. This type of graph is termed a labeled precedence graph.
Rules for inserting labeled edges in the precedence graph:
Consider a schedule S over transactions T1, T2, …, Tn.
Let Tb and Tf be two dummy transactions:
Tb issues write(Q) for each Q accessed in S
Tf issues read(Q) for each Q accessed in S.
Now we construct a new schedule S' from S by inserting
Tb at the beginning of S
Tf at the end of S.
We construct the labeled precedence graph for schedule S' as follows.
1. Add an edge Ti → Tj if Tj reads the value of a data item Q written by Ti.
2. Remove all edges incident on useless transactions. A transaction Ti is
useless if there exists no path in the precedence graph from Ti to Tf.
3. For each data item Q such that Tj reads a value of Q written by Ti, and Tk
executes write(Q) with Tk ≠ Tb, do the following:
a. If Ti = Tb and Tj ≠ Tf, then insert the edge Tj → Tk.
b. If Ti ≠ Tb and Tj = Tf, then insert the edge Tk → Ti.
c. If Ti ≠ Tb and Tj ≠ Tf, then insert the pair of edges Tk → Ti and
Tj → Tk in the labeled precedence graph, labeled with a unique
number p.
6. CONCURRENCY CONTROL
When several transactions execute concurrently in the database, the isolation
property may no longer be preserved. It is necessary for the system to control
the interaction among concurrent transactions. These controls are termed
concurrency-control schemes.
6.1. Lock based protocols
One way to ensure serializability is to require that access to data items be done
in a mutually exclusive manner, i.e. while one transaction is accessing a data
item, no other transaction can modify that data item.
One way to implement this requirement is to allow a transaction to access a data
item only if it is currently holding a lock on that data item.
6.1.1. Locks
There are various modes in which a data item may be locked.
Shared mode: if a transaction Ti holds a shared-mode lock (denoted S) on the
data item Q, then Ti can read but cannot write Q.
Exclusive mode: if Ti holds an exclusive-mode lock (denoted X) on Q, then Ti
can both read and write Q.
Example: if two transactions each hold a lock and each requests a lock held by
the other, neither can proceed. This situation is called deadlock. When deadlock
occurs, the system must roll back one of the two transactions. The data items
that were locked by that transaction are unlocked and become available to other
transactions.
6.1.2. Granting of locks
When a transaction requests a lock on a data item in a particular mode, and no
other transaction holds a lock on the same data item in a conflicting mode, the
lock can be granted.
Suppose transaction T2 holds a shared-mode lock on Q and T1 requests an
exclusive-mode lock; T1 has to wait for T2 to release its lock:
T2 → lock-S(Q) (holds)
T1 → lock-X(Q) (waits)
T3 → lock-S(Q) (requests, granted)
T4 → lock-S(Q) (requests, granted)
If compatible shared-mode requests keep being granted, T1 is still waiting. This
situation, in which a particular transaction waits indefinitely for a lock on a data
item, is called starvation.
6.1.3. Avoiding starvation of transactions by granting locks
When a transaction Ti requests a lock on data item Q in a particular mode M, the
lock is granted provided that:
1. there is no other transaction holding a lock on Q in a mode that conflicts
with M, and
2. there is no other transaction waiting for a lock on Q that made its lock
request before Ti.
6.2. Two phase locking protocol
One protocol that ensures serializability is the two-phase locking protocol. This
protocol requires that each transaction issue its lock and unlock requests in two
phases:
1. Growing phase: a transaction may obtain locks but may not release any
lock.
2. Shrinking phase: a transaction may release locks but may not obtain any
new lock.
The point at which a transaction has obtained its final lock is called the lock point
of the transaction.
Cascading rollbacks can be avoided by a modification of two-phase locking
called the strict two-phase locking protocol. This protocol requires that all
exclusive locks taken by a transaction be held until that transaction commits.
This requirement ensures that any data item written by an uncommitted
transaction remains locked in exclusive mode until the transaction commits,
preventing any other transaction from reading the data. Another variant is the
rigorous two-phase locking protocol, which requires all locks (shared and
exclusive) to be held until the transaction commits. It can easily be verified that,
under rigorous two-phase locking, transactions can be serialized in the order in
which they commit.
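The lock table behind strict two-phase locking can be sketched as follows. This is a single-threaded, illustrative model (the class and method names are assumptions), not a real lock manager with waiting queues:

```python
# A minimal strict-2PL lock manager sketch: S locks are compatible with
# each other, X conflicts with everything, and all locks are released
# only at commit.

class LockManager:
    def __init__(self):
        self.table = {}                 # item -> (mode, set of holders)

    def lock(self, txn, item, mode):
        """mode is 'S' or 'X'; returns True if granted, False if txn must wait."""
        held = self.table.get(item)
        if held is None:
            self.table[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == 'S' and held_mode == 'S':
            holders.add(txn)            # shared locks are compatible
            return True
        if holders == {txn}:            # sole holder may upgrade S -> X
            new_mode = 'X' if 'X' in (mode, held_mode) else 'S'
            self.table[item] = (new_mode, holders)
            return True
        return False                    # conflicting mode: must wait

    def commit(self, txn):
        # strict 2PL: everything is held until commit, then released at once
        for item in [i for i, (_, h) in self.table.items() if txn in h]:
            mode, holders = self.table[item]
            holders.discard(txn)
            if not holders:
                del self.table[item]

lm = LockManager()
print(lm.lock("T1", "Q", "S"))   # True
print(lm.lock("T2", "Q", "S"))   # True: S is compatible with S
print(lm.lock("T2", "Q", "X"))   # False: T1 still holds S, so T2 must wait
lm.commit("T1")
print(lm.lock("T2", "Q", "X"))   # True once T1 has committed
```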
6.3. Graph based protocol
If we wish to develop a protocol that is not two-phase, we need additional
information on how each transaction will access the database. In this model we
have prior knowledge about the order in which the database items will be
accessed.
To exploit such prior knowledge, we impose a partial order on the set
D = {d1, d2, …, dn} of all data items: if di → dj, then any transaction accessing
both di and dj must access di before accessing dj. This ordering can be depicted
as a directed acyclic graph, called a database graph. We restrict ourselves here
to protocols that employ only exclusive locks.
In the tree protocol, the only lock allowed is lock-X. Each transaction Ti can lock
a data item at most once and must follow these rules:
a. The first lock by Ti may be on any data item.
b. Subsequently, a data item Q can be locked by Ti only if the parent
of Q is currently locked by Ti.
c. Data items may be unlocked at any time.
d. A data item that has been locked and unlocked by Ti cannot
subsequently be relocked by Ti.
Advantages:
1. Unlocking may occur earlier, which leads to shorter waiting times and
increased concurrency.
2. The protocol is deadlock-free, so no rollbacks are required.
Disadvantages:
1. A transaction may have to lock data items it does not access, which
results in increased locking overhead.
2. Additional waiting time.
3. A potential decrease in concurrency.
6.4. Time-stamp based protocol
In this type of protocol, the ordering between every pair of conflicting
transactions is determined in advance, at execution time, by timestamps.
Time-stamp:
With each transaction Ti in the system we associate a unique fixed timestamp,
denoted TS(Ti).
This timestamp is assigned by the database system before transaction Ti starts
execution.
If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction
Tj enters the system later, then TS(Ti) < TS(Tj). There are two simple methods
for implementing this scheme.
1. Use the value of the system clock as the timestamp; that is, a
transaction's timestamp is equal to the value of the clock when the
transaction enters the system.
2. Use a logical counter that is incremented after each new timestamp has
been assigned; a transaction's timestamp is equal to the value of the
counter.
6.5. Validation based protocol
In some cases, where the majority of transactions are read-only, the rate of
conflicts among transactions may be low. But we do not know in advance which
transactions will be involved in a conflict, so we need a scheme for monitoring
the system. We assume that each transaction Ti executes in the following
phases:
1. Read phase: the execution of transaction Ti takes place; the values of the
various data items are read and stored in variables local to Ti. All write
operations are performed on temporary local variables, without updating
the actual database.
2. Validation phase: transaction Ti performs a validation test to determine
whether it can copy the temporary local variables that hold the results of
its write operations to the database without causing a violation of
serializability.
3. Write phase: if transaction Ti succeeds in validation (step 2), then the
actual updates are applied to the database; otherwise Ti is rolled back.
To perform the validation test, three timestamps are associated with each
transaction Ti:
a. Start(Ti): the time when Ti started its execution.
b. Validation(Ti): the time when Ti finished its read phase and started its
validation phase.
c. Finish(Ti): the time when Ti finished its write phase.
6.6. Recovery system
6.6.1. Failure Classification
There are various types of failure that can occur in a system, each of which
needs to be dealt with in a different manner. Broadly, some failures cause no
loss of information in the system, while others do. Here we consider only the
following types of failure:
6.6.1.1. Transaction failure
There are two types of error that may cause a transaction to fail.
Logical error: the transaction can no longer proceed with its normal execution,
due to some internal condition such as bad input, data not found, overflow, or a
resource limit being exceeded.
System error: the system has entered an undesirable state (e.g. deadlock), as a
result of which the transaction cannot continue its normal execution; such a
transaction can be re-executed later.
System crash: a hardware malfunction, or a bug in the database software or
operating system, causes the loss of the contents of volatile storage.
Disk failure: a disk block loses its contents, for example because of a head
crash. To recover from this type of failure, backups on tape are used.
6.6.2. Log based recovery:
The most widely used structure for recording database modifications is the log.
The log is a sequence of log records and maintains a record of all the update
activity in the database.
An update log record has the following fields:
Transaction identifier:
The unique identifier of the transaction that performed the write operation.
Data item identifier:
The unique identifier of the data item written; basically it is the location of the
data item on disk.
Old value:
The value of the data item prior to the write operation.
New value:
The value the data item will have after the write operation.
Other log records exist to record significant events during transaction
processing:
<Ti, start>: transaction Ti has started.
<Ti, Xj, V1, V2>: transaction Ti has performed a write operation on data item Xj,
which had the value V1 before the write and will have V2 after the write.
<Ti, commit>: transaction Ti has committed.
<Ti, abort>: transaction Ti has aborted.
6.7. Deferred Database Modification
In this scheme, when a transaction partially commits, the information in the log
associated with the transaction is used in executing the deferred writes. If the
system crashes before the transaction completes its execution, or if the
transaction aborts, then the information in the log is simply ignored.
T0:
READ(A)
A:=A-50;
WRITE(A)
READ(B)
B:=B+50;
WRITE(B)
T1:
READ(C)
C:=C-100;
WRITE(C)
6.8. Immediate Database Modification
The immediate-update technique allows database modifications to be output to
the database while the transaction is still in the active state.
Database modifications written by active transactions are called uncommitted
modifications. In the event of a crash or transaction failure, the system must use
the old-value fields of the log records to restore the modified data items.
For T0 and T1 above, the log contains:
<T0, start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
<T0, commit>
<T1, start>
<T1, C, 700, 600>
<T1, commit>
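A recovery pass over such a log can be sketched as follows: redo the writes of committed transactions using the new values, and undo the writes of uncommitted transactions using the old values. Here the crash is assumed to happen before T1 commits (an assumption made for the illustration):

```python
# Recovery sketch over the log records above: redo committed
# transactions forward, undo uncommitted transactions backward.

log = [
    ("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050),
    ("T0", "commit"),
    ("T1", "start"), ("T1", "C", 700, 600),
    # crash occurs here: T1 never committed
]

def recover(db, log):
    committed = {rec[0] for rec in log if rec[1] == "commit"}
    for rec in log:                       # redo pass: forward through the log
        if len(rec) == 4 and rec[0] in committed:
            txn, item, old, new = rec
            db[item] = new
    for rec in reversed(log):             # undo pass: backward through the log
        if len(rec) == 4 and rec[0] not in committed:
            txn, item, old, new = rec
            db[item] = old                # restore the old value
    return db

db = {"A": 1000, "B": 2000, "C": 600}     # C reflects T1's uncommitted write
print(recover(db, log))                   # {'A': 950, 'B': 2050, 'C': 700}
```

T0's updates survive (redo), while T1's write to C is backed out to the old value 700 (undo).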
7. CENTRALIZED AND DISTRIBUTED DATABASE
In the traditional enterprise computing model, an Information Systems
department maintains control of a centralized corporate database system.
Mainframe computers, usually located at corporate headquarters, provide the
required performance levels. Remote sites access the corporate database
through wide-area networks (WANs) using applications provided by the
Information Systems department.
Changes in the corporate environment toward decentralized operations have
prompted organizations to move toward distributed database systems that
complement the new decentralized organization.
Today’s global enterprise may have many local-area networks (LANs) joined with
a WAN, as well as additional data servers and applications on the LANs. Client
applications at the sites need to access data locally through the LAN or remotely
through the WAN. For example, a client in Tokyo might locally access a table
stored on the Tokyo data server or remotely access a table stored on the New
York data server.
Both centralized and distributed database systems must deal with the problems
associated with remote access:
• Network response slows when WAN traffic is heavy. For example, a
mission-critical transaction-processing application may be adversely
affected when a decision-support application requests a large number of
rows.
• A centralized data server can become a bottleneck as a large user
community contends for data server access.
• Data is unavailable when a failure occurs on the network.
7.1. Distributed Database System
A distributed database system is a collection of data that belongs logically to the
same system but is physically spread over the sites of a computer network.
7.2. Some advantages of the DDBMS are as follows:
1. Distributed nature of some database applications: some database
applications are naturally distributed over different sites.
2. Increased reliability and availability: these are two of the most commonly
cited advantages. Reliability is broadly defined as the probability that a
system is up at a particular moment. Availability is the probability that the
system is continuously available during a time interval.
3. Allowing data sharing while maintaining some measure of local
control: it is possible to control the data and software locally at each site.
However, certain data can be accessed by users at other, remote sites
through the DDBMS software. This allows the controlled sharing of data
throughout the distributed system.
4. Improved performance: when a large database is distributed over multiple
sites, a smaller database exists at each site. As a result, local queries and
transactions accessing data at a single site have better performance
because of the smaller local database. If all transactions were submitted
to a single centralized database, performance would decrease.
7.3. Some additional properties:
1. The ability to access remote sites and transmit queries and data among
the various sites via a communication network.
2. The ability to decide on which copy of a replicated data item to access.
3. The ability to maintain the consistency of copies of a replicated data item.
4. The ability to recover from individual site crashes and from new types of
failure such as the failure of the communication links.
7.4. Physical hardware level
The following main factors distinguish a DDBMS from a centralized system:
1. There are multiple computers, called sites or nodes.
2. These sites must be connected by some type of communication network
to transmit data and commands among the sites.
The sites may be within the same building or a group of adjacent buildings,
connected via a local-area network, or they may be geographically distributed
over large distances and connected via a long-haul network. Local-area
networks typically use cables, whereas long-haul networks use telephone lines
or satellites; it is also possible to use a combination of the two types of network.
Networks may have different topologies that define different communication
paths among the sites.
7.5. Client Server Architecture
The client-server architecture was developed to deal with a new computing
environment in which a large number of personal computers, workstations, file
servers, peripherals, and other equipment are connected together via a network.
The idea is to define specialized servers with specific functionalities.
The interaction between client and server during the processing of an SQL
query might proceed as follows:
1. The client parses a user query and decomposes it into a number of
independent site queries. Each site query is sent to the appropriate
server site.
2. Each server processes its local query and sends the resulting relation to
the client site.
3. The client site combines the results of the subqueries to produce the
result of the originally submitted query. In this approach, the SQL server
has also been called a database processor (DP) or back-end machine,
whereas the client has been called an application processor (AP) or
front-end machine.
In a DDBMS, it is customary to divide the software modules into three levels:
1. The server software is responsible for local data management at a site.
2. The client software is responsible for most of the distribution functions; it
accesses the data-distribution information from the DDBMS catalog and
processes all requests that require access to more than one site.
3. The communication software provides the communication primitives that
are used by the client to transmit commands and data among the various
sites as needed.
7.6. Data fragmentation
If a relation r is fragmented, r is divided into a number of fragments r1, r2, …, rn.
These fragments contain sufficient information to allow reconstruction of the
original relation r. This reconstruction can take place through the application of
either the union operation or a special type of join operation on the various
fragments.
There are three different schemes for fragmenting a relation:
I. Horizontal fragmentation
II. Vertical fragmentation
III. Mixed fragmentation
7.6.1. Horizontal fragmentation
In horizontal fragmentation, the tuples of r are divided among one or more
fragments. A relation r is partitioned into a number of subsets r1, r2, …, rn; each
tuple of r must belong to at least one of the fragments, so that the original
relation can be reconstructed. Each fragment can be defined as a selection
operation on r.
For reconstruction we use the union operation:
r = r1 ∪ r2 ∪ … ∪ rn
7.6.2. Vertical fragmentation
In vertical fragmentation, the columns of r are divided among one or more
fragments. Vertical fragmentation of r(R) involves subsets of attributes
R1, R2, …, Rn of the schema R such that
R = R1 ∪ R2 ∪ … ∪ Rn
Each fragment of r is defined by a projection operation on r.
For reconstruction we use the natural-join operation:
r = r1 ⋈ r2 ⋈ … ⋈ rn
For the join to be lossless, each Ri should include the primary key of R.
7.6.3. Mixed fragmentation
Mixed fragmentation combines horizontal and vertical fragmentation. A relation r
is divided into a number of fragments; each fragment is obtained as the result of
applying either the horizontal or the vertical fragmentation scheme to relation r,
or to a fragment of r that was obtained previously.
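The three fragmentation schemes above can be sketched on a tiny example relation. The table, attribute names, and branch values below are illustrative assumptions:

```python
# Horizontal and vertical fragmentation of a small ACCOUNT relation,
# with reconstruction by union and by natural join respectively.

account = [
    {"branch": "Delhi",  "acc_no": "A-101", "balance": 500},
    {"branch": "Mumbai", "acc_no": "A-215", "balance": 700},
]

# Horizontal: one selection per site; reconstruction is the union.
r1 = [t for t in account if t["branch"] == "Delhi"]
r2 = [t for t in account if t["branch"] == "Mumbai"]
assert sorted(r1 + r2, key=lambda t: t["acc_no"]) == account

# Vertical: projections that both keep the key acc_no; reconstruction is
# a natural join on that shared key (the key must appear in every fragment).
v1 = [{"acc_no": t["acc_no"], "branch": t["branch"]} for t in account]
v2 = [{"acc_no": t["acc_no"], "balance": t["balance"]} for t in account]
rejoined = [{**a, **b} for a in v1 for b in v2 if a["acc_no"] == b["acc_no"]]
print(sorted(rejoined, key=lambda t: t["acc_no"]) ==
      sorted(account, key=lambda t: t["acc_no"]))   # True
```

Note that if the key were dropped from one vertical fragment, the join would have no column to match on and the reconstruction would be lossy.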
7.7. Data Replication
If a relation r is replicated, a copy of r is stored at two or more sites. In full
replication, a copy is stored at every site in the system.
Availability: if one site fails, the relation may be found at another site, so the
system may continue processing.
Increased parallelism: when the majority of accesses to relation r result only in
reading the relation, several sites can process queries involving r in parallel, and
there is a greater chance that the needed data is found at the site where a
transaction is executing.
Increased overhead on update:
The system must ensure that all replicas of relation r are consistent; otherwise
erroneous computations may result. Whenever r is updated, the update must be
propagated to all sites containing replicas.
7.8. Deadlock handling
A system is in a deadlock state if there exists a set of transactions such that
every transaction in the set is waiting for another transaction in the set.
Suppose there is a set of waiting transactions {T0, T1, …, Tn} such that
T0 is waiting for a data item held by T1,
T1 is waiting for a data item held by T2,
and so on, with
Tn waiting for a data item held by T0.
No transaction can make progress in this situation.
There are two principal methods for dealing with the deadlock problem:
a. Deadlock prevention: a protocol that ensures the system will never enter
a deadlock state.
b. Deadlock detection and recovery: the system is allowed to enter a
deadlock state, which it then detects and recovers from.
7.8.1. Deadlock prevention
There are two approaches to deadlock prevention.
Approach 1:
i. Ensure that no cyclic waits can occur,
ii. for example by requiring all locks to be acquired together.
Approach 2:
i. This approach is closer to deadlock recovery.
ii. We roll back transactions instead of letting them wait whenever waiting
could lead to deadlock.
7.8.1.1. The first approach
Each transaction locks all its data items before it begins execution.
Disadvantages:
i. It is often hard to predict, before the transaction begins, what data items
need to be locked.
ii. Data-item utilization will be very low, since many data items may be
locked but unused for a long time.
7.8.1.2. The second approach
The second approach to preventing deadlock is to use preemption and
transaction rollback.
In preemption:
If T2 requests a lock held by T1,
the lock granted to T1 may be preempted by rolling back T1 and granting the
lock to T2.
To control preemption, we assign a unique timestamp to each transaction. The
system uses these timestamps only to decide whether a transaction should wait
or roll back.
Two different deadlock-prevention schemes have been proposed:
1. Wait-die: this scheme is based on a non-preemptive technique. When Ti
requests a data item held by Tj,
Ti is allowed to wait only if
TS(Ti) < TS(Tj),
i.e. Ti is older than Tj; otherwise Ti dies (is rolled back).
2. Wound-wait: a preemptive technique, and a counterpart to the wait-die
scheme. When Ti requests a data item held by Tj,
Ti is allowed to wait only if
TS(Ti) > TS(Tj),
i.e. Ti is younger than Tj; otherwise Tj is wounded (rolled back by the system).
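The two rules can be written down as small decision functions, assuming a smaller timestamp means an older transaction. This is a sketch of the decision logic only, not a full scheduler:

```python
# Wait-die vs wound-wait: given the timestamps of the requesting
# transaction and the lock holder, decide wait or rollback.

def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies (is rolled back).
    return "wait" if ts_requester < ts_holder else "rollback requester"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (rolls back) the holder; younger one waits.
    return "wait" if ts_requester > ts_holder else "rollback holder"

# Ti with TS 5 requests an item held by Tj with TS 10 (Ti is older):
print(wait_die(5, 10))     # wait
print(wound_wait(5, 10))   # rollback holder
# Tj with TS 10 requests an item held by Ti with TS 5 (Tj is younger):
print(wait_die(10, 5))     # rollback requester
print(wound_wait(10, 5))   # wait
```

In both schemes the older transaction is favored, which is what rules out cyclic waiting and also prevents starvation of long-running transactions.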
7.8.1.3. Time-out based scheme
Another simple technique is based on lock timeouts.
In this approach, a transaction that has requested a lock waits for at most a
specified amount of time. If the lock has not been granted within that time, the
transaction is said to time out; it rolls itself back and restarts.
Disadvantages:
i. It is hard to choose the interval: too long a wait delays transactions that
are actually involved in a deadlock.
ii. Too short a wait results in transaction rollback even when there is no
deadlock,
iii. leading to wasted resources.
iv. Starvation is also a possibility with this scheme.
7.8.2. Deadlock detection and recovery
i. If a system does not employ a protocol that ensures deadlock freedom,
then a detection and recovery scheme must be used.
ii. In such a scheme, the system determines whether a deadlock has
occurred;
iii. if one has, the system must attempt to recover from the deadlock.
To do this, the system must
i. maintain information about the current allocation of data items to
transactions, as well as any outstanding data-item requests;
ii. provide an algorithm that uses this information to determine whether the
system has entered a deadlock state;
iii. recover from the deadlock, if a deadlock exists.
7.8.2.1. Deadlock detection
To describe deadlocks we use a directed graph called a wait-for graph. The
graph consists of
G = (V, E)
V → set of vertices (all the transactions in the system)
E → set of edges
7.8.2.1.1. Directed graph
Ti → Tj
An edge Ti → Tj means that Ti is waiting for transaction Tj to release a data
item that it needs. A deadlock exists in the system if and only if the wait-for
graph contains a cycle. Each transaction involved in the cycle is said to be
deadlocked. To detect deadlocks, the system maintains the wait-for graph and
periodically searches it for a cycle.
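The cycle search on the wait-for graph can be sketched with a depth-first search. This is an illustrative sketch, assuming the graph is given as a dictionary mapping each transaction to the set of transactions it waits for; the name `has_deadlock` and the colouring scheme are choices for the example, not part of the original notes.

```python
def has_deadlock(wait_for):
    """Return True iff the wait-for graph contains a cycle (i.e., a deadlock).

    wait_for: dict mapping a transaction id to the set of transaction ids
    it is waiting on. Uses DFS with white/grey/black colouring: finding an
    edge back to a grey (in-progress) vertex means a cycle exists.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in wait_for}

    def dfs(t):
        colour[t] = GREY
        for u in wait_for.get(t, ()):
            if colour.get(u, WHITE) == GREY:   # back edge: cycle found
                return True
            if colour.get(u, WHITE) == WHITE and dfs(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and dfs(t) for t in list(wait_for))

# T1 waits for T2 and T2 waits for T1: a cycle, hence a deadlock
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
print(has_deadlock({"T1": {"T2"}, "T2": set()}))   # False
```

In practice the system would run such a check periodically, or whenever a lock request forces a transaction to wait.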
7.8.2.2. Recovery from deadlock
When the system determines that a deadlock exists, it must recover from the
deadlock. The most common solution is to roll back one or more transactions to
break the deadlock. The following actions need to be taken:
1. Select a victim: decide which transaction to roll back, choosing the one
with the minimum rollback cost. Factors that determine this cost include:
a. How long the transaction has been computing, and how much longer it
needs to complete its task.
b. How many data items the transaction has used.
c. How many more data items the transaction needs in order to complete.
d. How many transactions will be involved in the rollback.
2. Rollback: once we have decided that a particular transaction must be
rolled back, we must determine how far this transaction should be rolled
back. For anything short of a total rollback, the system must maintain
information about the state of all running transactions.
3. Starvation: if the same transaction is repeatedly picked as the victim, it
may never complete its designated task. This situation is called starvation,
and the system must ensure that a transaction is picked as a victim only a
finite number of times.
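The victim-selection step can be sketched as a minimum-cost choice over the factors listed above. This is a hypothetical cost model for illustration only: the field names and weights are assumptions, not from the original notes, and a real DBMS would compute these quantities from its own bookkeeping.

```python
def select_victim(transactions):
    """Pick as victim the transaction with the smallest rollback cost.

    Each transaction is a dict with illustrative fields:
      work_done            - how long it has been computing
      items_used           - how many data items it has locked/used
      cascading_rollbacks  - how many other transactions its rollback drags in
    The weighting (2x on cascading rollbacks) is an arbitrary example choice.
    """
    def cost(t):
        return t["work_done"] + t["items_used"] + 2 * t["cascading_rollbacks"]
    return min(transactions, key=cost)["name"]

victim = select_victim([
    {"name": "T1", "work_done": 10, "items_used": 4, "cascading_rollbacks": 1},
    {"name": "T2", "work_done": 2,  "items_used": 1, "cascading_rollbacks": 0},
])
print(victim)  # T2: it has done the least work, so rolling it back is cheapest
```

To avoid starvation, a real implementation would also fold the number of times a transaction has already been chosen as victim into the cost function.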
8. SQL (STRUCTURED QUERY LANGUAGE)
SQL (Structured Query Language) is a database sublanguage for querying and
modifying relational databases. It was developed by IBM Research in the
mid-1970s and standardized by ANSI in 1986.
The Relational Model defines two root languages for accessing a relational
database -- Relational Algebra and Relational Calculus. Relational Algebra is a
low-level, operator-oriented language. Creating a query in Relational Algebra
involves combining relational operators using algebraic notation. Relational
Calculus is a high-level, declarative language. Creating a query in Relational
Calculus involves describing what results are desired.
SQL is a version of Relational Calculus. The basic structure in SQL is the
statement. Semicolons separate multiple SQL statements.
8.1. DDL Statements
DDL stands for Data Definition Language. DDL statements are SQL statements
that define or alter a data structure, such as a table.
DDL statements are used to define the database structure or schema. Some
examples:
• CREATE - to create objects in the database