Pondicherry University Community College
Department of Computer Science
Course : B.Voc [Software Development]
Year : II
Semester : III
Subject : Relational Database Management System
Unit V Study Material
Prepared by
D.GAYA
Assistant Professor,
Department of Computer Science,
Pondicherry University Community College,
Lawspet, Puducherry-08.
Unit V
Locking Techniques – Time stamp ordering – Validation techniques – Granularity of data
items – Recovery Concepts – Log based Recovery – Database Security issues – Access
Control – Statistical Database Security.
Locking Techniques
Introduction
A lock is a variable associated with a data item that signifies which operations can be performed on that item. Locks help synchronize access to database items by concurrent transactions.
All lock requests are made to the concurrency-control manager. Transactions proceed
only once the lock request is granted.
Binary Locks:
A binary lock on a data item can be in one of two states: locked or unlocked.
Shared/exclusive:
This locking mechanism distinguishes locks by how they are used. A lock acquired on a data item in order to read it is called a shared lock, and a lock acquired in order to write it is called an exclusive lock.
1. Shared Lock (S):
A shared lock is also called a read-only lock. With a shared lock, the data item can be shared between transactions, because no holder of a shared lock is permitted to update the data item.
For example, consider a case where two transactions are reading the account balance of a person. The database lets them both read by placing a shared lock. However, if another transaction wants to update that account's balance, the shared lock prevents the update until the reading process is over.
2. Exclusive Lock (X):
With an exclusive lock, a data item can be both read and written. The lock is exclusive: it cannot be held concurrently with any other lock on the same data item. An X-lock is requested using the lock-X instruction, and a transaction may unlock the data item after finishing the write operation.
For example, when a transaction needs to update the account balance of a person, it can be allowed to do so by placing an X-lock on the item. When a second transaction then wants to read or write the item, the exclusive lock prevents that operation.
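The compatibility rule can be stated compactly: two lock requests on the same item are compatible only if both are shared. A minimal sketch in Python (illustrative only; the LockMode name and the checks below are our own, not taken from any particular DBMS):

from enum import Enum

class LockMode(Enum):
    S = "shared"
    X = "exclusive"

def compatible(held: LockMode, requested: LockMode) -> bool:
    # Two locks on the same item are compatible only if both are shared.
    return held == LockMode.S and requested == LockMode.S

# Readers can share; a writer excludes everyone else.
assert compatible(LockMode.S, LockMode.S)
assert not compatible(LockMode.S, LockMode.X)
assert not compatible(LockMode.X, LockMode.S)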
3. Simplistic Lock Protocol
This type of lock-based protocol lets a transaction obtain a lock on every object before its operations begin. Transactions may unlock a data item after finishing the 'write' operation.
4. Pre-claiming Locking
The pre-claiming lock protocol evaluates a transaction's operations and creates a list of the data items required before initiating execution. The transaction executes only when all its locks are granted, and releases all of them once its operations are over.
Starvation
Starvation is the situation in which a transaction must wait for an indefinite period to acquire a lock. The following are the reasons for starvation:
• The waiting scheme for locked items is not properly managed
• A resource leak occurs
• The same transaction is selected as a victim repeatedly
Deadlock
Deadlock refers to the situation in which two or more processes are each waiting for the other to release a resource, or in which several processes are waiting for resources in a circular chain. A wait-for graph makes this concrete, as the sketch below shows.
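Deadlock detection is commonly done by building a wait-for graph, with an edge Ti → Tj whenever Ti waits for a lock held by Tj, and checking the graph for a cycle. A hedged sketch in Python, with the graph as a plain adjacency dictionary (all names are our own):

def has_cycle(wait_for):
    # Depth-first search for a cycle in a wait-for graph.
    # wait_for maps each transaction to the transactions it waits for.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in wait_for}

    def visit(t):
        colour[t] = GREY
        for u in wait_for.get(t, []):
            if colour.get(u, WHITE) == GREY:   # back edge: cycle found
                return True
            if colour.get(u, WHITE) == WHITE and visit(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and visit(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1: a circular chain, hence deadlock.
print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))   # True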
Locking protocols are used in database management systems as a means of
concurrency control. Multiple transactions may request a lock on a data item simultaneously.
Hence, we require a mechanism to manage the locking requests made by transactions. Such a mechanism is called the Lock Manager.
It relies on the process of message passing where transactions and lock manager
exchange messages to handle the locking and unlocking of data items.
Data structure used in Lock Manager
The data structure required to implement locking is called the lock table.
• It is a hash table in which the names of data items are used as the hash index.
• Each locked data item has a linked list associated with it.
• Every node in the linked list represents a transaction that requested the lock, the mode of lock requested (shared/exclusive) and the current status of the request (granted/waiting).
• Every new lock request for the data item is added at the end of the linked list as a new node.
• Collisions in the hash table are handled by the technique of separate chaining.
Consider the following example of a lock table:
[Figure: a lock table in which data items 5, 47, 167 and 15 each head a linked list of lock requests]
In the above figure, the locked data items present in the lock table are 5, 47, 167 and 15.
The transactions that have requested a lock are represented by the linked list shown below each item with a downward arrow. Each node in a linked list carries the name of the transaction that requested the data item, such as T33, T1 or T27.
The colour of a node represents the status of the request, i.e. whether the lock has been granted or is waiting.
Note that a collision has occurred for data items 5 and 47. It is resolved by separate chaining, in which each data item belongs to a linked list and acts as the header of the linked list containing its lock requests.
Working of Lock Manager
Initially the lock table is empty, as no data item is locked.
• Whenever the lock manager receives a lock request from a transaction Ti on a particular data item Qi, the following cases may arise:
• If Qi is not already locked, a linked list will be created and the lock will be granted to the requesting transaction Ti.
• If the data item is already locked, a new node will be added at the end of its linked list containing the information about the request made by Ti.
• If the lock mode requested by Ti is compatible with the lock mode of the transactions currently holding the lock, Ti will acquire the lock too and the status will be changed to 'granted'. Else, the status of Ti's lock will be 'waiting'.
• If a transaction Ti wants to unlock the data item it is currently holding, it will send an unlock request to the lock manager. The lock manager will delete Ti's node from the item's linked list, and the lock will be granted to the next transaction in the list.
Sometimes transaction Ti may have to be aborted. In such a case all the waiting requests made by Ti will be deleted from the linked lists in the lock table. Once the abort is complete, the locks held by Ti will also be released.
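A minimal sketch of such a lock table in Python. A dictionary already provides the hashing and separate chaining, so the sketch concentrates on the per-item request list; all names are our own:

from collections import defaultdict

# Each data item maps to a list of requests:
# [transaction id, mode ('S' or 'X'), status ('granted' or 'waiting')]
lock_table = defaultdict(list)

def request_lock(txn, item, mode):
    queue = lock_table[item]
    granted = [r for r in queue if r[2] == "granted"]
    # Grant if nothing is held, or if all holders and the request are shared.
    if not granted or (mode == "S" and all(r[1] == "S" for r in granted)):
        queue.append([txn, mode, "granted"])
    else:
        queue.append([txn, mode, "waiting"])   # new node at the end of the list

def unlock(txn, item):
    queue = lock_table[item]
    queue[:] = [r for r in queue if r[0] != txn]   # delete txn's node
    # Grant any waiting requests that are now compatible.
    for r in queue:
        granted = [g for g in queue if g[2] == "granted"]
        if r[2] == "waiting" and (not granted or
                (r[1] == "S" and all(g[1] == "S" for g in granted))):
            r[2] = "granted"

request_lock("T1", "X5", "S")
request_lock("T27", "X5", "X")   # waits behind T1's shared lock
unlock("T1", "X5")               # T27 now acquires the exclusive lock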
Timestamp based Concurrency Control
Concurrency control can be implemented in different ways. One way is by using locks; another is the timestamp ordering protocol, discussed here.
As introduced earlier, a timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamps are usually assigned in the order in which transactions are submitted to the system. We refer to the timestamp of a transaction T as TS(T).
Timestamp Ordering
The main idea of this protocol is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the only equivalent serial schedule permitted has the transactions in the order of their timestamp values. Stated simply, the schedule is equivalent to the particular serial order corresponding to the order of the transaction timestamps.
The algorithm must ensure that, for each item accessed by conflicting operations in the schedule, the order in which the item is accessed does not violate the timestamp ordering. To ensure this, two timestamp values are maintained for each database item X.
• W_TS(X) is the largest timestamp of any transaction that executed write(X) successfully.
• R_TS(X) is the largest timestamp of any transaction that executed read(X) successfully.
Basic Timestamp Ordering
Every transaction is issued a timestamp based on when it enters the system. If an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned a timestamp TS(Tj) such that TS(Ti) < TS(Tj). The protocol manages concurrent execution such that the timestamps determine the serializability order.
The timestamp ordering protocol ensures that any conflicting read and write
operations are executed in timestamp order. Whenever some Transaction T tries to issue a
R_item(X) or a W_item(X), the Basic TO algorithm compares the timestamp
of T with R_TS(X) & W_TS(X) to ensure that the Timestamp order is not violated.
The Basic TO protocol is described by the following two cases.
Whenever a Transaction T issues a W_item(X) operation, check the following conditions:
• If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and reject the operation; else,
• Execute the W_item(X) operation of T and set W_TS(X) to TS(T).
Whenever a Transaction T issues a R_item(X) operation, check the following conditions:
• If W_TS(X) > TS(T), then abort and roll back T and reject the operation; else,
• If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X) to the larger
of TS(T) and current R_TS(X).
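These two rules (the write check and the read check) translate almost line for line into code. A hedged sketch, assuming timestamps are plain integers and each item carries its R_TS and W_TS in a small table (all names are our own):

class Aborted(Exception):
    pass

# Per-item timestamps: item -> [R_TS, W_TS]
ts_table = {"X": [0, 0]}

def write_item(ts_t, item):
    r_ts, w_ts = ts_table[item]
    if r_ts > ts_t or w_ts > ts_t:        # a younger txn already read/wrote X
        raise Aborted("roll back T and reject the write")
    ts_table[item][1] = ts_t              # W_TS(X) := TS(T)

def read_item(ts_t, item):
    r_ts, w_ts = ts_table[item]
    if w_ts > ts_t:                       # a younger txn already wrote X
        raise Aborted("roll back T and reject the read")
    ts_table[item][0] = max(r_ts, ts_t)   # R_TS(X) := max(R_TS(X), TS(T))

read_item(10, "X")                        # ok: R_TS(X) becomes 10
try:
    write_item(5, "X")                    # T with TS 5 is too old: rejected
except Aborted as reason:
    print(reason)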
Whenever the Basic TO algorithm detects two conflicting operations that occur in the incorrect order, it rejects the later of the two by aborting the transaction that issued it. Schedules produced by Basic TO are guaranteed to be conflict serializable. As discussed earlier, using timestamps also ensures that the schedule is deadlock free.
One drawback of the Basic TO protocol is that cascading rollback is still possible. Suppose transaction T2 has used a value written by transaction T1. If T1 is aborted and resubmitted to the system, then T2 must also be aborted and rolled back. So the problem of cascading aborts still prevails.
To summarize the advantages and disadvantages of the Basic TO protocol:
The timestamp ordering protocol ensures serializability, since the precedence graph will be of the form:
[Figure: precedence graph for timestamp ordering, with edges from transactions with smaller timestamps to transactions with larger timestamps]
Timestamp protocol ensures freedom from deadlock as no transaction ever waits.
But the schedule may not be cascade free, and may not even be recoverable.
Strict Timestamp Ordering
A variation of Basic TO called Strict TO ensures that the schedules are both strict and conflict serializable. In this variation, a transaction T that issues a R_item(X) or W_item(X) such that TS(T) > W_TS(X) has its read or write operation delayed until the transaction T′ that wrote the value of X has committed or aborted.
Validation Techniques
In optimistic concurrency control techniques, also known as validation or certification
techniques, no checking is done while the transaction is executing. In this scheme, updates in
the transaction are not applied directly to the database items until the transaction reaches its
end.
During transaction execution, all updates are applied to local copies of the data items that are kept for the transaction. At the end of transaction execution, a validation phase checks whether any of the transaction's updates violate serializability.
Certain information needed by the validation phase must be kept by the system. If
serializability is not violated, the transaction is committed and the database is updated from
the local copies; otherwise, the transaction is aborted and then restarted later.
There are three phases for this concurrency control protocol:
1. Read phase: A transaction can read values of committed data items from
the database. However, updates are applied only to local copies (versions) of
the data items kept in the transaction workspace.
2. Validation phase: Checking is performed to ensure that serializability will
not be violated if the transaction updates are applied to the database.
3. Write phase: If the validation phase is successful, the transaction updates
are applied to the database; otherwise, the updates are discarded and the
transaction is restarted.
The idea behind optimistic concurrency control is to do all the checks at once; hence,
transaction execution proceeds with a minimum of overhead until the validation phase is
reached. If there is little interference among transactions, most will be validated successfully.
However, if there is much interference, many transactions that execute to completion
will have their results discarded and must be restarted later. Under these circumstances,
optimistic techniques do not work well.
The techniques are called "optimistic" because they assume that little interference will
occur and hence that there is no need to do checking during transaction execution.
The optimistic protocol we describe uses transaction timestamps and also requires that
the write_sets and read_sets of the transactions be kept by the system.
In addition, start and end times for some of the three phases need to be kept for each
transaction. Recall that the write_set of a transaction is the set of items it writes, and
the read_set is the set of items it reads. In the validation phase for transaction Ti, the protocol
checks that Ti does not interfere with any committed transactions or with any other
transactions currently in their validation phase.
The validation phase for Ti checks that, for each such transaction Tj that is either committed or is in its validation phase, one of the following conditions holds:
1. Transaction Tj completes its write phase before Ti starts its read phase.
2. Ti starts its write phase after Tj completes its write phase, and the read_set
of Ti has no items in common with the write_set of Tj.
3. Both the read_set and write_set of Ti have no items in common with the
write_set of Tj, and Tj completes its read phase before Ti completes its read
phase.
When validating transaction Ti, the first condition is checked first for each transaction
Tj, since (1) is the simplest condition to check. Only if condition (1) is false is condition (2)
checked, and only if (2) is false is condition (3)—the most complex to evaluate—checked. If
any one of these three conditions holds, there is no interference and Ti is validated
successfully.
If none of these three conditions holds, the validation of transaction Ti fails and it is
aborted and restarted later because interference may have occurred.
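Under these definitions the test is mechanical. A hedged sketch in Python, assuming each transaction records its read_set, write_set and phase boundary times, and that the write phase is taken to begin when the read phase ends (validation time is ignored in this sketch; all names are our own):

class Txn:
    # end_read doubles as the start of the write phase in this sketch.
    def __init__(self, read_set, write_set, start_read, end_read, end_write=None):
        self.read_set, self.write_set = set(read_set), set(write_set)
        self.start_read, self.end_read, self.end_write = start_read, end_read, end_write

def validate(ti, others):
    # True if Ti passes validation against every committed or
    # currently-validating transaction Tj.
    for tj in others:
        finished = tj.end_write is not None
        if finished and tj.end_write < ti.start_read:
            continue                                   # condition 1
        if finished and tj.end_write < ti.end_read and \
                not (ti.read_set & tj.write_set):
            continue                                   # condition 2
        if not ((ti.read_set | ti.write_set) & tj.write_set) and \
                tj.end_read < ti.end_read:
            continue                                   # condition 3
        return False        # interference may have occurred: abort Ti
    return True

t_old = Txn({"A"}, {"A"}, start_read=0, end_read=2, end_write=3)
t_new = Txn({"B"}, {"B"}, start_read=5, end_read=8)
print(validate(t_new, [t_old]))   # True, by condition 1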
Granularity of Data Items
All concurrency control techniques assume that the database is formed of a number of
named data items. A database item could be chosen to be one of the following:
1. A database record
2. A field value of a database record
3. A disk block
4. A whole file
5. The whole database
The granularity can affect the performance of concurrency control and recovery.
Granularity Level Considerations for Locking
The size of data items is often called the data item granularity. Fine granularity refers
to small item sizes, whereas coarse granularity refers to large item sizes. Several tradeoffs
must be considered in choosing the data item size. We will discuss data item size in the
context of locking, although similar arguments can be made for other concurrency control
techniques.
First, notice that the larger the data item size is, the lower the degree of concurrency
permitted. For example, if the data item size is a disk block, a transaction T that needs to lock
a record B must lock the whole disk block X that contains B because a lock is associated with
the whole data item (block). Now, if another transaction S wants to lock a different
record C that happens to reside in the same block X in a conflicting lock mode, it is forced to
wait. If the data item size was a single record, transaction S would be able to proceed,
because it would be locking a different data item (record).
On the other hand, the smaller the data item size is, the larger the number of items in the database. Because every item is associated with a lock, the system will have a larger number of active locks to be handled by the lock manager. More lock and unlock operations will be performed, causing higher overhead. In addition, more storage space will be required for the lock table. For timestamps, storage is required for the read_TS and write_TS of each data item, and there will be similar overhead for handling a large number of items.
Given the above tradeoffs, an obvious question can be asked: What is the best item size? The answer is that it depends on the types of transactions involved. If a typical
transaction accesses a small number of records, it is advantageous to have the data item
granularity be one record. On the other hand, if a transaction typically accesses many records
in the same file, it may be better to have block or file granularity so that the transaction will
consider all those records as one (or a few) data items.
Database security issues
This section considers the issues that arise in determining the security specification and implementation of a database system.
Access to key fields
Suppose you have a user role with access rights to table A and to table C but not to
table B. The problem is that the foreign key in C includes columns from B. The following
questions arise:
[Figure 12.4: tables A, B and C, where the foreign key in C includes columns from B]
Do you have access to the foreign key in C?
If you do, you know at least that a tuple exists in B and you know some information
about B that is restricted from you.
Can you update the foreign key columns?
If so, it must cascade, generating an update to B for which no privileges have been
given.
These problems do not directly arise where the database is implemented by internal
pointers - as a user, you need have no knowledge of the relationships between the data you
are accessing. They arise because relationships are data values. Often, knowing the foreign
key will not be sensitive in itself. If it is, then the definition of a view may solve the problem.
Access to surrogate information
It is not difficult to conceive of cases where the view of the data provided to a user
role extends to the external world. An example should make the problem clear.
In a retail environment, there are frequent problems with pilferage. To deal with these,
private detectives work undercover. They are to all intents and purposes employees of the business and are assigned to normal business activities like other members of staff. They get pay checks or slips at the same time as everyone else, and they appear in management information (such as the salary analysis) in the same manner.
They have a job title and participate in the system as someone they are not. The store
manager is unaware of the situation, as is everybody else except the corporate security
manager.
You may want to handle these situations on separate databases. As a solution it may
be appropriate, but the larger the problem the more scope there is for confusion. One
suggested solution is the polyinstantiation of tuples - one individual is represented by more
than one tuple. The data retrieved will depend on the security classification of the user.
Tuples will have the same apparent primary key but different actual primary keys, and all
applications will need to be carefully integrated with the security system.
Problems with data extraction
Where data access is visualised directly, the problem can be seen clearly enough: it is
to ensure that authenticated users can access only data items which they are authorised to use
for the purpose required. When the focus shifts from the data to the implications that can be
drawn from that data, more problems arise.
Again, an example should make things clear.
You want to know the pay of the chief executive. You have access rights to the table,
except for the MONTHLY-PAY field in this tuple. So you issue an SQL query SUM
(MONTHLY-PAY) across the whole table. You then create a view SELECT MONTHLY-
PAY ... and issue a SUM on this view. Should you get the same answer in both cases?
If not, you can achieve your objective by subtracting the two sums. If you listed the
monthly pay for all, what would you expect to see - all the tuples except the one restricted?
Would you expect to be notified by asterisks that data was missing which you were not
allowed to see?
Access control in SQL
This section is about the implementation of security within SQL. The basics are given
in SQL-92 but, as you will realise, much security is DBMS- and hardware-specific. Where
necessary, any specifics are given in the SQL of Oracle. For some ideas on Object database
management systems (ODBMS) as distinct from Relational, refer to the later chapter on
Object databases.
Your first objective is to learn the specifics. The access requirements specification
will be implemented using these statements. Your second objective is to extend your
understanding of the problem through to the management and audit functions of an operating
system.
The basic statements come first, and the management functions are discussed second.
In the first part you will learn the SQL needed to manage a user; in the second you will learn
a little of the SQL to manage a system.
Discretionary security in SQL
This section introduces the SQL statements needed to implement access control. You
should aim at having sufficient knowledge of this area of SQL to translate a simple
specification into an SQL script. You should also be conscious of the limitations implicit in
this script which hardwires passwords into text.
The basics of SQL are inherently discretionary. Privileges to use a database resource
are assigned and removed individually.
The first issue is who is allowed to do what with the security subsystem. You need to
have a high level of privilege to be able to apply security measures. Unfortunately, such roles
are not within the SQL standard and vary from DBMS to DBMS. A role is defined as a
collection of privileges.
As an example, the supplied roles in Oracle include (among others):
SYSOPER: Start and stop the DBMS.
DBA: Authority to create users and to manage the database and existing users.
SYSDBA: All the DBA’s authority plus the authority to create, start, stop and recover.
The role of the DBA has been covered in other chapters. The point here is that you realise
there are a large number of predefined roles with different privileges and they need to be
controlled. It is important to be certain that the SQL defaults do not act in ways you do not
anticipate.
Schema level
The first security-related task is to create the schema. In the example below, the
authorisation is established with the schema. The authorisation is optional and will default to
the current user if it is not specified.
Only the owner of the schema is allowed to manipulate it. Below is an example where
a user is given the right to create tables. The creator of the table retains privileges for the
tables so created. Similarly, synonyms are only valid for the creator of that synonym.
CREATE SCHEMA student_database AUTHORIZATION U1;
The U1 refers to the authorisation identifier of the user concerned, who has to have
the right to create database objects of this type – in this case, the schema for a new database.
Provided the authorisation is correct, then the right to access the database using the
schema can be granted to others. So to allow the creation of a table:
GRANT CREATETAB TO U1;
The topic of schema modifications will not be taken up here.
Authentication
Using the client/server model (see chapter 15), it is necessary first to connect to the
database management system, effectively establishing both authentication and the complex
layers of communication between the local (client DBMS) and the server.
GRANT CONNECT TO student_database AS U1,U2,U3 IDENTIFIED BY P1,P2,P3;
U1,U2,U3 are user names, P1,P2,P3 are passwords and student_database is the database
name.
GRANT CONNECT TO student_database AS U4/P4;
Connect rights give no permission for any table within the database. U4/P4 are the identifiers known to this database's security services.
Note
• Users, roles and privilege levels can be confusing. The following are the key distinctions:
• A user is a real person (with a real password and user account).
• A role, or a user-role, is a named collection of privileges that can be easily assigned to a
given or new user. A privilege is a permission to perform some act on a database object.
• A privilege level refers to the extent of those privileges, usually in connection with a
database-defined role such as database administrator.
Table level
The authority level establishes some basic rights. The SYSDBA account has full
rights and can change everything. Rights to access tables have to be GRANTed separately by
the DBA or SYSADM.
The following example assigns a read privilege to a named table (note only a read
privilege). The privilege extends to creating a read-only view on the table:
GRANT SELECT ON TABLE1 TO U1;
And that which may be given can be removed. REVOKE is used generally to remove
any specific privilege.
REVOKE SELECT ON TABLE1 FROM U1;
The main part of this aspect of security, though, is providing access to the data. In a
Relational database we have only one data structure to consider, so if we can control access
to one table we can control access to all. And as tables are two dimensional, if we can control
access to rows and columns, we can deal with any request for data – including schema data.
We still have to know what is allowed and what is not but, given the details, the
implementation is not in itself a problem.
Remember that a VIEW is created by an SQL SELECT, and that a view is only a
virtual table. Although not part of the base tables, it is processed and appears to be
maintained by the DBMS as if it were.
To provide privileges at the level of the row, the column or by values, it is necessary
to grant rights to a view.
CREATE VIEW VIEW1
AS SELECT A1, A2, A3
FROM TABLE1
WHERE A1 < 20000;
and the privilege is now assigned:
GRANT SELECT ON VIEW1 TO U1
WITH GRANT OPTION;
The optional WITH GRANT OPTION allows the user to assign privileges to other users. This might seem like a security weakness and is a loss of DBA control. On the other hand, the need for temporary privileges can be very frequent, and it may be better that a user assign temporary privileges to cover for an office absence than divulge a confidential password and user-id with a much higher level of privilege.
The rights to change data are granted separately:
GRANT INSERT ON TABLE1 TO U2, U3;
GRANT DELETE ON TABLE1 TO U2, U3;
GRANT UPDATE ON TABLE1(salary) TO U5;
GRANT INSERT, DELETE ON TABLE1 TO U2, U3;
Notice, in the UPDATE grant, that the attributes that can be modified are specified by column name. The final form is a means of combining privileges in one expression.
To provide general access:
GRANT ALL TO PUBLIC;
SQL system tables
The DBMS will maintain tables to record all security information. An SQL database
is created and managed by the use of system tables. These comprise a relational database
using the same structure and access mechanism as the main database.
Recovery Concepts
DATABASE RECOVERY
Like any computer system, a database system can fail, yet the data stored in the database should be available whenever it is needed. Database recovery means restoring the data when it has been accidentally deleted, damaged or compromised. Atomicity must also be preserved: whether or not a transaction completed, its effects should either be reflected in the database permanently or not affect the database at all.
Classification of failure:
To determine where a problem has occurred, failures are generalized into the following classes:
• Transaction failure
• System crash
• Disk failure
Transaction failure:
A transaction must abort when it fails to execute or when it reaches a point from which it cannot proceed any further. This is called transaction failure, and it affects only a few transactions or processes. The reasons for transaction failure are:
Logical errors:
Where a transaction cannot complete because of a code error or an internal error condition.
System errors:
Where the database system itself terminates an active transaction because the DBMS is unable to execute it, or has to stop it because of some system condition. For example, in the case of deadlock or resource unavailability, the system aborts an active transaction.
System crash:
There are problems, external to the system, that may cause the system to stop abruptly and crash. For instance, an interruption in the power supply may cause the failure of the underlying hardware or software. Examples include operating system errors.
Disk failure:
In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives failed frequently. Disk failures include the
formation of bad sectors, unreachability of the disk, a disk head crash, or any other failure that destroys all or part of the disk storage.
Storage structure:
Storage structures are classified as explained below:
Volatile storage:
As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; usually they are embedded on the chipset itself. For instance, main memory and cache memory are examples of volatile storage. They are fast but can store only a small quantity of data.
Non-volatile storage:
These memories are built to survive system crashes. They are huge in storage capacity but slower to access. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery-backed) RAM.
Recovery and Atomicity:
When a system crashes, it may have many transactions being executed and numerous files opened for them to modify data items. Transactions are made up of various operations that are atomic in nature. But according to the ACID properties of a database, the atomicity of transactions as a whole must be maintained; that is, either all the operations are executed or none.
When a database management system recovers from a crash, it should do the following:
• It should check the states of all the transactions that were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
• It should check whether the transaction can be completed now or must be rolled back.
• No transaction should be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques that can help a DBMS recover while maintaining the atomicity of a transaction:
• Maintaining a log of each transaction, and writing the log onto stable storage before actually modifying the database.
• Maintaining shadow paging, where the changes are made in volatile memory, and the actual database is updated later.
Log-based recovery (or manual recovery):
A log is a sequence of records that keeps track of the actions performed by transactions. It is essential that the log is written before the actual modification and is stored on stable storage, which is failure resistant. Log-based recovery works as follows:
o The log file is kept on a stable storage medium.
o When a transaction enters the system and starts execution, it writes a log record about itself.
Recovery with concurrent transactions (Automated Recovery):
When more than one transaction is executed in parallel, the logs are interleaved. At recovery time, it would be hard for the recovery system to trace back through all the logs and then begin recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'. Automated recovery is of three types:
• Deferred Update Recovery
• Immediate Update Recovery
• Shadow Paging
Log based Recovery
The atomicity property of a DBMS states that either all the operations of a transaction are performed or none. The modifications made by an aborted transaction should not be visible in the database, while the modifications made by a committed transaction should be visible.
To achieve atomicity, the system must first output to stable storage information describing the modifications, without modifying the database itself.
This information can help us ensure that all modifications performed by committed
transactions are reflected in the database. This information can also help us ensure that no
modifications made by an aborted transaction persist in the database.
Log and log records
The log is a sequence of log records, recording all the update activities in the database.
In stable storage, logs for each transaction are maintained. Any operation performed on the database is recorded in the log. Prior to performing any modification to the database, an update log record is created to reflect that modification.
An update log record, represented as <Ti, Xj, V1, V2>, has these fields:
• Transaction identifier: Unique Identifier of the transaction that performed the write
operation.
• Data item: Unique identifier of the data item written.
• Old value: Value of data item prior to write.
• New value: Value of data item after write operation.
Other types of log records are:
<Ti start>: It contains information about when a transaction Ti starts.
<Ti commit>: It contains information about when a transaction Ti commits.
<Ti abort>: It contains information about when a transaction Ti aborts.
Undo and Redo Operations
Because every database modification must be preceded by the creation of a log record, the system has available both the old value of the data item prior to the modification and the new value to be written.
This allows the system to perform undo and redo operations as appropriate:
• Undo: using a log record, sets the data item specified in the log record to its old value.
• Redo: using a log record, sets the data item specified in the log record to its new value.
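A hedged sketch of the two operations, with the database as a simple dictionary and each update log record modelled as the tuple <Ti, Xj, V1, V2> described above:

# Each update log record is a tuple (txn_id, item, old_value, new_value).
db = {}

def undo(record):
    txn, item, old, new = record
    db[item] = old        # restore the value that held before the write

def redo(record):
    txn, item, old, new = record
    db[item] = new        # reapply the value written by the transaction

rec = ("T1", "A", 100, 150)   # log record: T1 wrote A, changing 100 to 150
redo(rec); print(db["A"])     # 150
undo(rec); print(db["A"])     # 100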
The database can be modified using two approaches:
Deferred Modification Technique:
If the transaction does not modify the database until it has partially committed, it is said to use the deferred modification technique.
Immediate Modification Technique:
If database modifications occur while the transaction is still active, it is said to use the immediate modification technique.
Recovery using Log records
After a system crash has occurred, the system consults the log to determine which
transactions need to be redone and which need to be undone.
• Transaction Ti needs to be undone if the log contains the record <Ti start> but does not
contain either the record <Ti commit> or the record <Ti abort>.
• Transaction Ti needs to be redone if the log contains the record <Ti start> and either the record <Ti commit> or the record <Ti abort>.
Use of Checkpoints
When a system crash occurs, the system must consult the log. In principle, it would need to search the entire log to determine this information. There are two major difficulties with this approach:
• The search process is time-consuming.
• Most of the transactions that, according to our algorithm, need to be redone have already written their updates into the database. Although redoing them will cause no harm, it will cause recovery to take longer.
To reduce this overhead, checkpoints are introduced. A log record of the form <checkpoint L> is used to represent a checkpoint in the log, where L is a list of transactions active at the time of the checkpoint. When a checkpoint log record is added to the log, all transactions that committed before this checkpoint have a <Ti commit> log record before the checkpoint record. Any database modification made by such a Ti was written to the database either prior to the checkpoint or as part of the checkpoint itself. Thus, at recovery time, there is no need to perform a redo operation on Ti.
After a system crash has occurred, the system examines the log to find the last
<checkpoint L> record. The redo or undo operations need to be applied only to the transactions in L and to all transactions that started execution after the record was written to the log. Let us denote this set of transactions as T. The same undo and redo rules apply to T as given above under Recovery using Log records.
Note that the system needs to examine only the part of the log starting with the last checkpoint log record to find the set of transactions T, and to find out whether a commit or abort record occurs in the log for each transaction in T.
For example, consider the set of transactions {T0, T1, ..., T100}. Suppose that the most recent checkpoint took place during the execution of transactions T67 and T69, while T68 and all transactions with subscripts lower than 67 completed before the checkpoint. Thus, only transactions T67, T69, ..., T100 need to be considered during the recovery scheme. Each of them needs to be redone if it has completed (that is, either committed or aborted); otherwise, it was incomplete and needs to be undone.
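A hedged sketch of this scan, assuming the log is a list of records, each of the form ('start', Ti), ('commit', Ti), ('abort', Ti) or ('checkpoint', L):

def recovery_sets(log):
    # Scan back to the last checkpoint, then classify the transactions
    # that must be redone or undone, per the rules given above.
    cp = max((i for i, r in enumerate(log) if r[0] == "checkpoint"),
             default=-1)
    active = set(log[cp][1]) if cp >= 0 else set()
    finished = set()
    for rec in log[cp + 1:]:
        if rec[0] == "start":
            active.add(rec[1])
        elif rec[0] in ("commit", "abort"):
            finished.add(rec[1])
    redo = finished             # completed (committed or aborted): redo
    undo = active - finished    # incomplete: undo
    return redo, undo

log = [("start", "T67"), ("start", "T69"),
       ("checkpoint", ["T67", "T69"]),
       ("start", "T70"), ("commit", "T67")]
print(recovery_sets(log))       # ({'T67'}, {'T69', 'T70'})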
Database Security issues
A database security manager is the most important asset to maintaining and securing
sensitive data within an organization. Database security managers are required to multitask
and juggle a variety of headaches that accompany the maintenance of a secure database.
If you own a business it is important to understand some of the database
security problems that occur within an organization and how to avoid them. If you understand
the how, where, and why of database security you can prevent future problems from
occurring.
Database Security Issues
Daily Maintenance:
Database audit logs require daily review to make certain that there has been no data
misuse. This requires overseeing database privileges and then consistently updating user
access accounts. A database security manager also provides different types of access control
for different users and assesses new programs that are performing with the database. If these
tasks are performed on a daily basis, you can avoid a lot of problems with users that may
pose a threat to the security of the database.
Varied Security Methods for Applications:
More often than not, application developers will vary the methods of security for the different applications being used with the database. This creates difficulty in establishing policies for accessing the applications. The database must also possess the proper access controls for regulating the varying methods of security; otherwise sensitive data is at risk.
Post-Upgrade Evaluation:
When a database is upgraded it is necessary for the administrator to perform a post-
upgrade evaluation to ensure that security is consistent across all programs. Failure to
perform this operation opens up the database to attack.
Split the Position:
Sometimes organizations fail to split the duties between the IT administrator and the
database security manager. Instead the company tries to cut costs by having the IT
administrator do everything. This action can significantly compromise the security of the data
due to the responsibilities involved with both positions. The IT administrator should manage
the database while the security manager performs all of the daily security processes.
Application Spoofing:
Hackers are capable of creating applications that resemble the existing applications
connected to the database. These unauthorized applications are often difficult to identify and
allow hackers access to the database via the application in disguise.
Manage User Passwords:
Sometimes IT database security managers forget to remove the IDs and access privileges of former users, which leads to password vulnerabilities in the database. Password rules and maintenance need to be strictly enforced to avoid opening up the database to unauthorized users.
Windows OS Flaws:
Windows operating systems are not always effective when it comes to database security. Theft of passwords is prevalent, as are denial-of-service issues. The database security manager can take precautions through routine daily maintenance checks.
Access control
The purpose of access control must always be clear. Access control is expensive in
terms of analysis, design and operational costs. It is applied to known situations, to known
standards, to achieve known purposes.
Do not apply controls without all the above knowledge. Control always has to be
appropriate to the situation. The main issues are introduced below.
Authentication and authorisation
We are all familiar as users with the log-in requirement of most systems. Access to IT
resources generally requires a log-in process that is trusted to be secure. This topic is about
access to database management systems, and is an overview of the process from the DBA
perspective. Most of what follows is directly about Relational client-server systems. Other
system models differ to a greater or lesser extent, though the underlying principles remain
true.
For a simple schematic, see Authorisation and Authentication Schematic. Among the
main principles for database systems are authentication and authorisation.
Authentication
The client has to establish the identity of the server and the server has to establish the
identity of the client. This is done often by means of shared secrets (either a password/user-id
combination, or shared biographic and/or biometric data).
It can also be achieved by a system of higher authority which has previously
established authentication. In client-server systems where data (not necessarily the database)
is distributed, the authentication may be acceptable from a peer system. Note that
authentication may be transmissible from system to system.
The result, as far as the DBMS is concerned, is an authorisation-identifier.
Authentication does not give any privileges for particular tasks. It only establishes that the
DBMS trusts that the user is who he/she claimed to be and that the user trusts that the DBMS
is also the intended system. Authentication is a prerequisite for authorisation.
Authorisation
Authorisation relates to the permissions granted to an authorised user to carry out
particular transactions, and hence to change the state of the database (write-item transactions)
and/or receive data from the database (read-item transactions). The result of authorisation,
which needs to be on a transactional basis, is a vector: Authorisation (item, auth-id,
operation). A vector is a sequence of data values at a known location in the system.
How this is put into effect is down to the DBMS functionality. At a logical level, the
system structure needs an authorisation server, which needs to co-operate with an auditing
server. There is an issue of server-to-server security and a problem with amplification as the
authorisation is transmitted from system to system. Amplification here means that the
security issues become larger as a larger number of DBMS servers are involved in the
transaction.
Audit requirements are frequently implemented poorly. To be safe, you need to log all
accesses and log all authorisation details with transaction identifiers. There is a need to audit
regularly and maintain an audit trail, often for a long period.
Access philosophies and management
Discretionary control is where specific privileges are assigned on the basis of specific
assets, which authorised users are allowed to use in a particular way. The security DBMS has
to construct an access matrix including objects like relations, records, views and operations
for each user - each entry separating create, read, insert and update privileges.
This matrix becomes very intricate, as authorisations will vary from object to object. The matrix can also become very large, hence its implementation frequently requires the kinds of physical techniques associated with sparse matrices. It may not be possible to store the matrix in the computer's main memory. At its simplest, the matrix can be viewed as a two-dimensional table of users against objects, as sketched below.
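A sparse access matrix is naturally held as a mapping keyed by (user, object), storing only the non-empty cells. A hedged sketch, with illustrative users and privileges of our own:

# Key: (user, object); value: the set of privileges held.
acl = {
    ("U1", "TABLE1"): {"read"},
    ("U2", "TABLE1"): {"read", "insert", "update"},
}

def allowed(user, obj, privilege):
    return privilege in acl.get((user, obj), set())

print(allowed("U1", "TABLE1", "read"))     # True
print(allowed("U1", "TABLE1", "update"))   # False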
When you read a little more on this subject, you will find several other rights that also need to be recorded, notably the owners' rights and the grant right.
Mandatory control is authorisation by level or role. A typical mandatory scheme is the four-level government classification of open, secret, most secret and top secret. The related concept is to apply security controls not to individuals but to roles - so the pay clerk has privileges because of the job role and not because of personal factors.
The database implication is that each data item is assigned a classification for read,
create, update and delete (or a subset of these), with a similar classification attached to each
authorised user. An algorithm will allow access to objects on the basis of less than or equal to
the assigned level of clearance - so a user with clearance level 3 to read items will also have
access to items of level 0, 1 and 2. In principle, this is a much simpler scheme. The Bell-LaPadula model defines a mandatory scheme which is widely quoted:
• A subject (whether user, account or program) is forbidden to read an object (relation, tuple
or view) unless the security classification of the subject is greater or equal to that of the
object.
• A subject is forbidden to write an object unless the security classification of the subject is
less than or equal to that of the object.
Note that a high level of clearance to read implies a low level of clearance to write -
otherwise information flows from high to low levels. This is, in highly secure systems, not
permitted.
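Stated as code, with clearance and classification as integers where higher means more secret, the two rules are the familiar 'no read up, no write down'. A hedged sketch:

def may_read(subject_level, object_level):
    # Read is forbidden unless the subject's classification
    # is greater than or equal to the object's.
    return subject_level >= object_level

def may_write(subject_level, object_level):
    # Write is forbidden unless the subject's classification is less
    # than or equal to the object's, so information never flows downward.
    return subject_level <= object_level

# A level-3 subject can read levels 0..3 but may write only at level 3 or above.
print(may_read(3, 2), may_write(3, 2))   # True False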
Mandatory security schemes are relatively easy to understand and, therefore,
relatively easy to manage and audit. Discretionary security is difficult to control and therefore
mistakes and oversights are easy to make and difficult to detect. You can translate this
difficulty into costs.
o There are perhaps two other key principles in security. One is disclosure, which is
often only on a need-to-know basis. This fits in better with discretionary security
than mandatory, as it implies neither any prior existing level nor the need for
specific assignment.
o The other principle is to divide responsibilities. The DBA responsible for security
management is him/herself a security risk. Management approaches that involve
one person or a group of people that have connections in their work represent a
similar risk. This emphasises the importance of security auditing and the importance
of related procedure design.
Statistical Database Security
Statistical databases are used mainly to produce statistics about various populations.
The database may contain confidential data about individuals, which should be protected
from user access. However, users are permitted to retrieve statistical information about the
populations, such as averages, sums, counts, maximums, minimums, and standard deviations.
The techniques that have been developed to protect the privacy of individual
information are beyond the scope of this book. We will illustrate the problem with a very
simple example, which refers to the relation shown in Figure 24.3. This is a PERSON relation
with the attributes Name, Ssn, Income, Address, City, State, Zip, Sex, and Last_degree.
A population is a set of tuples of a relation (table) that satisfy some selection
condition. Hence, each selection condition on the PERSON relation will specify a particular
population of PERSON tuples. For example, the condition Sex = ‘M’ specifies the male
population; the condition ((Sex = ‘F’) AND (Last_degree = ‘M.S.’ OR Last_degree =
‘Ph.D.’)) specifies the female population that has an M.S. or Ph.D. degree as their highest
degree; and the condition City = ‘Houston’ specifies the population that lives in Houston.
Statistical queries involve applying statistical functions to a population of tuples. For
example, we may want to retrieve the number of individuals in a population or the average
income in the population.
However, statistical users are not allowed to retrieve individual data, such as the
income of a specific person. Statistical database security techniques must prohibit the retrieval
of individual data. This can be achieved by prohibiting queries that retrieve attribute values
and by allowing only queries that involve statistical aggregate functions such
as COUNT, SUM, MIN, MAX, AVERAGE, and STANDARD DEVIATION. Such queries
are sometimes called statistical queries.
It is the responsibility of the database management system to ensure the confidentiality of information about individuals, while still providing useful statistical summaries of data about those individuals to users.
In some cases it is possible to infer the values of individual tuples from a sequence of
statistical queries. This is particularly true when the conditions result in a population
consisting of a small number of tuples. As an illustration, consider the following statistical
queries:
Q1: SELECT COUNT (*) FROM PERSON WHERE <condition>;
Q2: SELECT AVG (Income) FROM PERSON WHERE <condition>;
Now suppose that we are interested in finding the Income of Jane Smith, and we know that she has a Ph.D. degree and that she lives in the city of Bellaire, Texas. We issue the statistical query Q1 with the following condition:
(Last_degree=‘Ph.D.’ AND Sex=‘F’ AND City=‘Bellaire’ AND State=‘Texas’)
If we get a result of 1 for this query, we can issue Q2 with the same condition and find the Income of Jane Smith. Even if the result of Q1 on the preceding condition is not 1 but is a small number (say 2 or 3), we can issue statistical queries using the functions MAX, MIN, and AVERAGE to identify the possible range of values for the Income of Jane Smith.
The possibility of inferring individual information from statistical queries is reduced
if no statistical queries are permitted whenever the number of tuples in the population
specified by the selection condition falls below some threshold.
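A hedged sketch of such a threshold check, applied before any aggregate result is released (the threshold value and field names are illustrative):

MIN_POPULATION = 5   # illustrative threshold

def safe_average(rows, condition, field):
    # Answer an AVG-style statistical query only if the selected
    # population is large enough to protect individual values.
    population = [r for r in rows if condition(r)]
    if len(population) < MIN_POPULATION:
        raise PermissionError("population below threshold; query refused")
    return sum(r[field] for r in population) / len(population)

people = [{"city": "Bellaire", "income": 90000},
          {"city": "Houston", "income": 60000}]
# Only one tuple matches, so the query that would expose an individual's
# income is refused:
try:
    safe_average(people, lambda r: r["city"] == "Bellaire", "income")
except PermissionError as reason:
    print(reason)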
Another technique for prohibiting retrieval of individual information is to prohibit
sequences of queries that refer repeatedly to the same population of tuples. It is also possible
to introduce slight inaccuracies or noise into the results of statistical queries deliberately, to
make it difficult to deduce individual information from the results.
Another technique is partitioning of the database. Partitioning implies that records are
stored in groups of some minimum size; queries can refer to any complete group or set of
groups, but never to subsets of records within a group.
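A hedged sketch of the partitioning idea, with records held in fixed groups and aggregates permitted over whole groups only (group names and sizes are illustrative):

# Records are stored in groups of some minimum size; a query may
# aggregate whole groups, never a subset of records within a group.
groups = {"G1": [{"income": 90000}, {"income": 60000}, {"income": 75000}],
          "G2": [{"income": 50000}, {"income": 55000}, {"income": 65000}]}

def group_sum(group_ids, field):
    selected = [r for g in group_ids for r in groups[g]]   # whole groups only
    return sum(r[field] for r in selected)

print(group_sum(["G1", "G2"], "income"))   # 395000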
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...Call Girls in Nagpur High Profile
 

Recently uploaded (20)

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
 

Unit 5 rdbms study_material

Lock Manager

Locking protocols are used in database management systems as a means of concurrency control. Multiple transactions may request a lock on the same data item simultaneously, so a mechanism is required to manage the locking requests made by transactions. This mechanism is called the lock manager. It relies on message passing: transactions and the lock manager exchange messages to handle the locking and unlocking of data items.

Data structure used in the Lock Manager
The data structure required for the implementation of locking is called a lock table.
• It is a hash table in which the names of data items are used as the hashing index.
• Each locked data item has a linked list associated with it.
• Every node in the linked list represents a transaction that requested a lock, the mode of lock requested (shared/exclusive) and the current status of the request (granted/waiting).
• Every new lock request for the data item is added at the end of the linked list as a new node.
• Collisions in the hash table are handled by the technique of separate chaining.
Consider the following example of a lock table:
As an example, suppose the locked data items present in the lock table are 5, 47, 167 and 15. The transactions that have requested a lock on an item are represented by a linked list below that item. Each node in a linked list carries the name of the requesting transaction, such as T33, T1 or T27, and the colour of the node represents the status of the request, i.e. whether the lock has been granted or is waiting. Note that a collision has occurred for data items 5 and 47; it is resolved by separate chaining, where each data item acts as the header of the linked list containing the lock requests for it.

Working of the Lock Manager
Initially the lock table is empty, as no data item is locked. Whenever the lock manager receives a lock request from a transaction Ti on a particular data item Qi, the following cases may arise:
• If Qi is not already locked, a linked list is created and the lock is granted to the requesting transaction Ti.
• If the data item is already locked, a new node is added at the end of its linked list containing the information about the request made by Ti.
• If the lock mode requested by Ti is compatible with the lock mode of the transactions currently holding the lock, Ti acquires the lock too and its status is changed to 'granted'. Otherwise, the status of Ti's request is 'waiting'.
• If a transaction Ti wants to unlock the data item it is currently holding, it sends an unlock request to the lock manager. The lock manager deletes Ti's node from the linked list and grants the lock to the next transaction in the list.
Sometimes transaction Ti may have to be aborted. In that case all waiting requests made by Ti are deleted from the linked lists in the lock table. Once the abort is complete, the locks held by Ti are also released.
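The lock table and the granting rules above can be sketched in code. The following is a minimal Python illustration, not a real DBMS implementation: a dictionary plays the role of the hash table, each data item maps to a list of request nodes, and the names LockManager, request and release are invented for this sketch.

from collections import defaultdict

class LockManager:
    def __init__(self):
        # hash table: data item -> list of [transaction, mode, status]
        # nodes, standing in for the per-item linked list
        self.lock_table = defaultdict(list)

    def request(self, txn, item, mode):
        # mode is 'S' (shared) or 'X' (exclusive)
        queue = self.lock_table[item]
        granted = [n for n in queue if n[2] == 'granted']
        # a new request is compatible only if it is shared and every
        # lock granted so far is also shared
        compatible = mode == 'S' and all(n[1] == 'S' for n in granted)
        status = 'granted' if (not granted or compatible) else 'waiting'
        queue.append([txn, mode, status])   # new node at end of the list
        return status

    def release(self, txn, item):
        # delete txn's node, then promote the next waiter if nothing
        # remains granted (simplified: promotes only the head node)
        self.lock_table[item] = [n for n in self.lock_table[item]
                                 if n[0] != txn]
        queue = self.lock_table[item]
        if queue and not any(n[2] == 'granted' for n in queue):
            queue[0][2] = 'granted'

lm = LockManager()
print(lm.request('T1', 47, 'S'))   # granted
print(lm.request('T2', 47, 'S'))   # granted: shared locks are compatible
print(lm.request('T3', 47, 'X'))   # waiting: exclusive mode conflicts
lm.release('T1', 47)               # T3 still waits, since T2 holds an S lock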
Timestamp based Concurrency Control

Concurrency control can be implemented in different ways; one way is by using locks. Here we discuss the Timestamp Ordering protocol. As introduced earlier, a timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamps are usually assigned in the order in which transactions are submitted to the system. The timestamp of a transaction T is written TS(T).

Timestamp Ordering
The main idea of this protocol is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the only equivalent serial schedule permitted has the transactions in the order of their timestamp values. Stated simply, the schedule is equivalent to the particular serial order corresponding to the order of the transaction timestamps. The algorithm must ensure that, for each item accessed by conflicting operations in the schedule, the order in which the item is accessed does not violate the ordering. To ensure this, two timestamp values are kept for each database item X:
• W_TS(X): the largest timestamp of any transaction that executed write(X) successfully.
• R_TS(X): the largest timestamp of any transaction that executed read(X) successfully.

Basic Timestamp Ordering
Every transaction is issued a timestamp when it enters the system. If an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj). The protocol manages concurrent execution such that the timestamps determine the serializability order: any conflicting read and write operations are executed in timestamp order. Whenever some transaction T tries to issue a R_item(X) or a W_item(X), the Basic TO algorithm compares the timestamp of T with R_TS(X) and W_TS(X) to ensure that the timestamp order is not violated. The Basic TO protocol is described in the following two cases.

Whenever a transaction T issues a W_item(X) operation, check the following conditions:
• If R_TS(X) > TS(T) or if W_TS(X) > TS(T), then abort and roll back T and reject the operation;
• else, execute the W_item(X) operation of T and set W_TS(X) to TS(T).

Whenever a transaction T issues a R_item(X) operation, check the following conditions:
• If W_TS(X) > TS(T), then abort and roll back T and reject the operation;
• else, if W_TS(X) <= TS(T), execute the R_item(X) operation of T and set R_TS(X) to the larger of TS(T) and the current R_TS(X).

Whenever the Basic TO algorithm detects two conflicting operations that occur in the incorrect order, it rejects the later of the two operations by aborting the transaction that issued it. Schedules produced by Basic TO are guaranteed to be conflict serializable. As already discussed, using timestamps ensures that the schedule is deadlock free. One drawback of the Basic TO protocol is that cascading rollback is still possible. Suppose transaction T2 has used a value written by transaction T1. If T1 is aborted and resubmitted to the system, then T2 must also be aborted and rolled back. So the problem of cascading aborts still remains.

To summarise the advantages and disadvantages of the Basic TO protocol: the timestamp ordering protocol ensures serializability, since the precedence graph contains edges only from older transactions to younger transactions (precedence graph for TS ordering). The protocol also ensures freedom from deadlock, as no transaction ever waits. However, the schedule may not be cascade free, and may not even be recoverable.

Strict Timestamp Ordering
A variation of Basic TO, called Strict TO, ensures that schedules are both strict and conflict serializable. In this variation, a transaction T that issues a R_item(X) or W_item(X) such that TS(T) > W_TS(X) has its read or write operation delayed until the transaction T' that wrote the value of X has committed or aborted.
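As a minimal sketch of the Basic TO read and write checks above (illustrative only; the function names read_item and write_item are invented here), in Python:

# Basic Timestamp Ordering: each item X carries R_TS(X) and W_TS(X).
# ts is TS(T), the transaction's timestamp. Each function returns True
# if the operation is allowed, False if T must be aborted and rolled back.

r_ts = {}   # item -> largest TS of a transaction that read it
w_ts = {}   # item -> largest TS of a transaction that wrote it

def write_item(ts, x):
    if r_ts.get(x, 0) > ts or w_ts.get(x, 0) > ts:
        return False            # a younger transaction got there first
    w_ts[x] = ts
    return True

def read_item(ts, x):
    if w_ts.get(x, 0) > ts:
        return False            # X was overwritten by a younger transaction
    r_ts[x] = max(r_ts.get(x, 0), ts)
    return True

print(write_item(5, 'X'))   # True: W_TS(X) becomes 5
print(read_item(3, 'X'))    # False: transaction with TS 3 must abort
print(read_item(8, 'X'))    # True: R_TS(X) becomes 8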
Validation Techniques

In optimistic concurrency control techniques, also known as validation or certification techniques, no checking is done while the transaction is executing. Updates in the transaction are not applied directly to the database items until the transaction reaches its end. During transaction execution, all updates are applied to local copies of the data items that are kept for the transaction. At the end of transaction execution, a validation phase checks whether any of the transaction's updates violate serializability. Certain information needed by the validation phase must be kept by the system. If serializability is not violated, the transaction is committed and the database is updated from the local copies; otherwise, the transaction is aborted and then restarted later.

There are three phases for this concurrency control protocol:
1. Read phase: A transaction can read values of committed data items from the database. However, updates are applied only to local copies (versions) of the data items kept in the transaction workspace.
2. Validation phase: Checking is performed to ensure that serializability will not be violated if the transaction's updates are applied to the database.
3. Write phase: If the validation phase is successful, the transaction's updates are applied to the database; otherwise, the updates are discarded and the transaction is restarted.

The idea behind optimistic concurrency control is to do all the checks at once; hence, transaction execution proceeds with a minimum of overhead until the validation phase is reached. If there is little interference among transactions, most will be validated successfully. However, if there is much interference, many transactions that execute to completion will have their results discarded and must be restarted later. Under these circumstances, optimistic techniques do not work well. The techniques are called 'optimistic' because they assume that little interference will occur and hence that there is no need to do checking during transaction execution.

The optimistic protocol described here uses transaction timestamps and also requires that the write_sets and read_sets of the transactions be kept by the system. In addition, start and end times for some of the three phases need to be kept for each transaction. Recall that the write_set of a transaction is the set of items it writes, and the read_set is the set of items it reads. In the validation phase for transaction Ti, the protocol checks that Ti does not interfere with any committed transactions or with any other transactions currently in their validation phase. The validation phase for Ti checks that, for each such transaction Tj that is either committed or in its validation phase, one of the following conditions holds:
1. Transaction Tj completes its write phase before Ti starts its read phase.
2. Ti starts its write phase after Tj completes its write phase, and the read_set of Ti has no items in common with the write_set of Tj.
3. Both the read_set and write_set of Ti have no items in common with the write_set of Tj, and Tj completes its read phase before Ti completes its read phase.

When validating transaction Ti, the first condition is checked first for each transaction Tj, since (1) is the simplest condition to check. Only if condition (1) is false is condition (2) checked, and only if (2) is false is condition (3), the most complex to evaluate, checked. If any one of these three conditions holds, there is no interference and Ti is validated successfully. If none of the three conditions holds, the validation of transaction Ti fails, and it is aborted and restarted later because interference may have occurred. A sketch of this check in code follows.
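The following is a minimal Python sketch of the three validation conditions. The phase-timing fields and class names are invented for this illustration; in particular, write_start here stands for the time Ti's write phase would begin if validation succeeds.

class Txn:
    def __init__(self, read_set, write_set,
                 read_start, read_end, write_start, write_end):
        self.read_set = set(read_set)
        self.write_set = set(write_set)
        self.read_start = read_start
        self.read_end = read_end
        self.write_start = write_start
        self.write_end = write_end      # None if the write phase is unfinished

def validate(ti, others):
    # others: transactions that are committed or in their validation phase
    for tj in others:
        # Condition 1: Tj finished writing before Ti started reading.
        if tj.write_end is not None and tj.write_end < ti.read_start:
            continue
        # Condition 2: Ti writes after Tj's write phase, no read/write overlap.
        if (tj.write_end is not None and tj.write_end < ti.write_start
                and not (ti.read_set & tj.write_set)):
            continue
        # Condition 3: no overlap at all, and Tj finished reading first.
        if (not (ti.read_set & tj.write_set)
                and not (ti.write_set & tj.write_set)
                and tj.read_end < ti.read_end):
            continue
        return False    # interference may have occurred: abort Ti
    return True

t1 = Txn({'A'}, {'B'}, 0, 2, 2, 3)
t2 = Txn({'B'}, {'C'}, 1, 4, 5, None)
t3 = Txn({'C'}, {'D'}, 4, 6, 7, None)
print(validate(t2, [t1]))   # False: t2 read B while t1 was writing B
print(validate(t3, [t1]))   # True: t1 finished writing before t3 began reading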
Granularity of Data Items

All concurrency control techniques assume that the database is formed of a number of named data items. A database item could be chosen to be one of the following:
1. A database record
2. A field value of a database record
3. A disk block
4. A whole file
5. The whole database
The granularity can affect the performance of concurrency control and recovery.

Granularity Level Considerations for Locking
The size of data items is often called the data item granularity. Fine granularity refers to small item sizes, whereas coarse granularity refers to large item sizes. Several trade-offs must be considered in choosing the data item size. We discuss data item size in the context of locking, although similar arguments can be made for other concurrency control techniques.

First, notice that the larger the data item size is, the lower the degree of concurrency permitted. For example, if the data item size is a disk block, a transaction T that needs to lock a record B must lock the whole disk block X that contains B, because a lock is associated with the whole data item (block). Now, if another transaction S wants to lock a different record C that happens to reside in the same block X in a conflicting lock mode, it is forced to wait. If the data item size were a single record, transaction S would be able to proceed, because it would be locking a different data item (record). This situation is illustrated in the sketch below.

On the other hand, the smaller the data item size is, the greater the number of items in the database. Because every item is associated with a lock, the system will have a larger number of active locks to be handled by the lock manager. More lock and unlock operations will be performed, causing a higher overhead. In addition, more storage space will be required for the lock table. For timestamps, storage is required for the read_TS and write_TS of each data item, and there will be similar overhead for handling a large number of items.

Given these trade-offs, an obvious question is: what is the best item size? The answer is that it depends on the types of transactions involved. If a typical transaction accesses a small number of records, it is advantageous to have the data item granularity be one record. On the other hand, if a transaction typically accesses many records in the same file, it may be better to have block or file granularity so that the transaction will consider all those records as one (or a few) data items.
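A tiny Python sketch of the block versus record trade-off described above. The record-to-block mapping and all names here are invented assumptions for illustration only:

# Assume (hypothetically) that records 0-99 live in block 0,
# records 100-199 in block 1, and so on.
def lock_unit(record, granularity):
    return record // 100 if granularity == 'block' else record

held = {}   # lock unit -> transaction holding an exclusive lock

def x_lock(txn, record, granularity):
    unit = lock_unit(record, granularity)
    if unit in held and held[unit] != txn:
        return 'wait'          # conflicting lock on the same unit
    held[unit] = txn
    return 'granted'

print(x_lock('T', 5, 'block'))    # granted: T locks all of block 0
print(x_lock('S', 7, 'block'))    # wait: record 7 lives in the same block
held.clear()
print(x_lock('T', 5, 'record'))   # granted: T locks record 5 only
print(x_lock('S', 7, 'record'))   # granted: a different record, no conflict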
Database security issues

The following are some of the issues that arise in determining the security specification and implementation of a database system.

Access to key fields
Suppose you have a user role with access rights to table A and to table C but not to table B, and the foreign key in C includes columns from B (Figure 12.4). The following questions arise:
• Do you have access to the foreign key in C? If you do, you know at least that a tuple exists in B, and you know some information about B that is restricted from you.
• Can you update the foreign key columns? If so, the update must cascade, generating an update to B for which no privileges have been given.
These problems do not arise directly where the database is implemented by internal pointers: as a user, you need have no knowledge of the relationships between the data you are accessing. They arise because relationships are data values. Often, knowing the foreign key will not be sensitive in itself. If it is, then the definition of a view may solve the problem.

Access to surrogate information
It is not difficult to conceive of cases where the view of the data provided to a user role extends to the external world. An example should make the problem clear. In a retail environment, there are frequent problems with pilferage. To deal with these, private detectives work undercover. They are to all intents and purposes employees of the business and are assigned to normal business activities like other members of staff. They get pay checks or slips at the same time as everyone else, and they appear in management information (such as the salary analysis) in the same manner. They have a job title and participate in the system as someone they are not. The store manager is unaware of the situation, as is everybody else except the corporate security manager.

You may want to handle these situations on separate databases. As a solution it may be appropriate, but the larger the problem, the more scope there is for confusion. One suggested solution is the polyinstantiation of tuples: one individual is represented by more than one tuple. The data retrieved will depend on the security classification of the user. Tuples will have the same apparent primary key but different actual primary keys, and all applications will need to be carefully integrated with the security system.

Problems with data extraction
Where data access is visualised directly, the problem can be seen clearly enough: it is to ensure that authenticated users can access only data items which they are authorised to use for the purpose required. When the focus shifts from the data to the implications that can be drawn from that data, more problems arise. Again, an example should make things clear. You want to know the pay of the chief executive. You have access rights to the table, except for the MONTHLY-PAY field in this tuple. So you issue an SQL query SUM (MONTHLY-PAY) across the whole table. You then create a view SELECT MONTHLY-PAY ... and issue a SUM on this view. Should you get the same answer in both cases? If not, you can achieve your objective by subtracting the two sums. If you listed the monthly pay for all employees, what would you expect to see: all the tuples except the one restricted?
Would you expect to be notified by asterisks that data was missing which you were not allowed to see?

Access control in SQL
This section is about the implementation of security within SQL. The basics are given in SQL-92 but, as you will realise, much security is DBMS- and hardware-specific. Where necessary, specifics are given in the SQL of Oracle. For some ideas on object database management systems (ODBMS) as distinct from relational ones, refer to the later chapter on object databases.

Your first objective is to learn the specifics: the access requirements specification will be implemented using these statements. Your second objective is to extend your understanding of the problem through to the management and audit functions of an operating system. The basic statements come first, and the management functions are discussed second. In the first part you will learn the SQL needed to manage a user; in the second you will learn a little of the SQL to manage a system.

Discretionary security in SQL
This section introduces the SQL statements needed to implement access control. You should aim to have sufficient knowledge of this area of SQL to translate a simple specification into an SQL script. You should also be conscious of the limitations implicit in such a script, which hardwires passwords into text.

The basics of SQL are inherently discretionary: privileges to use a database resource are assigned and removed individually. The first issue is who is allowed to do what with the security subsystem. You need a high level of privilege to be able to apply security measures. Unfortunately, such roles are not within the SQL standard and vary from DBMS to DBMS. A role is defined as a collection of privileges. As an example, the supplied roles in Oracle include (among others):
• SYSOPER: start and stop the DBMS.
• DBA: authority to create users and to manage the database and existing users.
• SYSDBA: all the DBA's authority plus the authority to create, start, stop and recover.
The role of the DBA has been covered in other chapters. The point here is to realise that there are a large number of predefined roles with different privileges, and they need to be controlled. It is important to be certain that the SQL defaults do not act in ways you do not anticipate.

Schema level
The first security-related task is to create the schema. In the example below, the authorisation is established with the schema. The authorisation is optional and defaults to the current user if it is not specified. Only the owner of the schema is allowed to manipulate it. Below is an example where a user is given the right to create tables. The creator of a table retains privileges for the tables so created. Similarly, synonyms are only valid for the creator of that synonym.

CREATE SCHEMA student_database AUTHORISATION U1;

U1 refers to the authorisation identifier of the user concerned, who has to have the right to create database objects of this type, in this case the schema for a new database. Provided the authorisation is correct, the right to access the database using the schema can be granted to others. So, to allow the creation of a table:

GRANT CREATETAB TO U1;

The topic of schema modifications will not be taken up here.

Authentication
Using the client/server model (see chapter 15), it is necessary first to connect to the database management system, effectively establishing both authentication and the complex layers of communication between the local (client) DBMS and the server.

GRANT CONNECT TO student_database AS U1, U2, U3 IDENTIFIED BY P1, P2, P3;

U1, U2 and U3 are user names, P1, P2 and P3 are passwords, and student_database is the database name.

GRANT CONNECT TO student_database AS U4/P4;

Connect rights give no permission for any table within the database. U4/P4 are the identifiers known to this database's security services.

Note: users, roles and privilege levels can be confusing. The following are the key distinctions:
• A user is a real person (with a real password and user account).
• A role, or user-role, is a named collection of privileges that can be easily assigned to a given or new user.
• A privilege is a permission to perform some act on a database object.
• A privilege level refers to the extent of those privileges, usually in connection with a database-defined role such as database administrator.

Table level
The authority level establishes some basic rights. The SYSDBA account has full rights and can change everything. Rights to access tables have to be GRANTed separately by the DBA or SYSADM.
The following example assigns a read privilege to a named table (note: only a read privilege). The privilege extends to creating a read-only view on the table:

GRANT SELECT ON TABLE1 TO U1;

And that which may be given can be removed. REVOKE is used generally to remove any specific privilege:

REVOKE SELECT ON TABLE1 FROM U1;

The main part of this aspect of security, though, is providing access to the data. In a relational database we have only one data structure to consider, so if we can control access to one table we can control access to all. And as tables are two-dimensional, if we can control access to rows and columns, we can deal with any request for data, including schema data. We still have to know what is allowed and what is not but, given the details, the implementation is not in itself a problem.

Remember that a VIEW is created by an SQL SELECT, and that a view is only a virtual table. Although not part of the base tables, it is processed, and appears to be maintained, by the DBMS as if it were. To provide privileges at the level of the row, the column or by values, it is necessary to grant rights to a view:

CREATE VIEW VIEW1 AS
SELECT A1, A2, A3
FROM TABLE1
WHERE A1 < 20000;

and the privilege is now assigned:

GRANT SELECT ON VIEW1 TO U1 WITH GRANT OPTION;

The optional "with grant option" allows the user to assign privileges to other users. This might seem like a security weakness and is a loss of DBA control. On the other hand, the need for temporary privileges can be very frequent, and it may be better for a user to assign temporary privileges to cover for an office absence than to divulge a confidential password and user-id with a much higher level of privilege.

The rights to change data are granted separately:

GRANT INSERT ON TABLE1 TO U2, U3;
GRANT DELETE ON TABLE1 TO U2, U3;
GRANT UPDATE ON TABLE1(salary) TO U5;
GRANT INSERT, DELETE ON TABLE1 TO U2, U3;

Notice in the update that the attributes that can be modified are specified by column name. The final form is a means of combining privileges in one expression. To provide general access:

GRANT ALL TO PUBLIC;

SQL system tables
The DBMS maintains tables to record all security information. An SQL database is created and managed by the use of system tables. These comprise a relational database using the same structure and access mechanisms as the main database.

Recovery Concepts

Database Recovery
As in any computer system, failures can happen in a database system, yet the data stored in the database should be available whenever it is needed. Database recovery means recovering the data when it gets deleted, hacked or damaged accidentally. Atomicity must be preserved: whether or not a transaction completes, either its effects are reflected in the database permanently or they do not affect the database at all.

Classification of failure
To determine where the problem has occurred, we generalize failures into the following classes:
• Transaction failure
• System crash
• Disk failure
Transaction failure:
A transaction has to abort when it fails to execute, or when it reaches a point from which it cannot proceed any further. This is called transaction failure, where only a few transactions or processes are affected. The reasons for transaction failure are:
• Logical errors: where a transaction cannot complete because of a code error or an internal error condition.
• System errors: where the database system itself terminates an active transaction because the DBMS is not able to execute it, or has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.

System crash:
There are problems, external to the system, that may cause the system to stop abruptly and crash. For instance, interruptions in the power supply may cause the failure of underlying hardware or software. Examples may include operating system errors.

Disk failure:
In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives failed frequently.
Disk failures include the formation of bad sectors, unreachability of the disk, a disk crash or any other failure that destroys all or part of the disk storage.

Storage structure
The storage structure can be classified as follows:
• Volatile storage: As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; usually they are embedded on the chipset itself. For instance, main memory and cache memory are examples of volatile storage. They are fast but can store only a small amount of data.
• Non-volatile storage: This storage is designed to survive system crashes. It is huge in data storage capacity, but slower in accessibility. Examples may include hard disks, magnetic tapes, flash memory and non-volatile (battery-backed-up) RAM.

Recovery and Atomicity
When a system crashes, it may have had many transactions being executed and numerous files opened for them to modify data items. Transactions are made of numerous operations that are atomic in nature. But according to the ACID properties of a database, the atomicity of transactions as a whole must be maintained; that is, either all the operations are executed or none.

When a database management system recovers from a crash, it should maintain the following:
• It should check the states of all the transactions that were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
• It should check whether the transaction can be completed now or needs to be rolled back.
• No transaction should be allowed to leave the DBMS in an inconsistent state.

There are two types of techniques that can help a DBMS recover, as well as maintain the atomicity of a transaction:
• Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying the database.
• Maintaining shadow paging, where the changes are done on volatile memory, and later the actual database is updated.

Log-based recovery (manual recovery)
A log is a sequence of records which maintains a record of the actions performed by transactions. It is important that the log be written before the actual modification, and stored on a stable storage medium, which is failsafe. Log-based recovery works as follows:
• The log file is kept on a stable storage medium.
• When a transaction enters the system and starts execution, it writes a log record about it.

Recovery with concurrent transactions (automated recovery)
When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it would become hard for the recovery system to backtrack through all the logs and then start recovering. To ease this situation, most modern systems use the concept of 'checkpoints'. Automated recovery is of three types:
• Deferred update recovery
• Immediate update recovery
• Shadow paging

Log based Recovery

The atomicity property of a DBMS states that either all the operations of a transaction must be performed or none. The modifications done by an aborted transaction should not be visible in the database, and the modifications done by a committed transaction should be visible. To achieve atomicity, the system must first output, to stable storage, information describing the modifications, without modifying the database itself. This information can help ensure that all modifications performed by committed transactions are reflected in the database, and that no modifications made by an aborted transaction persist in the database.

Log and log records
The log is a sequence of log records, recording all the update activities in the database. Logs for each transaction are maintained in stable storage. Any operation performed on the database is recorded in the log. Prior to performing any modification to the database, an update log record is created to reflect that modification. An update log record, represented as <Ti, Xj, V1, V2>, has these fields:
• Transaction identifier: unique identifier of the transaction Ti that performed the write operation.
• Data item: unique identifier of the data item Xj written.
• Old value: value V1 of the data item prior to the write.
• New value: value V2 of the data item after the write operation.
Other types of log records are:
• <Ti start>: records that transaction Ti has started.
• <Ti commit>: records that transaction Ti has committed.
• <Ti abort>: records that transaction Ti has aborted.

Undo and Redo Operations
Because every database modification must be preceded by the creation of a log record, the system has available both the old value prior to the modification of the data item and the new value that is to be written. This allows the system to perform undo and redo operations as appropriate:
• Undo: using a log record, sets the data item specified in the log record to its old value.
• Redo: using a log record, sets the data item specified in the log record to its new value.

The database can be modified using two approaches:
• Deferred modification technique: if the transaction does not modify the database until it has partially committed, it is said to use the deferred modification technique.
• Immediate modification technique: if database modifications occur while the transaction is still active, it is said to use the immediate modification technique.

Recovery using log records
After a system crash has occurred, the system consults the log to determine which transactions need to be redone and which need to be undone:
• Transaction Ti needs to be undone if the log contains the record <Ti start> but does not contain either the record <Ti commit> or the record <Ti abort>.
• Transaction Ti needs to be redone if the log contains the record <Ti start> and either the record <Ti commit> or the record <Ti abort>.

Use of Checkpoints
When a system crash occurs, the log must be consulted. In principle, the entire log needs to be searched to determine this information. There are two major difficulties with this approach:
• The search process is time-consuming.
• Most of the transactions that, according to our algorithm, need to be redone have already written their updates into the database. Although redoing them will cause no harm, it will cause recovery to take longer.
To reduce this overhead, checkpoints are introduced. A log record of the form <checkpoint L> is used to represent a checkpoint in the log, where L is a list of transactions active at the time of the checkpoint. When a checkpoint log record is added to the log, all the transactions that committed before this checkpoint have their <Ti commit> log record before the checkpoint record. Any database modification made by such a Ti is written to the database either prior to the checkpoint or as part of the checkpoint itself. Thus, at recovery time, there is no need to perform a redo operation on Ti.

After a system crash has occurred, the system examines the log to find the last <checkpoint L> record. The redo or undo operations need to be applied only to the transactions in L, and to all transactions that started execution after the record was written to the log. Let us denote this set of transactions as T. The same undo and redo rules as above apply to T. Note that the system need only examine the part of the log starting with the last checkpoint record to find the set of transactions T, and to find out whether a commit or abort record occurs in the log for each transaction in T.

For example, consider the set of transactions {T0, T1, . . ., T100}. Suppose that the most recent checkpoint took place during the execution of transactions T67 and T69, while T68 and all transactions with subscripts lower than 67 completed before the checkpoint. Thus, only transactions T67, T69, . . ., T100 need to be considered during the recovery scheme. Each of them needs to be redone if it has completed (that is, either committed or aborted); otherwise, it was incomplete and needs to be undone.
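A minimal Python sketch of the recovery scan just described, covering both the undo/redo rules and the checkpoint-limited search. The log is modelled as a list of tuples whose formats are invented for this sketch:

# Log records modelled as tuples:
#   ('start', Ti), ('commit', Ti), ('abort', Ti),
#   ('update', Ti, X, old, new), ('checkpoint', [active transactions])
def recovery_sets(log):
    # find the last checkpoint and the transactions active at that time
    last_cp, active = 0, []
    for i, rec in enumerate(log):
        if rec[0] == 'checkpoint':
            last_cp, active = i, list(rec[1])
    candidates = set(active)
    redo, undo = set(), set()
    # only the tail of the log after the last checkpoint must be scanned
    for rec in log[last_cp:]:
        kind = rec[0]
        if kind == 'start':
            candidates.add(rec[1])
            undo.add(rec[1])
        elif kind in ('commit', 'abort') and rec[1] in candidates:
            undo.discard(rec[1])    # completed: redo instead of undo
            redo.add(rec[1])
    # transactions active at the checkpoint with no later
    # commit/abort record must also be undone
    undo |= {t for t in active if t not in redo}
    return redo, undo

log = [('start', 'T67'), ('start', 'T69'),
       ('checkpoint', ['T67', 'T69']),
       ('update', 'T67', 'X', 1, 2), ('commit', 'T67'),
       ('start', 'T70')]
print(recovery_sets(log))   # redo {'T67'}, undo {'T69', 'T70'} (set order varies)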
Database Security issues

A database security manager is the most important asset for maintaining and securing sensitive data within an organization. Database security managers are required to multitask and juggle a variety of headaches that accompany the maintenance of a secure database. If you own a business, it is important to understand some of the database security problems that occur within an organization, and how to avoid them. If you understand the how, where and why of database security, you can prevent future problems from occurring.

Daily Maintenance:
Database audit logs require daily review to make certain that there has been no data misuse. This requires overseeing database privileges and consistently updating user access accounts. A database security manager also provides different types of access control for different users and assesses new programs that work with the database. If these tasks are performed on a daily basis, you can avoid a lot of problems with users that may pose a threat to the security of the database.

Varied Security Methods for Applications:
More often than not, application developers will vary the methods of security for the different applications being utilized within the database. This can create difficulty in creating policies for accessing the applications. The database must also possess the proper access controls for regulating the varying methods of security, otherwise sensitive data is at risk.

Post-Upgrade Evaluation:
When a database is upgraded, it is necessary for the administrator to perform a post-upgrade evaluation to ensure that security is consistent across all programs. Failure to perform this operation opens up the database to attack.

Split the Position:
Sometimes organizations fail to split the duties between the IT administrator and the database security manager. Instead, the company tries to cut costs by having the IT administrator do everything. This can significantly compromise the security of the data, owing to the responsibilities involved in both positions. The IT administrator should manage the database while the security manager performs all of the daily security processes.

Application Spoofing:
Hackers are capable of creating applications that resemble the existing applications connected to the database. These unauthorized applications are often difficult to identify, and they allow hackers access to the database via the application in disguise.

Manage User Passwords:
Sometimes IT database security managers forget to remove the IDs and access privileges of former users, which leads to password vulnerabilities in the database. Password rules and maintenance need to be strictly enforced to avoid opening up the database to unauthorized users.

Windows OS Flaws:
Windows operating systems are not always effective when it comes to database security. Theft of passwords is prevalent, as are denial-of-service issues. The database security manager can take precautions through routine daily maintenance checks.

Access control
The purpose of access control must always be clear. Access control is expensive in terms of analysis, design and operational costs. It is applied to known situations, to known standards, to achieve known purposes. Do not apply controls without all of the above knowledge. Control always has to be appropriate to the situation. The main issues are introduced below.

Authentication and authorisation
We are all familiar as users with the log-in requirement of most systems. Access to IT resources generally requires a log-in process that is trusted to be secure. This topic is about access to database management systems, and is an overview of the process from the DBA perspective. Most of what follows is directly about relational client-server systems. Other system models differ to a greater or lesser extent, though the underlying principles remain true. Among the main principles for database systems are authentication and authorisation.

Authentication
The client has to establish the identity of the server, and the server has to establish the identity of the client. This is often done by means of shared secrets (either a password/user-id combination, or shared biographic and/or biometric data). It can also be achieved by a system of higher authority which has previously established authentication. In client-server systems where data (not necessarily the database) is distributed, the authentication may be acceptable from a peer system. Note that authentication may be transmissible from system to system. The result, as far as the DBMS is concerned, is an authorisation identifier. Authentication does not give any privileges for particular tasks; it only establishes that the DBMS trusts that the user is who he/she claims to be, and that the user trusts that the DBMS is also the intended system. Authentication is a prerequisite for authorisation.

Authorisation
Authorisation relates to the permissions granted to an authorised user to carry out particular transactions, and hence to change the state of the database (write-item transactions) and/or receive data from the database (read-item transactions). The result of authorisation, which needs to be on a transactional basis, is a vector: Authorisation (item, auth-id, operation), where a vector is a sequence of data values at a known location in the system. How this is put into effect is down to the DBMS functionality. At a logical level, the system structure needs an authorisation server, which needs to co-operate with an auditing server. There is an issue of server-to-server security, and a problem of amplification as the authorisation is transmitted from system to system. Amplification here means that the security issues become larger as a larger number of DBMS servers are involved in the transaction. Audit requirements are frequently implemented poorly. To be safe, you need to log all accesses and log all authorisation details with transaction identifiers. There is a need to audit regularly and to maintain an audit trail, often for a long period.

Access philosophies and management
Discretionary control is where specific privileges are assigned on the basis of specific assets, which authorised users are allowed to use in a particular way. The security DBMS has to construct an access matrix, including objects like relations, records, views and operations for each user, with each entry separating create, read, insert and update privileges. This matrix becomes very intricate, as authorisations vary from object to object. The matrix can also become very large, so its implementation frequently requires the kinds of physical techniques associated with sparse matrices; it may not be possible to store the matrix in the computer's main memory. At its simplest, the matrix can be viewed as a two-dimensional table of subjects against objects, as sketched below. When you read a little more on this subject, you will find several other rights that also need to be recorded, notably the owner's rights and the grant right.
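A minimal sketch of such an access matrix in Python, with invented subject and object names. As noted above, the matrix is stored sparsely: a missing entry means no access.

# Access matrix: (subject, object) -> set of privileges
matrix = {
    ('pay_clerk', 'EMPLOYEE'): {'read', 'update'},
    ('auditor',   'EMPLOYEE'): {'read'},
}

def allowed(subject, obj, privilege):
    return privilege in matrix.get((subject, obj), set())

print(allowed('auditor', 'EMPLOYEE', 'read'))    # True
print(allowed('auditor', 'EMPLOYEE', 'update'))  # False
print(allowed('intern', 'EMPLOYEE', 'read'))     # False: no entry, no access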
Mandatory control is authorisation by level or role. A typical mandatory scheme is the four-level government classification of open, secret, most secret and top secret. The related concept is to apply security controls not to individuals but to roles - so the pay clerk has privileges because of the job role and not because of personal factors. The database implication is that each data item is assigned a classification for read, create, update and delete (or a subset of these), with a similar classification attached to each authorised user. An algorithm will allow access to objects whose classification is less than or equal to the assigned level of clearance - so a user with clearance level 3 to read items will also have access to items of levels 0, 1 and 2. In principle, this is a much simpler scheme.
The Bell-LaPadula model defines a mandatory scheme which is widely quoted:
• A subject (whether user, account or program) is forbidden to read an object (relation, tuple or view) unless the security classification of the subject is greater than or equal to that of the object.
• A subject is forbidden to write an object unless the security classification of the subject is less than or equal to that of the object.
Note that a high level of clearance to read implies a low level of clearance to write - otherwise information would flow from high to low levels, which is not permitted in highly secure systems.
Mandatory security schemes are relatively easy to understand and, therefore, relatively easy to manage and audit. Discretionary security is difficult to control, and therefore mistakes and oversights are easy to make and difficult to detect. You can translate this difficulty into costs. There are perhaps two other key principles in security:
• One is disclosure, which is often only on a need-to-know basis. This fits in better with discretionary security than mandatory, as it implies neither any prior existing level nor the need for specific assignment.
• The other principle is to divide responsibilities. The DBA responsible for security management is him/herself a security risk. Management approaches that involve one person, or a group of people with close working connections, represent a similar risk. This emphasises the importance of security auditing and of related procedure design.
Statistical Database Security
Statistical databases are used mainly to produce statistics about various populations. The database may contain confidential data about individuals, which should be protected from user access. However, users are permitted to retrieve statistical information about the populations, such as averages, sums, counts, maximums, minimums, and standard deviations. The techniques that have been developed to protect the privacy of individual information are beyond the scope of this material. We will illustrate the problem with a very simple example, which refers to the relation shown in Figure 24.3: a PERSON relation with the attributes Name, Ssn, Income, Address, City, State, Zip, Sex, and Last_degree.
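For concreteness, the PERSON relation might be declared as follows. The data types and sizes are assumptions, since the source names only the attributes.

CREATE TABLE PERSON (
  Name        VARCHAR(50),
  Ssn         CHAR(9) PRIMARY KEY,
  Income      DECIMAL(10,2),
  Address     VARCHAR(100),
  City        VARCHAR(40),
  State       VARCHAR(20),
  Zip         CHAR(5),
  Sex         CHAR(1),
  Last_degree VARCHAR(10)
);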
A population is a set of tuples of a relation (table) that satisfy some selection condition. Hence, each selection condition on the PERSON relation will specify a particular population of PERSON tuples. For example, the condition Sex = 'M' specifies the male population; the condition ((Sex = 'F') AND (Last_degree = 'M.S.' OR Last_degree = 'Ph.D.')) specifies the female population that has an M.S. or Ph.D. degree as their highest degree; and the condition City = 'Houston' specifies the population that lives in Houston.
Statistical queries involve applying statistical functions to a population of tuples. For example, we may want to retrieve the number of individuals in a population or the average income in the population. However, statistical users are not allowed to retrieve individual data, such as the income of a specific person. Statistical database security techniques must prohibit the retrieval of individual data. This can be achieved by prohibiting queries that retrieve attribute values and by allowing only queries that involve statistical aggregate functions such as COUNT, SUM, MIN, MAX, AVERAGE, and STANDARD DEVIATION. Such queries are sometimes called statistical queries. It is the responsibility of a database management system to ensure the confidentiality of information about individuals, while still providing useful statistical summaries of data about those individuals to users.
In some cases it is possible to infer the values of individual tuples from a sequence of statistical queries. This is particularly true when the conditions result in a population consisting of a small number of tuples. As an illustration, consider the following statistical queries:
Q1: SELECT COUNT (*) FROM PERSON WHERE <condition>;
Q2: SELECT AVG (Income) FROM PERSON WHERE <condition>;
Now suppose that we are interested in finding the income of Jane Smith, and we know that she has a Ph.D. degree and that she lives in the city of Bellaire, Texas. We issue the statistical query Q1 with the following condition:
(Last_degree = 'Ph.D.' AND Sex = 'F' AND City = 'Bellaire' AND State = 'Texas')
If we get a result of 1 for this query, we can issue Q2 with the same condition and find the income of Jane Smith. Even if the result of Q1 on the preceding condition is not 1 but is a small number - say 2 or 3 - we can issue statistical queries using the functions MAX, MIN, and AVERAGE to identify the possible range of values for the income of Jane Smith. The possibility of inferring individual information from statistical queries is reduced if no statistical queries are permitted whenever the number of tuples in the population specified by the selection condition falls below some threshold.
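One way to approximate such a threshold inside a single query is sketched below. The minimum population size of 5 and the CASE formulation are illustrative assumptions, not a standard DBMS mechanism; in practice the check would be enforced by the DBMS itself rather than written into user queries.

SELECT CASE WHEN COUNT (*) >= 5
            THEN AVG (Income)
       END
FROM PERSON
WHERE Last_degree = 'Ph.D.' AND Sex = 'F'
  AND City = 'Bellaire' AND State = 'Texas';
-- Returns the average income only when at least 5 tuples match;
-- otherwise NULL, so a population of 1, 2 or 3 reveals nothing.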
Another technique for prohibiting retrieval of individual information is to prohibit sequences of queries that refer repeatedly to the same population of tuples. It is also possible to deliberately introduce slight inaccuracies or noise into the results of statistical queries, to make it difficult to deduce individual information from the results.
Another technique is partitioning of the database. Partitioning implies that records are stored in groups of some minimum size; queries can refer to any complete group or set of groups, but never to subsets of records within a group, as in the sketch below.
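A partitioned design could be queried as follows, assuming a hypothetical Group_id column that assigns each record to a stored group of at least the minimum size.

SELECT Group_id, COUNT (*), AVG (Income)
FROM PERSON
GROUP BY Group_id;
-- Each result row summarises one complete group, never a subset of one,
-- so no selection condition can isolate an individual tuple.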