normalization process in relational data base management

STORAGE OF DATABASE ON HARD DISKS
Databases are stored in files, which contain records. At the physical level, the
actual data is stored in electromagnetic format on some device.
Primary Storage −
The memory storage that is directly accessible to the CPU comes under this
category. CPU's internal memory (registers), fast memory (cache), and main
memory (RAM) are directly accessible to the CPU, as they are all placed on the
motherboard or CPU chipset.
Secondary Storage −
Secondary storage devices are used to store data for future use or as backup.
This category includes memory devices that are not a part of the CPU chipset or
motherboard, such as magnetic tapes and disks, optical disks (DVD, CD, etc.),
hard disks, and flash drives.
Tertiary Storage −
Tertiary storage is used to store huge volumes of data.
Since such storage devices are external to the computer system, they are the slowest in speed.
These storage devices are mostly used to take the backup of an entire system.
Disk archives and magnetic tapes are widely used as tertiary storage.

File Organization
A database is a collection of interrelated records.
A file is a collection of records. Using the primary key, we can access the
records.
File organization is a logical relationship among various records. It
describes the way in which the records are stored in
terms of blocks, and how the blocks are placed on the storage medium.
Storing the files in a certain order is called file organization.
The main objectives of file organization are:
o Optimal selection of records, i.e., records should be accessed as fast as possible.
o Any insert, update or delete transaction on records should be easy and quick, and
should not harm other records.
o No duplicate records should be introduced as a result of an insert, update or delete.
o Records should be stored efficiently.

File Operations
Operations on database files can be broadly classified into two categories −
Update Operations: change the data values by insertion, deletion, or update.
Retrieval Operations: do not alter the data but retrieve them after optional
conditional filtering.
In either category, the following basic operations can be performed on a file:
Open − A file can be opened in one of two modes, read mode or write
mode. Files opened in write mode can be read but cannot be shared.
Locate − Every file has a file pointer, which tells the current position where
the data is to be read or written.
Read − By default, when files are opened in read mode, the file pointer points
to the beginning of the file.
Write − The user can choose to open a file in write mode, which enables them to
edit its contents.
Close − When a request to close a file is generated, the operating system
removes all the locks (if in shared mode), saves the data (if altered) to the
secondary storage media, and releases all the buffers and file handles
associated with the file.
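
The operations above can be sketched with Python's built-in file API. This is only an illustration: the fixed record size, the file name, and the sample names are assumptions made for the example, not part of any particular DBMS.

```python
import os
import tempfile

# Hypothetical fixed-length record format, chosen only for this illustration.
RECORD_SIZE = 16

def make_record(text: str) -> bytes:
    """Pad or truncate text to exactly RECORD_SIZE bytes."""
    return text.encode("ascii")[:RECORD_SIZE].ljust(RECORD_SIZE, b" ")

path = os.path.join(tempfile.mkdtemp(), "records.dat")

# Open (write mode) and Write: store three records one after another.
with open(path, "wb") as f:
    for name in ("alice", "bob", "carol"):
        f.write(make_record(name))
# Close: leaving the with-block flushes the buffers and releases the file handle.

# Open (read mode): the file pointer starts at the beginning of the file.
with open(path, "rb") as f:
    start = f.tell()
    # Locate: move the file pointer to the third record (index 2).
    f.seek(2 * RECORD_SIZE)
    # Read: fetch exactly one record from the current position.
    third = f.read(RECORD_SIZE).decode("ascii").rstrip()
```

Because every record has the same length here, the position of record n is simply n * RECORD_SIZE, which is what makes the Locate step a single seek.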

FILE ORGANIZATION TYPES
File organization is used to describe the way in which the records are stored in terms of
blocks, and the blocks are placed on the storage medium (disk).
HEAP FILE ORGANIZATION (UNORDERED FILES)
It is the simplest and most basic type of organization. It works with data blocks. In heap
file organization, the records are inserted at the file's end. Inserting
records does not require any sorting or ordering.
Advantages
It is a very good method of file organization for bulk insertion.
In the case of a small database, fetching and retrieving records is faster than with
sequential file organization.
Disadvantages
It is inefficient for a large database because it takes time to search for or modify a record.
Deletion can leave unused space and create a need for reorganization.
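
A minimal sketch of a heap file, assuming an in-memory list stands in for the data blocks. The class name HeapFile, the "id" field, and the compact step are hypothetical names used only for this illustration.

```python
class HeapFile:
    """Unordered file: records are appended at the end; lookups scan linearly."""

    def __init__(self):
        self.records = []  # slot number -> record, or None where a record was deleted

    def insert(self, record):
        self.records.append(record)  # always at the file's end, no sorting
        return len(self.records) - 1

    def find(self, key):
        # Linear scan: this is why searching a large heap file is slow.
        for slot, rec in enumerate(self.records):
            if rec is not None and rec["id"] == key:
                return slot
        return None

    def delete(self, key):
        slot = self.find(key)
        if slot is not None:
            self.records[slot] = None  # leaves unused space behind
        return slot

    def compact(self):
        # Reorganization: rewrite the file without the holes.
        self.records = [r for r in self.records if r is not None]


heap = HeapFile()
for rid in (30, 10, 20):
    heap.insert({"id": rid})
heap.delete(10)
holes_before = heap.records.count(None)
heap.compact()
```

Insertion is a single append, which is what makes bulk loading cheap; the cost shows up later in find and in the reorganization step.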

SEQUENTIAL FILE ORGANIZATION
This method is the easiest method for file organization. In this method, records are stored
sequentially. It can be implemented in two ways:
1. Pile file method
• This is quite a simple method. We store the records in a sequence, i.e., one after another. Here,
records are placed in the order in which they are inserted into the tables.
• When a record is updated or deleted, it is first searched for in the memory
blocks. When it is found, it is marked for deletion, and the new record is
inserted.
2. Sorted file method
• The new record is first inserted at the file's end, and then the sequence is sorted in
ascending or descending order.
• Sorting of records is based on the primary key or any other key.
• When a record is modified, the record is updated and the file is sorted again,
so that the updated record ends up in the right place.
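
The two methods can be compared with a small sketch. It assumes records are (id, name) tuples held in a Python list; the function names and sample data are hypothetical.

```python
def pile_insert(file, record):
    """Pile file: the record simply goes after the last one."""
    file.append(record)

def sorted_insert(file, record, key):
    """Sorted file: append at the end, then re-sort the whole sequence."""
    file.append(record)
    file.sort(key=key)

def sorted_update(file, old_key, new_record, key):
    """Update the matching record, then re-sort so it lands in the right place."""
    for i, rec in enumerate(file):
        if key(rec) == old_key:
            file[i] = new_record
            break
    file.sort(key=key)

by_id = lambda rec: rec[0]

pile = []
sorted_file = []
for rec in [(3, "c"), (1, "a"), (2, "b")]:
    pile_insert(pile, rec)
    sorted_insert(sorted_file, rec, by_id)

# Changing the key of record 1 to 5 moves it to the end of the sorted file.
sorted_update(sorted_file, 1, (5, "a"), by_id)
```

The pile keeps arrival order, while the sorted file pays for a sort on every insert or key change. That extra work is the time and space cost usually attributed to the sorted file method.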

Advantages of sequential file organization
• It is a fast and efficient method for huge amounts of data.
• Files can easily be stored on cheaper storage media such as magnetic tapes.
• It is simple in design. Not much effort is required to store the data.
Disadvantages of sequential file organization
• It wastes time, because we cannot jump directly to a particular record that is
required; we have to move through the file sequentially.
• The sorted file method takes more time and space for sorting the records.

INDEXED SEQUENTIAL ACCESS METHOD (ISAM)
ISAM is an advanced sequential file organization. Records are stored in the file using the primary
key. An index value is generated for each primary key and mapped to the record.
This index contains the address of the record in the file.
Pros of ISAM:
• Since each index entry holds the address of the record's data block, searching for a record in a huge database is
quick and easy.
• It supports range retrieval and partial retrieval of records.
Cons of ISAM:
• It requires extra space on the disk to store the index values.
• When new records are inserted, the files have to be reconstructed to
maintain the sequence.
• When a record is deleted, the space used by it needs to be released. Otherwise,
the performance of the database will slow down.
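
A rough sketch of the idea, assuming the data file is a list kept in primary-key order and a list position stands in for a disk address. The sample records and function names are hypothetical, and real ISAM implementations organize the index in blocks rather than in one flat list.

```python
import bisect

# Data file: records stored in primary-key order.
data_file = [(101, "Asha"), (105, "Ben"), (109, "Chen"), (112, "Dina")]

# Index: one entry per primary key, mapping the key to the record's address.
index_keys = [rec[0] for rec in data_file]
index_addr = list(range(len(data_file)))

def lookup(key):
    """Binary-search the index, then fetch the record at the stored address."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return data_file[index_addr[i]]
    return None

def range_lookup(lo, hi):
    """Range retrieval: every record with lo <= key <= hi."""
    start = bisect.bisect_left(index_keys, lo)
    end = bisect.bisect_right(index_keys, hi)
    return [data_file[index_addr[i]] for i in range(start, end)]
```

Because the index is sorted, a range query needs only two binary searches to find its start and end. Inserting a new key in the middle, however, would shift both the data and the index, which corresponds to the reconstruction cost listed under the cons.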

HASH FILE ORGANIZATION (DIRECT, RANDOM, OR RELATIVE FILE ORGANIZATION)
A hash function is used to calculate the address of the block in which to store a record. The hash
function can be any simple or complex mathematical function.
The hash function is applied to some columns/attributes, either key or non-key
columns, to get the block address.
Each record is stored at the address computed for it, irrespective of the order in which records arrive.
If the hash function is applied to a key column, that column is called the hash key,
and if it is applied to a non-key column, that column is called the hash column.
When a record has to be retrieved, the address is generated from the hash key column,
and the whole record is retrieved directly from that address.
Similarly, when a new record has to be inserted, the address is generated from the hash key and the
record is inserted directly. The same is the case with update and delete. There is no need to
search the entire file or sort the file.
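
The insert, retrieve, and delete paths described above can be sketched as follows. The modulo hash, the block count of 8, and the record fields are assumptions chosen for the example; each block is modelled as a small list so that records sharing an address can coexist.

```python
NUM_BLOCKS = 8  # hypothetical number of data blocks

def block_address(acc_no: int) -> int:
    """A deliberately simple hash function: key modulo the number of blocks."""
    return acc_no % NUM_BLOCKS

blocks = [[] for _ in range(NUM_BLOCKS)]

def insert(record):
    # The address comes straight from the hash key; no sorting, no file scan.
    blocks[block_address(record["acc_no"])].append(record)

def retrieve(acc_no):
    # Only the one block named by the hash function is examined.
    for rec in blocks[block_address(acc_no)]:
        if rec["acc_no"] == acc_no:
            return rec
    return None

def delete(acc_no):
    block = blocks[block_address(acc_no)]
    for i, rec in enumerate(block):
        if rec["acc_no"] == acc_no:
            del block[i]
            return True
    return False

for acc_no, owner in [(1001, "Asha"), (1009, "Ben"), (2003, "Chen")]:
    insert({"acc_no": acc_no, "owner": owner})
```

Here 1001 and 1009 both map to block 1, so a lookup still has to compare keys inside that block, but it never touches the other seven.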

Advantages of Hash File Organization
Records need not be sorted after any transaction, so the effort of sorting is
avoided in this method.
Since the block address is given by the hash function, accessing any record is very
fast. Similarly, updating or deleting a record is also very quick.
This method can handle multiple transactions, as each record is independent of
the others, i.e., since the storage location of one record does not depend on another,
multiple records can be accessed at the same time.
It is suitable for online transaction systems such as online banking, ticket booking
systems, etc.
Disadvantages of Hash File Organization
This method may accidentally lose data. If two records produce the same address, the older record may
be overwritten by the newer one, causing data loss. Thus the hash columns need
to be selected with utmost care.
Since all the records are stored at scattered addresses, they are spread across the memory.
Hence memory is not used efficiently.
System design is complex and costly.
File updating is more difficult compared to sequential files.
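
The data-loss risk can be illustrated with a short sketch. It assumes a hash on a non-key name column and a toy hash function; the employee name and departments are invented for the example. The chained variant is one common way to avoid the overwrite and is shown only for contrast.

```python
NUM_SLOTS = 10  # hypothetical number of storage slots

def slot_for(name: str) -> int:
    """Toy hash: sum of character codes modulo the number of slots."""
    return sum(map(ord, name)) % NUM_SLOTS

# Naive direct addressing: one record per slot, no collision handling.
naive = [None] * NUM_SLOTS
naive[slot_for("Antony")] = {"name": "Antony", "dept": "Sales"}
naive[slot_for("Antony")] = {"name": "Antony", "dept": "HR"}  # overwrites the first
survivors = [r for r in naive if r is not None]

# Chaining: each slot holds a list, so both records are kept.
chained = [[] for _ in range(NUM_SLOTS)]
chained[slot_for("Antony")].append({"name": "Antony", "dept": "Sales"})
chained[slot_for("Antony")].append({"name": "Antony", "dept": "HR"})
```

Two different employees who share a name hash to the same slot, so in the naive layout only the most recent one remains.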

Types of Indexes
• Indexing is a way to optimize the performance of a database by minimizing the
number of disk accesses required when a query is processed.
• An index, or database index, is a data structure used to quickly locate and
access the data in a database table.
• Indexes are created using some database columns. An index has two columns:
o The first column is the search key, which contains a copy of the primary key
or candidate key of the table. These values are stored in sorted order so that the
corresponding data can be accessed quickly (note that the data itself may or may not be stored
in sorted order).
o The second column is the data reference, which contains a set of pointers holding the
address of the disk block where that particular key value can be found.
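
The two-column structure can be sketched as a sorted list of (search key, data reference) pairs. The dictionary of disk blocks and the sample rows are hypothetical and stand in for real disk storage.

```python
import bisect

# Data blocks on "disk": block address -> rows. The rows are not in key order.
disk_blocks = {
    0: [(42, "Mira"), (7, "Omar")],
    1: [(19, "Lena"), (3, "Ravi")],
}

# The index: (search key, data reference) pairs, sorted by search key.
index = sorted(
    (key, addr) for addr, rows in disk_blocks.items() for key, _ in rows
)
search_keys = [key for key, _ in index]

def find(key):
    """Binary-search the sorted search keys, then read only the referenced block."""
    i = bisect.bisect_left(search_keys, key)
    if i == len(search_keys) or search_keys[i] != key:
        return None
    _, addr = index[i]
    for row in disk_blocks[addr]:
        if row[0] == key:
            return row
    return None
```

The data rows stay in arrival order; only the index is sorted, which is enough to bring a lookup down to one block read.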

PRIMARY INDEXING
• A primary index is an ordered file of fixed-length entries with two fields.
• In this case, the data is sorted according to the search key, which implies a sequential file
organization.
• The first field is the same as the primary key, and the second field points to the specific
data block.
• In the primary index, there is always a one-to-one relationship between the entries in
the index table.
The primary indexing is further divided into two types:
• Dense Index
• Sparse Index

Dense Index
• In a dense index, there is an index record for every search key value in the database.
• This makes searching faster but requires more space to store the index records themselves.
• Index records contain the search key value and a pointer to the actual record on the disk.
Sparse Index
• In a sparse index, index records are not created for every search key.
• An index record here contains a search key and an actual pointer to the data on the
disk.
• To search for a record, we first follow the index record to reach the location of
the data it points to.
• If the data we are looking for is not at the location reached by following the index,
the system performs a sequential search from there until the desired data is found.
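
The difference between the two can be sketched on a small sorted data file. The stride of 3 (one sparse entry for every third record) and the sample keys are assumptions made for the example.

```python
import bisect

# Sorted data file; a list position stands in for a disk address.
records = [(k, f"row-{k}") for k in (10, 20, 30, 40, 50, 60)]

# Dense index: one (key, position) entry for every record.
dense = [(k, pos) for pos, (k, _) in enumerate(records)]

# Sparse index: an entry for every third record only.
STRIDE = 3
sparse = dense[::STRIDE]

def dense_lookup(key):
    keys = [k for k, _ in dense]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[dense[i][1]]
    return None

def sparse_lookup(key):
    keys = [k for k, _ in sparse]
    # Find the last index entry whose key is <= the target.
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None
    pos = sparse[i][1]
    # Sequential search from that position until the key is found or passed.
    while pos < len(records) and records[pos][0] <= key:
        if records[pos][0] == key:
            return records[pos]
        pos += 1
    return None
```

The sparse index holds 2 entries instead of 6, at the price of a short sequential scan after each index hit.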

SECONDARY INDEX
• It is used to optimize query processing and to access records in a database using
some information other than the usual search key (primary key).
• Here, two levels of indexing are used in order to reduce the mapping size of the first
level.
• Initially, for the first level, a large range of numbers is selected so that the mapping
size is small. Each range is then divided into further sub-ranges.
• For quick memory access, the first level is stored in primary memory. The actual
physical location of the data is determined by the second mapping level.
For example, in a bank account database, data is stored sequentially by acc_no; you may want to find
all accounts of a specific branch of ABC bank.
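
The bank example can be sketched as follows. This shows only the basic idea of indexing on a non-primary column (branch), not the two-level range mapping; the branch names and balances are invented sample data.

```python
from collections import defaultdict

# Data file, stored sequentially by acc_no (the primary key).
accounts = [
    {"acc_no": 1001, "branch": "North", "balance": 500},
    {"acc_no": 1002, "branch": "South", "balance": 120},
    {"acc_no": 1003, "branch": "North", "balance": 900},
    {"acc_no": 1004, "branch": "East", "balance": 310},
]

# Secondary index on branch: branch name -> positions of matching records.
branch_index = defaultdict(list)
for pos, acc in enumerate(accounts):
    branch_index[acc["branch"]].append(pos)

def accounts_in_branch(branch):
    """Follow the secondary index instead of scanning every account."""
    return [accounts[pos] for pos in branch_index.get(branch, [])]

north = [acc["acc_no"] for acc in accounts_in_branch("North")]
```

Since branch values repeat, each index entry points to a list of records rather than to a single one.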

Clustering Index
In some cases, the index is created on non-primary key columns, which may not be
unique for each record.
• In such cases, in order to identify the records faster, we group two or more
columns together to get unique values and create an index out of them. This method is
known as a clustering index.
• Basically, records with similar characteristics are grouped together, and indexes are
created for these groups.
For example, students studying in each semester are grouped together, i.e., 1st semester
students, 2nd semester students, 3rd semester students, etc. are grouped.
Multi-level Index
• A multi-level index breaks the index down into several
smaller indices in order to make the outermost level so small that it can
be saved in a single disk block, which can easily be accommodated anywhere
in the main memory.
• The multi-level index is stored on the disk along with the actual database files.
• As the size of the database grows, so does the size of the indices.
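
A two-level version of this idea can be sketched as follows. The block size of 10 entries and the key values are assumptions made for the example; a real system would size the blocks to match the disk block size.

```python
import bisect

# 100 sorted search keys; a key's list position stands in for its record address.
keys = list(range(0, 1000, 10))

BLOCK = 10  # hypothetical number of index entries per block

# Inner level: the full index, split into blocks of BLOCK entries.
inner = [keys[i:i + BLOCK] for i in range(0, len(keys), BLOCK)]

# Outer level: the first key of each inner block; small enough for one block.
outer = [block[0] for block in inner]

def lookup(key):
    """Search the small outer level first, then only one inner block."""
    b = bisect.bisect_right(outer, key) - 1
    if b < 0:
        return None
    block = inner[b]
    i = bisect.bisect_left(block, key)
    if i < len(block) and block[i] == key:
        return b * BLOCK + i  # address of the record
    return None
```

A lookup touches the 10-entry outer level and one 10-entry inner block instead of the full 100-entry index.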
