File organization

FILE – ORGANIZATION
PRESENTED BY
DR. RITU BHARGAVA
SOPHIA GIRLS' COLLEGE, AJMER (AUTONOMOUS)

Definition of File Organization
⊙File organization means the way data is stored so that it can be retrieved when needed.
⊙It includes the physical order and layout of records on storage devices.
⊙The techniques used to find and retrieve stored records are called access methods.

GOALS OF FILE ORGANIZATION
⊙To make the database easy to create and maintain.
⊙To provide an efficient way of storing information in, and retrieving it from, the file system.

OVERVIEW
⊙A logical file is a complete set of records for a specific purpose or a specific application.
⊙Under file organization, the database is stored as a collection of files.
⊙Each file is organized logically as a sequence of records.
⊙A record is a sequence of fields in a relation.
⊙Records are mapped onto disk blocks for storage.
⊙The size of these records may vary.

OVERVIEW
⊙One approach to mapping the database to files is to store records of a single length in a given file; these are called fixed length records.
⊙An alternative approach is to allow variable length records.

RECORDS IN FILES: FIXED LENGTH RECORD
⊙Consider the following example:
type student = record
    sname : char(20);
    sid : char(4);
    fees : real;
end
If each character occupies 1 byte, an integer 4 bytes, and a real 8 bytes, then the student record is 20 + 4 + 8 = 32 bytes long.
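
The 32-byte layout above can be checked with a small Python sketch. It assumes the byte sizes stated on the slide and uses Python's `struct` module; the helper names `pack_student` and `unpack_student` and the sample values are illustrative and do not come from the slides.

```python
import struct

# Layout assumed from the slide: 20-byte name, 4-byte id, 8-byte real.
# "<" turns off alignment padding, so the size is exactly the sum of the fields.
STUDENT = struct.Struct("<20s4sd")

def pack_student(sname: str, sid: str, fees: float) -> bytes:
    """Encode one student as a fixed-length record (short text is NUL-padded)."""
    return STUDENT.pack(sname.encode("ascii"), sid.encode("ascii"), fees)

def unpack_student(raw: bytes):
    """Decode a fixed-length record back into (sname, sid, fees)."""
    sname, sid, fees = STUDENT.unpack(raw)
    return (sname.rstrip(b"\0").decode("ascii"),
            sid.rstrip(b"\0").decode("ascii"),
            fees)

record = pack_student("Asha", "S001", 1500.0)
print(STUDENT.size, len(record))   # both are 32
print(unpack_student(record))
```

Because every record has the same size, record number i starts at byte offset i × 32, which is what makes fixed length records easy to locate.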

Disadvantage
⊙It is difficult to delete a record from such a fixed structure.
⊙Unless the block size is a multiple of 32, some records cross block boundaries. Reading or writing such a record requires two block accesses.
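
The block-boundary problem can be illustrated with a short Python sketch. It assumes records are packed back to back with no padding; the block sizes 128 and 100 are arbitrary values chosen for the illustration, and `blocks_touched` is a hypothetical helper.

```python
def blocks_touched(record_no: int, record_size: int, block_size: int) -> int:
    """Number of disk blocks that hold part of record number record_no (0-based)."""
    first_byte = record_no * record_size
    last_byte = first_byte + record_size - 1
    return last_byte // block_size - first_byte // block_size + 1

# Block size is a multiple of 32: every record fits inside one block.
print([blocks_touched(i, 32, 128) for i in range(8)])

# Block size is not a multiple of 32: records 3 and 6 straddle two blocks.
print([blocks_touched(i, 32, 100) for i in range(8)])
```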

VARIABLE LENGTH RECORDS
⊙Variable length records arise in database systems in several ways:
1. Storage of multiple record types in a file.
2. Record types that allow variable lengths for one or more fields.
3. Record types that allow repeating fields.

VARIABLE LENGTH RECORDS
⊙type student = record
    class_name : char(20);
    student_info : array [1..∞] of record
        sid : char(4);
        fees : real;
    end
end
⊙We define student_info as an array with an arbitrary number of elements, so that there is no limit on how large a record can be.
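
One common way to store such a record is to write the number of repeating entries in front of them. The Python sketch below assumes that approach; the slides do not prescribe a storage format. The 2-byte count is an assumption that caps a record at 65535 entries, and `pack_class` and `unpack_class` are hypothetical names.

```python
import struct

HEADER = struct.Struct("<20sH")   # class_name, then the number of student_info entries
ENTRY = struct.Struct("<4sd")     # sid, fees

def pack_class(class_name, students):
    """Encode one class record; its length depends on how many students it holds."""
    out = HEADER.pack(class_name.encode("ascii"), len(students))
    for sid, fees in students:
        out += ENTRY.pack(sid.encode("ascii"), fees)
    return out

def unpack_class(raw):
    """Decode a class record by reading the count, then that many entries."""
    name, count = HEADER.unpack_from(raw, 0)
    students = []
    for i in range(count):
        sid, fees = ENTRY.unpack_from(raw, HEADER.size + i * ENTRY.size)
        students.append((sid.rstrip(b"\0").decode("ascii"), fees))
    return name.rstrip(b"\0").decode("ascii"), students

small = pack_class("BCA-I", [])
large = pack_class("BCA-II", [("S001", 1500.0), ("S002", 1750.0)])
print(len(small), len(large))     # the two records have different lengths
```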

TYPES OF FILE ORGANIZATION
⊙Sequential file organization
⊙Indexed sequential file organization
⊙Direct or random file organization

SEQUENTIAL FILE ORGANIZATION
⊙In sequential file organization, records are arranged in physical sequence by the value of some field, called the sequence field.
⊙The field chosen is usually the key field, whose unique values identify the records.
⊙The records are laid out on the storage device, often magnetic tape, in increasing or decreasing order of the sequence field. An example is IBM's SAM (sequential access method).

SEQUENTIAL FILE ORGANIZATION
⊙It is the oldest method of file organization.
⊙It is simple, easy to understand and easy to manage.
⊙It is best suited for sequential access, that is, retrieving records one after another in the order in which they are stored.
⊙With this organization, insertion, update and deletion are done by rewriting the entire file.
⊙It is suitable for applications such as a payroll system.
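
The cost of an insertion can be sketched in Python. Here a list stands in for the file on tape, and `insert_by_rewrite` and the sample payroll records are illustrative assumptions. The point is that every existing record is copied to the new file even though only one record is added.

```python
def insert_by_rewrite(old_file, new_record, key=lambda r: r[0]):
    """Return a new sequential file that contains new_record in key order.

    old_file is read front to back exactly once and every record is copied,
    which is why the cost of one insertion grows with the size of the file.
    """
    new_file = []
    placed = False
    for record in old_file:
        if not placed and key(new_record) < key(record):
            new_file.append(new_record)
            placed = True
        new_file.append(record)
    if not placed:                      # new key is larger than every existing key
        new_file.append(new_record)
    return new_file

payroll = [(101, "Anita"), (104, "Meena"), (109, "Rekha")]
updated = insert_by_rewrite(payroll, (106, "Kavita"))
print(updated)
```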

ADVANTAGES & DISADVANTAGES
Advantages:
⊙Simplicity
⊙Low overhead
⊙A sequential file makes the best use of storage space.
Disadvantages:
⊙Difficulty in searching
⊙Lack of support for queries
⊙Problems with record deletion
⊙Processing a sequential file is time consuming.
⊙It has high data redundancy.

INDEXED SEQUENTIAL ACCESS METHOD
⊙The records in this type of file are organized in sequence, and an index table is used to speed up access to the records without requiring a search of the entire file.
⊙The records of the file can be stored in random sequence, but the index table is kept in sorted sequence on the key value.
⊙The file can be accessed both randomly and sequentially.
⊙Records can be updated, deleted and inserted in an indexed file organization because we can limit the amount of reorganizing we need to perform.
⊙This technique is referred to as ISAM (indexed sequential access method).
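
The idea of an index table can be sketched in Python. This is a minimal illustration and not IBM's ISAM: it assumes the records are kept in key order, grouped into blocks of three (an arbitrary size), with one index entry per block. The names `build_index` and `lookup` are hypothetical.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # records per block; an arbitrary value for illustration

def build_index(sorted_records):
    """Split the records into blocks and index the first key of every block."""
    blocks = [sorted_records[i:i + BLOCK_SIZE]
              for i in range(0, len(sorted_records), BLOCK_SIZE)]
    index = [block[0][0] for block in blocks]
    return index, blocks

def lookup(index, blocks, key):
    """Find a record by searching the index, then scanning a single block."""
    pos = bisect_right(index, key) - 1   # last block whose first key <= key
    if pos < 0:
        return None
    for record in blocks[pos]:
        if record[0] == key:
            return record
    return None

records = [(k, f"rec{k}") for k in (5, 12, 18, 23, 31, 40, 47, 52)]
index, blocks = build_index(records)
print(index)                       # [5, 23, 47]
print(lookup(index, blocks, 31))   # found after scanning one block
print(lookup(index, blocks, 30))   # None: key is not in the file
```

Sequential access is still available by reading the blocks in order, while a keyed lookup touches the index and one block instead of the whole file.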

ADVANTAGES
⊙In an indexed sequential access file, both sequential and random access are possible.
⊙It accesses records very fast if the index table is properly organized.
⊙Records can be inserted in the middle of the file.
⊙It provides quick access for sequential and direct processing.
⊙It reduces the amount of sequential searching.

DISADVANTAGES
⊙An indexed sequential access file requires unique keys and periodic reorganization.
⊙Each data access or retrieval takes extra time because the index must be searched first.
⊙It requires more storage space, so it uses storage less efficiently than other file organizations.
⊙It is expensive because it requires special software.

DIRECT FILE ORGANIZATION
⊙Direct file organization is designed to provide rapid, direct, non-sequential (random) access to records.
⊙IBM's BDAM (basic direct access method) uses this technique.
⊙Using this organization, records are inserted in random order.
⊙Direct access organization is most often used with databases.
⊙A hashing technique such as division/remainder or splitting/folding is used to convert the value of some field into a target address.

DIRECT FILE ORGANIZATION
⊙Collisions can be minimized by choosing a better hashing scheme, increasing the bucket size so that each page holds more records, or reducing the packing density.
⊙Overflow is handled by searching forward a predetermined number of slots or by using an overflow area.
⊙Synonym pointers connect overflow records.
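
The overflow handling described above can be sketched in Python. This is a minimal illustration under stated assumptions: one record per slot, division/remainder hashing, keys inserted once, and no deletions. The class `DirectFile` and its parameters are hypothetical. For brevity the overflow area is scanned linearly instead of being chained with synonym pointers.

```python
class DirectFile:
    def __init__(self, slots=11, max_probe=2):
        self.slots = [None] * slots      # primary area, one record per slot
        self.overflow = []               # separate overflow area
        self.max_probe = max_probe       # how many slots to search forward

    def _home(self, key):
        return key % len(self.slots)     # division/remainder hashing

    def insert(self, key, value):
        home = self._home(key)
        # Try the home slot, then up to max_probe slots after it.
        for step in range(self.max_probe + 1):
            i = (home + step) % len(self.slots)
            if self.slots[i] is None:
                self.slots[i] = (key, value)
                return
        self.overflow.append((key, value))   # no free slot nearby

    def find(self, key):
        home = self._home(key)
        for step in range(self.max_probe + 1):
            i = (home + step) % len(self.slots)
            if self.slots[i] is not None and self.slots[i][0] == key:
                return self.slots[i][1]
        for k, v in self.overflow:
            if k == key:
                return v
        return None

f = DirectFile(slots=11, max_probe=2)
for key, name in [(22, "a"), (33, "b"), (44, "c"), (55, "d")]:
    f.insert(key, name)              # all four keys hash to slot 0
print(f.slots[:3], f.overflow)
print(f.find(55), f.find(66))
```

Keys 22, 33 and 44 fill slots 0, 1 and 2; key 55 finds no free slot within the search limit and goes to the overflow area.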

TYPES OF HASHING SCHEME
⊙DIVISION METHOD
In this method, we choose a number M such that M > N, preferably a prime number. The hash function is then defined as
H(K) = K mod M
where N = number of records
K = key
That is, divide K by M and take the remainder of the division.
For example, if K = 9875, N = 58, M = 97, then
H(K) = 9875 mod 97
= 78
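
The division method is a single remainder operation, as this short Python check of the example shows. The function name `division_hash` is illustrative.

```python
def division_hash(key: int, m: int) -> int:
    """Division method: the address is the remainder of key divided by m."""
    return key % m

print(division_hash(9875, 97))   # 9875 = 101 * 97 + 78, so the address is 78
```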

TYPES OF HASHING SCHEME
⊙MID-SQUARE METHOD
In this method, we take the square of K, i.e. K².
We chop off digits from both ends of K².
The remaining value is called L.
The hash function is defined as
H(K) = L
If K = 9875, N = 58, M = 97, then we have
K² = 97515625
H(K) = middle 2 digits of K² = 15
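
A Python sketch of the mid-square method follows. The function name `mid_square_hash` is illustrative. When the square has an odd number of digits there is no exact middle pair, and this sketch takes the pair just left of centre, which is an assumption the slides do not cover.

```python
def mid_square_hash(key: int, digits: int = 2) -> int:
    """Mid-square method: square the key and keep the middle digits."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2   # equal chop from both ends when possible
    return int(squared[start:start + digits])

print(9875 * 9875)             # 97515625
print(mid_square_hash(9875))   # middle two digits: 15
```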

TYPES OF HASHING SCHEME
⊙FOLDING METHOD
Here K is partitioned into a number of parts K1, K2, K3, …, Kn. The parts are then added together, ignoring the final carry.
The hash function is defined as
H(K) = K1 + K2 + … + Kn
If K = 9875, N = 58, M = 97, then
H(K) = 98 + 75 = 173
Ignoring the carry, we have
H(K) = 73
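
The folding method can be sketched in Python as follows. The sketch assumes the key is split from the left into parts of two decimal digits, as in the example; `folding_hash` is a hypothetical name. Dropping the final carry is the same as taking the sum modulo 100 for two-digit parts.

```python
def folding_hash(key: int, part_len: int = 2) -> int:
    """Folding method: split the key into parts, add them, drop the final carry."""
    digits = str(key)
    parts = [int(digits[i:i + part_len])
             for i in range(0, len(digits), part_len)]
    return sum(parts) % (10 ** part_len)   # the modulo discards the carry

print(folding_hash(9875))   # 98 + 75 = 173, carry dropped: 73
```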

ADVANTAGES
⊙Direct access files support online transaction processing (OLTP) systems such as an online railway reservation system.
⊙In a direct access file, sorting of the records is not required.
⊙It accesses the desired records immediately.
⊙It updates several files quickly.
⊙It has better control over record allocation.

DISADVANTAGES
⊙A direct access file does not provide a backup facility.
⊙It is expensive.
⊙It uses storage space less efficiently than a sequential file.
