2. Definition of File Organization
⊙File organization means the
way data is stored so that it can
be retrieved when needed.
⊙It includes the physical order
and layout of records on
storage devices
⊙The techniques used to find
and retrieve stored records are
called access methods.
3. GOALS OF FILE ORGANIZATION
⊙To make the database easy to create and
maintain at the file level.
⊙To store and retrieve information from
the file system efficiently.
4. OVERVIEW
⊙A logical file is a complete set of records for a
specific purpose or a specific application.
⊙Under file organization, the database is stored
as a collection of files.
⊙Each file is organized logically as a sequence
of records.
⊙A record is a sequence of fields, such as a row of a relation.
⊙Records are mapped onto disk blocks for
storage.
⊙The size of the records in a file may vary.
5. OVERVIEW
⊙One approach to mapping a database
to files is to store records of only one
length in a given file; these are called
fixed-length records.
⊙An alternative approach is to allow
variable-length records.
6. RECORDS IN FILES: FIXED LENGTH RECORD
⊙Consider the following example:
type student = record
    sname : char(20);
    sid : char(4);
    fees : real;
end
If each character occupies 1 byte, an integer
4 bytes, and a real 8 bytes, the student
record is 20 + 4 + 8 = 32 bytes long.
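The byte count can be checked with a short sketch. It assumes Python's `struct` module with a packed layout (no alignment padding); the format string, helper names, and sample values are illustrative and not part of the original slide.

```python
import struct

# Packed layout: 20-byte sname, 4-byte sid, 8-byte real (double).
# "<" selects little-endian with no alignment padding.
STUDENT = struct.Struct("<20s4sd")

def pack_student(sname: str, sid: str, fees: float) -> bytes:
    # struct pads short byte strings with NUL bytes up to the field width
    return STUDENT.pack(sname.encode("ascii"), sid.encode("ascii"), fees)

def unpack_student(raw: bytes):
    sname, sid, fees = STUDENT.unpack(raw)
    # Strip the NUL padding added when the record was packed
    return (sname.rstrip(b"\0").decode("ascii"),
            sid.rstrip(b"\0").decode("ascii"),
            fees)

record = pack_student("Asha", "S001", 1500.0)
print(STUDENT.size, len(record))
```

Because every record has the same size, record number i starts at byte offset i × 32, which is what makes fixed-length files easy to address.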
7. Disadvantage
⊙It is difficult to delete a record
from such a fixed structure, because the
freed slot must be refilled or marked.
⊙Unless the block size is a multiple
of 32, some records cross block
boundaries. Reading or writing such
a record then requires two block
accesses.
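The block-boundary problem can be illustrated with a small calculation. This is a sketch that assumes 32-byte records packed back to back from byte 0; the function names and block sizes are made up for the example.

```python
RECORD_SIZE = 32  # bytes per record, as in the student example

def blocks_touched(record_no: int, block_size: int) -> range:
    """Block numbers occupied by a record when records are packed back to back."""
    first_byte = record_no * RECORD_SIZE
    last_byte = first_byte + RECORD_SIZE - 1
    return range(first_byte // block_size, last_byte // block_size + 1)

def spanning_records(n_records: int, block_size: int) -> list:
    """Records that cross a block boundary and so need two block accesses."""
    return [i for i in range(n_records)
            if len(blocks_touched(i, block_size)) > 1]

print(spanning_records(10, 64))   # block size is a multiple of 32
print(spanning_records(10, 100))  # block size is not a multiple of 32
```

With a 64-byte block no record spans two blocks; with a 100-byte block every third record does.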
8. VARIABLE LENGTH RECORDS
⊙Variable length records arise in
database systems in several ways:
1. Storage of multiple record types in
a file.
2. Record types that allow variable
lengths for one or more fields.
3. Record types that allow repeating
fields
9. VARIABLE LENGTH RECORDS
⊙type student = record
    class_name : char(20);
    student_info : array [1..∞] of record
        sid : char(4);
        fees : real;
    end
end
⊙We define student_info as an array with
an arbitrary number of elements, so that
there is no limit on how large a record can
be.
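One common way to store such a record is to write the number of repeating entries in front of them. The sketch below assumes that layout and a 2-byte count, which caps a record at 65535 entries; both choices are illustrative assumptions, not something the slides prescribe.

```python
import struct

HEADER = struct.Struct("<20sH")  # class_name plus a count of student_info entries
ENTRY = struct.Struct("<4sd")    # one student_info entry: sid, fees

def pack_class(class_name: str, students: list) -> bytes:
    """Encode a record whose length depends on how many students it holds."""
    out = HEADER.pack(class_name.encode("ascii"), len(students))
    for sid, fees in students:
        out += ENTRY.pack(sid.encode("ascii"), fees)
    return out

def unpack_class(raw: bytes):
    """Decode a record by reading the count, then that many entries."""
    name, count = HEADER.unpack_from(raw, 0)
    offset = HEADER.size
    students = []
    for _ in range(count):
        sid, fees = ENTRY.unpack_from(raw, offset)
        students.append((sid.rstrip(b"\0").decode("ascii"), fees))
        offset += ENTRY.size
    return name.rstrip(b"\0").decode("ascii"), students

small = pack_class("CS101", [])
large = pack_class("CS101", [("S001", 1500.0), ("S002", 1750.0)])
print(len(small), len(large))
```

The two records describe the same type but differ in length, which is why they cannot be addressed by a simple record-number multiplication.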
11. SEQUENTIAL FILE ORGANIZATION
⊙In sequential file organization, records
are arranged in physical sequence by
the value of some field, called the
sequence field.
⊙The field chosen is often a key field, one
whose unique values are used to identify
records.
⊙The records are laid out on the storage
device, often magnetic tape, in
increasing or decreasing order of the
value of the sequence field. For example: IBM’s
SAM (sequential access method).
12. SEQUENTIAL FILE ORGANIZATION
⊙It is the oldest method of file organization.
⊙This organization is simple.
⊙It is easy to understand and easy to manage.
⊙It is best suited for sequential access:
retrieving records one after another in
the same order in which they are stored.
⊙With this organization, insertion, update,
and deletion are done by rewriting the
entire file.
⊙It is suitable for applications such as a payroll
system.
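The cost of an insertion can be seen in a minimal sketch. It assumes a text file with one `key,data` record per line, sorted by an integer key; the file name and record contents are hypothetical.

```python
import os
import tempfile

def insert_record(path: str, key: int, data: str) -> None:
    """Insert a record in key order by copying the whole file to a new one."""
    new_line = f"{key},{data}\n"
    tmp_path = path + ".new"
    inserted = False
    with open(path) as old, open(tmp_path, "w") as new:
        for line in old:
            # Write the new record just before the first larger key
            if not inserted and int(line.split(",", 1)[0]) > key:
                new.write(new_line)
                inserted = True
            new.write(line)
        if not inserted:          # new key is the largest: append at the end
            new.write(new_line)
    os.replace(tmp_path, path)    # the rewritten file replaces the old one

def read_keys(path: str) -> list:
    with open(path) as f:
        return [int(line.split(",", 1)[0]) for line in f]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payroll.dat")
    with open(path, "w") as f:
        f.write("10,Ann\n30,Raj\n")
    insert_record(path, 20, "Lee")
    keys_after = read_keys(path)
print(keys_after)
```

Every record is read and written once per insertion, which is acceptable for batch runs but slow for frequent single-record changes.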
14. INDEXED SEQUENTIAL ACCESS METHOD
⊙The records in this type of file are organized in
sequence, and an index table is used to speed up
access to the records without requiring a search of
the entire file.
⊙The records of the file can be stored in random
sequence, but the index table is kept in sorted sequence
on the key value.
⊙The file can be accessed both randomly and
sequentially.
⊙Records can be updated, deleted, and inserted in
an indexed file organization because the index limits the
amount of reorganizing we need to perform.
⊙This technique is referred to as ISAM (indexed
sequential access method).
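The role of the index table can be sketched as follows. This example assumes one particular variant: data blocks kept in key order and an index holding one entry (the lowest key) per block. The block size, keys, and function names are invented for illustration.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # records per block (illustrative)

def build_index(blocks: list) -> list:
    """One index entry per block: the lowest key stored in that block."""
    return [block[0][0] for block in blocks]

def lookup(blocks: list, index: list, key: int):
    """Use the index to pick one block, then scan only that block."""
    pos = bisect_right(index, key) - 1   # last block whose lowest key <= key
    if pos < 0:
        return None                      # key is smaller than every stored key
    for k, data in blocks[pos]:
        if k == key:
            return data
    return None

records = [(k, f"rec{k}") for k in (5, 12, 17, 23, 31, 40, 44, 58)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]
index = build_index(blocks)

print(index)
print(lookup(blocks, index, 31))
```

Sequential processing simply reads the blocks in order, while a random lookup touches the index and a single block.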
15. ADVANTAGES
⊙In an indexed sequential access file, both
sequential and random access are
possible.
⊙Records are accessed very quickly if the
index table is properly organized.
⊙Records can be inserted in the
middle of the file.
⊙It provides quick access for sequential
and direct processing.
⊙It reduces the amount of sequential
searching.
16. DISADVANTAGES
⊙An indexed sequential access file requires
unique keys and periodic reorganization.
⊙Searching the index adds time to each
data access or retrieval.
⊙It requires more storage space, because
the index must be stored as well.
⊙It is expensive because it requires
special software.
⊙It is less efficient in the use of storage
space than other file
organizations.
17. DIRECT FILE ORGANIZATION
⊙Direct file organization is designed to provide
rapid, direct, non-sequential (random)
access to records.
⊙IBM’s BDAM (basic direct access method) uses
this technique.
⊙Using this organization, records are inserted in
random order.
⊙Direct access organization provides random
access to records and is most often used with
databases.
⊙A hashing technique such as division/remainder
or splitting/folding is used to convert the value of
some field into a target address.
18. DIRECT FILE ORGANIZATION
⊙Collisions can be minimized by choosing a
better hashing scheme, increasing the
bucket size so that each page holds more
records, or reducing the packing density.
⊙Overflow is handled by searching forward a
predetermined number of slots or using an
overflow area.
⊙Synonym pointers connect overflow
records.
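The collision handling described above can be sketched in a few lines. The slides present forward search and an overflow area as alternatives; this sketch combines them (probe a fixed number of buckets, then fall back to the overflow area) purely to show both in one place. The class name, bucket counts, and the division hash used for the home address are assumptions.

```python
class DirectFile:
    """Toy hashed file: fixed-size buckets, bounded forward search, overflow area."""

    def __init__(self, n_buckets=7, bucket_size=2, max_probe=3):
        self.buckets = [[] for _ in range(n_buckets)]
        self.bucket_size = bucket_size  # records each bucket (page) can hold
        self.max_probe = max_probe      # how many buckets to search forward
        self.overflow = []              # separate overflow area

    def _probe(self, key):
        home = key % len(self.buckets)  # home address by division/remainder
        for step in range(self.max_probe + 1):
            yield (home + step) % len(self.buckets)

    def insert(self, key, data):
        for b in self._probe(key):
            if len(self.buckets[b]) < self.bucket_size:
                self.buckets[b].append((key, data))
                return b                # bucket number where the record landed
        self.overflow.append((key, data))
        return None                     # None means "stored in the overflow area"

    def find(self, key):
        for b in self._probe(key):
            for k, d in self.buckets[b]:
                if k == key:
                    return d
        for k, d in self.overflow:
            if k == key:
                return d
        return None

f = DirectFile(n_buckets=5, bucket_size=1, max_probe=1)
placements = [f.insert(k, f"rec{k}") for k in (0, 5, 10)]
print(placements, f.find(10))
```

Raising `bucket_size` or `n_buckets` in this sketch reduces how often records spill forward or into the overflow area, mirroring the remedies listed above.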
19. TYPES OF HASHING SCHEME
⊙DIVISION METHOD
In this method, we choose a number M such
that M > N, preferably a prime number. The hash
function is defined as
H(K) = K mod M
where N = number of records
K = key
That is, divide K by M and take the remainder of the division.
For example:
If K = 9875, N = 58, M = 97 then
H(K) = 9875 mod 97
= 78
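The division method is a one-line function. The sketch below uses M = 97 as the default only because it is the value in the example; the function name is illustrative.

```python
def division_hash(key: int, m: int = 97) -> int:
    """Division method: the remainder of key divided by m (ideally a prime)."""
    return key % m

print(division_hash(9875))  # 9875 = 101 * 97 + 78, so this prints 78
```

The result always lies between 0 and M - 1, so M also fixes the number of addressable slots.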
20. TYPES OF HASHING SCHEME
⊙MID-SQUARE METHOD
In this method, we take the square of K, i.e. K².
We chop off digits from both ends of K²;
the remaining middle digits are called L.
The hash function is defined as
H(K) = L
If K = 9875, N = 58, M = 97 then we have
K² = 97515625
H(K) = middle 2 digits of K² = 15.
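A minimal sketch of the mid-square method follows. It assumes the middle digits are taken from the decimal representation of the square; when the digits cannot be centred exactly, this version leans one position to the left, which is an arbitrary choice made for the example.

```python
def mid_square_hash(key: int, digits: int = 2) -> int:
    """Mid-square method: square the key and keep the middle `digits` digits."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2   # drop equal counts from both ends
    return int(squared[start:start + digits])

print(mid_square_hash(9875))  # 9875 squared is 97515625; the middle two digits are 15
```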
21. TYPES OF HASHING SCHEME
⊙FOLDING METHOD
Here K is partitioned into a number of parts,
K1, K2, K3, …, Kn. The parts are then added
together, ignoring the final carry.
The hash function is defined as
H(K) = K1 + K2 + … + Kn
If K = 9875, N = 58, M = 97 then
K1 + K2 = 98 + 75 = 173
Ignoring the carry, we have
H(K) = 73
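The folding method can be sketched as below. It assumes the key is split from the left into parts of two decimal digits and that "ignoring the carry" means keeping only the low-order digits of the sum; the part length is a parameter chosen for the example.

```python
def folding_hash(key: int, part_len: int = 2) -> int:
    """Folding method: split the key into parts, add them, drop the final carry."""
    digits = str(key)
    parts = [int(digits[i:i + part_len])
             for i in range(0, len(digits), part_len)]
    return sum(parts) % 10 ** part_len   # keep only part_len digits of the sum

print(folding_hash(9875))  # 98 + 75 = 173; dropping the carry leaves 73
```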
22. ADVANTAGES
⊙Direct access files support online
transaction processing (OLTP) systems such as
an online railway reservation system.
⊙In a direct access file, sorting of the records
is not required.
⊙It accesses the desired records
immediately.
⊙It updates several files quickly.
⊙It has better control over record allocation.
23. DISADVANTAGES
⊙A direct access file does not
provide a backup facility.
⊙It is expensive.
⊙It uses storage space less efficiently
than a sequential file.