FILE – ORGANIZATION
PRESENTED BY
DR. RITU BHARGAVA
SOPHIA GIRLS’ COLLEGE, AJMER (AUTONOMOUS)
Definition of
File
Organization
⊙File organization means the
way data is stored so that it can
be retrieved when needed.
⊙It includes the physical order
and layout of records on
storage devices
⊙The techniques used to find
and retrieve stored records are
called access methods.
GOALS OF
FILE
ORGANIZATION
⊙To make the database easy to create
and maintain in terms of file
organization.
⊙To provide an efficient way of
storing and retrieving
information in the file system.
OVERVIEW
⊙A logical file is a complete set of records
designated for a specific purpose or a specific
application.
⊙In file organization, the database is stored
as a collection of files.
⊙Each file is organized logically as a sequence
of records.
⊙A record is a sequence of fields in a relation.
⊙Records are mapped onto disk blocks for
storage.
⊙The size of such records in the file system may vary.
OVERVIEW
⊙One approach to mapping the database
to files is to store records of only one
length in a given file; these are called
fixed length records.
⊙An alternative approach is to use variable
length records.
RECORDS IN
FILES:
FIXED LENGTH
RECORD
⊙Consider the following example:
⊙type student = record
      sname : char(20);
      sid : char(4);
      fees : real;
  end
If each character occupies one byte,
an integer occupies 4 bytes, and a real
occupies 8 bytes, then the student
record is 20 + 4 + 8 = 32 bytes long.
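The 32-byte layout can be sketched in Python with the standard `struct` module. This is only an illustration: the function names, the ASCII encoding, and the NUL padding of short strings are assumptions, not part of the slide.

```python
import struct

# Layout mirroring the slide's student record:
# 20-byte name, 4-byte id, 8-byte real.
# "<" disables alignment padding, so the size is exactly 20 + 4 + 8 bytes.
RECORD_FORMAT = "<20s4sd"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 32


def pack_student(sname: str, sid: str, fees: float) -> bytes:
    """Encode one student as a fixed 32-byte record (short strings are NUL-padded)."""
    return struct.pack(RECORD_FORMAT, sname.encode("ascii"), sid.encode("ascii"), fees)


def unpack_student(raw: bytes) -> tuple[str, str, float]:
    """Decode a 32-byte record back into (sname, sid, fees)."""
    name, sid, fees = struct.unpack(RECORD_FORMAT, raw)
    return (
        name.rstrip(b"\x00").decode("ascii"),
        sid.rstrip(b"\x00").decode("ascii"),
        fees,
    )


def read_record(data: bytes, i: int) -> tuple[str, str, float]:
    """Record i always starts at byte offset i * RECORD_SIZE."""
    start = i * RECORD_SIZE
    return unpack_student(data[start:start + RECORD_SIZE])
```

Because every record has the same size, the position of record `i` is found by multiplication alone, which is the main attraction of fixed length records.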
Disadvantage
⊙It is difficult to delete a record
from such a fixed structure.
⊙Unless the block size is a multiple
of 32, some records cross block
boundaries. It then requires
two block accesses to read or
write such a record.
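The block-access issue can be checked with a little arithmetic. The sketch below assumes records are packed back to back starting at byte 0 of the file; the function name and the sample block sizes are illustrative only.

```python
def blocks_touched(record_index: int, record_size: int, block_size: int) -> int:
    """Number of disk blocks that hold at least one byte of the given record."""
    first_byte = record_index * record_size
    last_byte = first_byte + record_size - 1
    return last_byte // block_size - first_byte // block_size + 1


# With 128-byte blocks (a multiple of 32) every record sits inside one block.
one_block = blocks_touched(3, 32, 128)

# With 100-byte blocks, record 3 occupies bytes 96..127 and straddles two blocks.
two_blocks = blocks_touched(3, 32, 100)
```

Here `one_block` is 1 and `two_blocks` is 2, so the second layout needs two block accesses for that record.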
VARIABLE
LENGTH
RECORDS
⊙Variable length records arise in
database systems in several ways:
1. Storage of multiple record types in
a file.
2. Record types that allow variable
lengths for one or more fields.
3. Record types that allow repeating
fields
VARIABLE
LENGTH
RECORDS
⊙type student = record
      class_name : char(20);
      student_info : array [1..∞] of record
          sid : char(4);
          fees : real;
      end
  end
⊙We define student_info as an array with
an arbitrary number of elements, so that
there is no limit on how large a record can
be.
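One common way to store such a record is to write a count in front of the repeating part. The sketch below is an assumption-laden illustration: the 2-byte count field, the function names, and the ASCII encoding are not from the slide, and the count field caps the array at 65,535 entries rather than being truly unbounded.

```python
import struct

HEADER_FORMAT = "<20sH"  # class_name char(20) plus a 2-byte entry count (assumed)
ENTRY_FORMAT = "<4sd"    # one student_info element: sid char(4), fees real
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 22
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)    # 12


def pack_class(class_name: str, students: list[tuple[str, float]]) -> bytes:
    """Encode one record whose length depends on how many students it holds."""
    header = struct.pack(HEADER_FORMAT, class_name.encode("ascii"), len(students))
    body = b"".join(
        struct.pack(ENTRY_FORMAT, sid.encode("ascii"), fees) for sid, fees in students
    )
    return header + body


def unpack_class(raw: bytes) -> tuple[str, list[tuple[str, float]]]:
    """Read the count from the header, then decode that many entries."""
    name, count = struct.unpack_from(HEADER_FORMAT, raw, 0)
    offset = HEADER_SIZE
    students = []
    for _ in range(count):
        sid, fees = struct.unpack_from(ENTRY_FORMAT, raw, offset)
        students.append((sid.rstrip(b"\x00").decode("ascii"), fees))
        offset += ENTRY_SIZE
    return name.rstrip(b"\x00").decode("ascii"), students
```

A record with no students takes 22 bytes and each extra student adds 12, so record `i` can no longer be located by multiplying by a fixed size.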
TYPES OF
FILE
ORGANIZATION
⊙Sequential file organization
⊙Indexed Sequential file
organization
⊙Direct or Random file
organization
SEQUENTIAL
FILE
ORGANIZATION
⊙In sequential file organization, records
are arranged in physical sequence by
the value of some field, called the
sequence field.
⊙The field chosen is the key field, one
with unique values that are used to identify
records.
⊙The records are laid out on the storage
device, often magnetic tape, in
increasing or decreasing order by the
value of the sequence field. For example: IBM’s
SAM (sequential access method).
SEQUENTIAL
FILE
ORGANIZATION
⊙It is the oldest method of file organization.
⊙This organization is simple.
⊙It is easy to understand and easy to manage.
⊙It is best suited for sequential access, that is,
retrieving records one after another in
the same order in which they are stored.
⊙With this organization, insertion, update,
and deletion are done by rewriting the
entire file.
⊙It is suitable for applications such as a payroll
system.
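The cost of changing a sequential file can be sketched with an in-memory list standing in for the file. The record shape `(key, payload)` and both function names are assumptions made for illustration.

```python
def insert_record(old_records, new_record, key=lambda r: r[0]):
    """Return a new sorted sequence containing new_record.

    Every existing record is copied across, mimicking a rewrite of the whole file.
    """
    new_file = []
    inserted = False
    for record in old_records:  # read the old file front to back
        if not inserted and key(new_record) < key(record):
            new_file.append(new_record)
            inserted = True
        new_file.append(record)
    if not inserted:  # the new key is the largest so far
        new_file.append(new_record)
    return new_file


def find_record(records, wanted_key, key=lambda r: r[0]):
    """Sequential search: scan from the start, stop once keys pass wanted_key."""
    for record in records:
        if key(record) == wanted_key:
            return record
        if key(record) > wanted_key:
            break
    return None


payroll = [(101, "Asha"), (105, "Ravi"), (110, "Meena")]
payroll = insert_record(payroll, (107, "Kiran"))
```

Inserting one employee copies all existing records, which is acceptable for batch jobs such as payroll runs that process every record anyway.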
ADVANTAGES
&
DISADVANTAGES
Advantages:
⊙Simplicity.
⊙Less overhead.
⊙A sequential file makes the best use of storage space.
Disadvantages:
⊙Difficulty in searching.
⊙Lack of support for queries.
⊙Problem with record deletion.
⊙Processing a sequential file is time consuming.
⊙It has high data redundancy.
INDEXED
SEQUENTIAL
ACCESS
METHOD
⊙The records in this type of file are organized in
sequence, and an index table is used to speed up
access to the records without requiring a search of
the entire file.
⊙The records of the file can be stored in random
sequence, but the index table is in sorted sequence
on the key value.
⊙The file can be accessed both randomly and
sequentially.
⊙Records can be updated, deleted, and inserted in
indexed file organization because we can limit the
amount of reorganizing we need to perform.
⊙This technique is referred to as ISAM (indexed
sequential access method).
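A minimal sketch of index-assisted lookup follows. It assumes one particular layout, which the slide does not prescribe: records kept sorted inside fixed blocks, with an index table holding the lowest key of each block. The sample data and function names are hypothetical.

```python
from bisect import bisect_right

# Data blocks, each holding (key, payload) records sorted by key.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]
# Index table: the lowest key stored in each block, in sorted order.
index = [block[0][0] for block in blocks]


def lookup(key):
    """Random access: use the index to pick one block, then search only that block."""
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    for k, payload in blocks[pos]:
        if k == key:
            return payload
    return None


def scan():
    """Sequential access: walk the blocks in index order."""
    for block in blocks:
        yield from block
```

`lookup(50)` inspects only the second block, while `scan()` still returns every record in key order, which is the combination of random and sequential access described above.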
ADVANTAGES
⊙In an indexed sequential access file,
both sequential and random access are
possible.
⊙It accesses the records very fast if the
index table is properly organized.
⊙The records can be inserted in the
middle of the file.
⊙It provides quick access for sequential
and direct processing.
⊙It reduces the degree of the sequential
search.
DISADVANTAGES
⊙An indexed sequential access file requires
unique keys and periodic reorganization.
⊙It takes extra time to search the
index before the data can be
accessed or retrieved.
⊙It requires more storage space.
⊙It is expensive because it requires
special software.
⊙It is less efficient in the use of storage
space as compared to other file
organizations.
DIRECT
FILE
ORGANIZATION
⊙Direct file organization is designed to provide
rapid, direct, non-sequential (random)
access to records.
⊙IBM’s BDAM (basic direct access method) uses
this technique.
⊙Using this organization, records are inserted in
random order.
⊙Direct access organization provides random
access to records and is most often used with
databases.
⊙A hashing technique such as division/remainder
or splitting/folding is used to convert the value of
some field into a target address.
DIRECT
FILE
ORGANIZATION
⊙Collisions can be minimized by choosing a
better hashing scheme, increasing the
bucket size so that each page holds more
records, or reducing the packing density.
⊙Overflow is handled by searching forward a
predetermined number of slots or by using an
overflow area.
⊙Synonym pointers connect overflow
records.
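The interplay of buckets, collisions, and an overflow area can be sketched as a toy in-memory structure. The class name, the default of 7 buckets, and the bucket size of 2 are assumptions chosen only to make the behaviour visible; a real direct file would map buckets to disk pages.

```python
class DirectFile:
    """Toy direct file: fixed-size buckets plus a shared overflow area."""

    def __init__(self, num_buckets=7, bucket_size=2):
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = []  # records whose home bucket was already full

    def _address(self, key):
        # Division/remainder hashing turns the key into a bucket address.
        return key % self.num_buckets

    def insert(self, key, value):
        home = self.buckets[self._address(key)]
        if len(home) < self.bucket_size:
            home.append((key, value))
        else:
            self.overflow.append((key, value))  # collision with a full bucket

    def find(self, key):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        for k, v in self.overflow:  # fall back to the overflow area
            if k == key:
                return v
        return None


f = DirectFile()
for key, value in [(7, "a"), (14, "b"), (21, "c")]:  # all three hash to bucket 0
    f.insert(key, value)
```

Keys 7, 14, and 21 are synonyms here; the first two fill bucket 0 and the third lands in the overflow area. A larger `bucket_size` would keep all three in their home bucket.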
TYPES
OF HASHING
SCHEME
⊙DIVISION METHOD
In this method, we choose a number M such
that M > N, preferably a prime number. The hash
function is then defined as
H(K) = K mod M
where N = number of records
K = key
That is, divide K by M and take the remainder of the division.
For example,
if K = 9875, N = 58, M = 97 then
H(K) = 9875 mod 97
= 78
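The worked example (9875 mod 97) can be reproduced in a couple of lines. The function name is hypothetical.

```python
def division_hash(key: int, m: int) -> int:
    """Division method: the remainder of key divided by m is the address."""
    return key % m


address = division_hash(9875, 97)  # 9875 = 101 * 97 + 78
```

Every result lies in the range 0 to M - 1, so M also fixes the number of addresses available.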
TYPES
OF HASHING
SCHEME
⊙MID-SQUARE METHOD
In this method, we take the square of K, i.e. K².
We chop off digits from both ends of K².
The final value is called L.
The hash function is defined as
H(K) = L
If K = 9875, N = 58, M = 97 then we have
K² = 97515625
H(K) = middle 2 digits of K² = 15.
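A small sketch of the mid-square calculation follows. The function name and the tie-breaking rule are assumptions: when the square cannot be trimmed equally on both sides, this version drops one digit fewer on the left.

```python
def mid_square_hash(key: int, digits: int = 2) -> int:
    """Mid-square method: square the key and keep the middle digits."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2  # digits chopped off the left end
    return int(squared[start:start + digits])


address = mid_square_hash(9875)  # 9875 squared is 97515625, middle digits "15"
```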
TYPES
OF HASHING
SCHEME
⊙FOLDING METHOD
Here K is partitioned into a number of parts,
K1, K2, K3, …, Kn. The parts are then added
together, ignoring the final carry.
The hash function is defined as
H(K) = K1 + K2 + … + Kn
If K = 9875, N = 58, M = 97 then
H(K) = 98 + 75 = 173
Ignoring the carry, we have
H(K) = 73
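The folding calculation can be sketched as below. The function name and the choice of equal-width decimal parts split from the left are assumptions; the carry is dropped by keeping only the last `part_len` digits of the sum.

```python
def folding_hash(key: int, part_len: int = 2) -> int:
    """Folding method: split the key into parts, add them, drop the final carry."""
    text = str(key)
    parts = [int(text[i:i + part_len]) for i in range(0, len(text), part_len)]
    return sum(parts) % (10 ** part_len)


address = folding_hash(9875)  # 98 + 75 = 173, and dropping the carry leaves 73
```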
ADVANTAGES
⊙Direct access files support online
transaction processing (OLTP) systems such as an
online railway reservation system.
⊙In a direct access file, sorting of the records
is not required.
⊙It accesses the desired records
immediately.
⊙It updates several files quickly.
⊙It has better control over record allocation.
DISADVANTAGES
⊙A direct access file does not
provide a backup facility.
⊙It is expensive.
⊙It uses storage space less efficiently
than a sequential file.
THANK YOU
