Introduction
• Data has to be arranged in a proper way to facilitate
operations such as accepting, processing and searching.
• File : Stores all information relating to a particular
activity.
• A file is a collection of records related to each other.
The file size is limited by the size of memory and the
storage medium.
• A collection of related data items forms a record.
• A set of logically related records forms a file.
• E.g. Student file (rollno, name, class)
Employee file (Employee id, name, designation, salary,
department)
• File Organization is a method of arranging the records
in a file.
Logical and Physical Files
• A file can be viewed as a logical file or as a physical file.
• Physical file :
– It is a file, viewed in terms of how the data is stored on a storage
device and how the processing is done.
– Physical files are stored on secondary storage.
– A physical file can’t be deleted until the logical files
based on it have been deleted.
– Its existence is independent of any logical file.
• Logical files :
– It can be viewed in terms of what data items the record contains
and what operations can be performed on the file.
– Does not occupy any storage space for data, as it does not
contain any data of its own.
– A logical file can be deleted without deleting the physical file.
– Can’t exist without a physical file.
• The OS makes the connection between the physical and logical
files for the application program.
Basic File Operations
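1. Create File : Creates a new, empty file on the storage device.
2. Open File : Makes an existing file available to the program.
The following operations are performed on an open file.
i. Read File : Copies data from the file into memory, starting at
the current position of the file pointer.
ii. Write File : Copies data from memory into the file at the
current position of the file pointer.
iii. Seek : The action of moving the file pointer directly
to a certain position is called seeking.
E.g. In C, fseek() is used.
3. Close File : Ends the program’s access to the file; any
buffered data is written to the file first.
4. Rename File : Changes the name of the file.
5. Delete File : Removes the file from the storage device.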
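The operations above can be sketched in Python. This is a minimal illustration, assuming a hypothetical file students.dat whose records are lines of exactly 12 bytes. The built-in open() returns a file object that the OS connects to the physical file, and file.seek() plays the role that fseek() plays in C.

```python
import os
import tempfile

RECORD_SIZE = 12  # each hypothetical record below is exactly 12 bytes, e.g. b"101,Asha,FY\n"


def demo_file_operations(directory):
    path = os.path.join(directory, "students.dat")

    # Create File + Write File: mode "wb" creates the file (or truncates an existing one).
    with open(path, "wb") as f:
        f.write(b"101,Asha,FY\n")
        f.write(b"102,Ravi,SY\n")
    # Close File: leaving the "with" block closes the file and flushes its buffer.

    # Open File + Seek + Read File
    with open(path, "rb") as f:
        f.seek(1 * RECORD_SIZE)   # jump straight to the second record
        second = f.read(RECORD_SIZE)
        f.seek(0)                 # move the file pointer back to the start
        first = f.read(RECORD_SIZE)

    # Rename File, then Delete File
    renamed = os.path.join(directory, "students_old.dat")
    os.rename(path, renamed)
    os.remove(renamed)
    return first, second, os.path.exists(renamed)


with tempfile.TemporaryDirectory() as d:
    print(demo_file_operations(d))  # (b'101,Asha,FY\n', b'102,Ravi,SY\n', False)
```

Seeking to an exact byte offset works here because every record has the same size; binary mode ("rb") is used so offsets are plain byte counts.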
File Organization
• Data is usually stored in the form of records.
• A record is a collection of related data items or
values, also called fields or attributes.
• Each field has a specific data type and size, which
specify the range of values the field can take.
• Standard data types are generally the basic data
types supported by programming languages, such
as numeric (integer, real or floating-point numbers),
strings of characters, and Boolean for storing a true
or false value.
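To make a field's data type and size concrete, here is a sketch of one fixed-size binary record layout using Python's struct module. The Student fields and their sizes (rollno as a 4-byte integer, name as a 20-byte string, percentage as a 4-byte float, passed as a 1-byte Boolean) are hypothetical choices for illustration.

```python
import struct

# Hypothetical Student record layout:
#   rollno     "i"   4-byte signed integer -> range -2**31 .. 2**31 - 1
#   name       "20s" 20-byte character string (longer names are truncated)
#   percentage "f"   4-byte floating-point number
#   passed     "?"   1-byte Boolean (True/False)
# "<" means little-endian with no padding, so the record size is the sum of field sizes.
STUDENT_FORMAT = "<i20sf?"


def pack_student(rollno, name, percentage, passed):
    # struct pads a short name with b"\x00" bytes up to 20 bytes.
    return struct.pack(STUDENT_FORMAT, rollno, name.encode("ascii"), percentage, passed)


def unpack_student(raw):
    rollno, name, percentage, passed = struct.unpack(STUDENT_FORMAT, raw)
    return rollno, name.rstrip(b"\x00").decode("ascii"), percentage, passed


print(struct.calcsize(STUDENT_FORMAT))                        # 29 (= 4 + 20 + 4 + 1)
print(unpack_student(pack_student(101, "Asha", 87.5, True)))  # (101, 'Asha', 87.5, True)
```

The data type limits the values a field can hold: for example, struct.pack rejects a rollno outside the 4-byte integer range with struct.error.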
• Record Types :
– Fixed Length Records : If every record in a file has
exactly the same size, the records are referred to as
fixed-length records.
– Variable Length Records : If the record size varies
from record to record, the records are referred to as
variable-length records.
• Sometimes a field is made up of multiple subfields. Such a
field is called a repeating field, and the group of subfields
that make it up is called a repeating group.
E.g. An address field contains
flat_no, area, city, state, country and pin.
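Variable-length records need some way for a reader to tell where each field and each record ends. A minimal sketch, assuming one common approach (a 1-byte length before every string field and a 2-byte length before every record), with the address stored as a group of subfields in a fixed order. The layout, function names and sample values are hypothetical.

```python
import struct

ADDRESS_SUBFIELDS = ("flat_no", "area", "city", "state", "country", "pin")


def encode_field(text):
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("field too long for a 1-byte length prefix")
    return bytes([len(data)]) + data             # length byte, then the field bytes


def encode_record(name, address):
    body = encode_field(name)
    for sub in ADDRESS_SUBFIELDS:                # the address group: subfields in a fixed order
        body += encode_field(address[sub])
    return struct.pack("<H", len(body)) + body   # 2-byte record length, then the fields


def decode_record(buf, offset=0):
    """Decode the record starting at `offset`; also return where the next record starts."""
    (length,) = struct.unpack_from("<H", buf, offset)
    pos, end = offset + 2, offset + 2 + length
    fields = []
    while pos < end:
        n = buf[pos]
        fields.append(buf[pos + 1:pos + 1 + n].decode("utf-8"))
        pos += 1 + n
    return fields[0], dict(zip(ADDRESS_SUBFIELDS, fields[1:])), end


addr = {"flat_no": "12", "area": "Kothrud", "city": "Pune",
        "state": "Maharashtra", "country": "India", "pin": "411038"}
r1 = encode_record("Asha", addr)
r2 = encode_record("Ravi Kumar", addr)
file_bytes = r1 + r2                             # two records of different sizes, back to back
name1, _, next_offset = decode_record(file_bytes)
name2, address2, _ = decode_record(file_bytes, next_offset)
print(len(r1), len(r2), name1, name2, address2["city"])  # 48 54 Asha Ravi Kumar Pune
```

In a fixed-length layout both records would occupy the same number of bytes; here each record's size follows its content, at the cost of reading the length prefix to find where the next record begins.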
Types of File Organization
• File organization ensures that records are available for
processing.
1. Sequential Access File Organization:
– All records are stored in a sequential order.
– The records can be arranged in the ascending or
descending order of a key field (Sorted file).
– A search in a sequential file starts from the beginning of the
file, and new records are added at the end of the file.
– In a sequential file, it is not possible to add a record in the
middle of the file without rewriting the file.
– Uses the read next operation.
– Files stored on magnetic tapes support sequential
access.
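A sketch of a sequential file, assuming a hypothetical fixed-length record of rollno (4-byte integer) and name (10 bytes), kept sorted on rollno. An in-memory io.BytesIO object stands in for the tape or disk file. Searching uses only read next, and adding a record in the middle is done by rewriting the whole file.

```python
import io
import struct

RECORD = struct.Struct("<i10s")  # hypothetical record: rollno (4-byte int) + name (10 bytes)


def write_sequential(records):
    """Write (rollno, name) records one after another into a new in-memory 'file'."""
    f = io.BytesIO()
    for rollno, name in records:
        f.write(RECORD.pack(rollno, name.encode("ascii")))
    f.seek(0)
    return f


def read_next(f):
    """Read next: return the record at the current position, or None at end of file."""
    raw = f.read(RECORD.size)
    if len(raw) < RECORD.size:
        return None
    rollno, name = RECORD.unpack(raw)
    return rollno, name.rstrip(b"\x00").decode("ascii")


def sequential_search(f, key):
    """Scan from the beginning; the file is sorted on rollno, so stop once the key is passed.
    Returns (record or None, number of records read)."""
    f.seek(0)
    reads = 0
    while (rec := read_next(f)) is not None:
        reads += 1
        if rec[0] == key:
            return rec, reads
        if rec[0] > key:
            break
    return None, reads


def insert_by_rewrite(f, new_record):
    """Adding a record in the middle: read every record, insert in key order, write a new file."""
    f.seek(0)
    records = []
    while (rec := read_next(f)) is not None:
        records.append(rec)
    records.append(new_record)
    records.sort()
    return write_sequential(records)


f = write_sequential([(101, "Asha"), (104, "Ravi"), (109, "Meena")])
print(sequential_search(f, 109))   # ((109, 'Meena'), 3)
print(sequential_search(f, 105))   # (None, 3)
f = insert_by_rewrite(f, (105, "Kiran"))
print(sequential_search(f, 105))   # ((105, 'Kiran'), 3)
```

The read count shows why searching is slow: finding the last record means reading every record before it.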
• Advantages of sequential file
– It is simple to program and easy to design.
– Searching on key field is faster.
• Disadvantages of sequential file
– It is a time-consuming process, since records must be read
one after another.
– It has high data redundancy.
– Random (direct) access to a record is not possible.
– Insertion and deletion of records are difficult.
2. Hashed or Direct Access File Organization:
– Direct access file organization is also known as random
access or relative file organization.
– In a direct access file, all records are stored on a direct
access storage device (DASD), such as a hard disk.
– The records are randomly placed throughout the file.
– The records do not need to be in sequence because
they are updated directly and rewritten back in the same
location.
– This file organization is useful for immediate access to
a large amount of information. It is used in accessing large
databases.
– Uses a hash function to compute a record’s location from its key.
– Supports the read n operation, which reads the record at
position n directly.
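The direct access idea can be sketched as follows: a hash function turns a key into a slot number n, and the record is read or written at byte offset n × record size with a single seek (the read n operation). This is a minimal illustration under stated assumptions: a hypothetical 7-slot file held in an io.BytesIO object, a simple key mod 7 hash, and linear probing (trying the next slot) when two keys hash to the same slot. Real systems choose the hash function and the collision strategy with more care.

```python
import io
import struct

# Hypothetical slot layout: 1-byte "used" flag, 4-byte integer key, 10-byte name.
SLOT = struct.Struct("<?i10s")
NUM_SLOTS = 7                      # assumed fixed number of slots in the file


def hash_key(key):
    # Division-remainder hash: maps any integer key to a slot number 0..NUM_SLOTS-1.
    return key % NUM_SLOTS


def create_direct_file():
    # Pre-allocate every slot as empty so any slot can be written directly.
    return io.BytesIO(SLOT.pack(False, 0, b"") * NUM_SLOTS)


def read_slot(f, n):
    """Read n: seek straight to slot n and read it."""
    f.seek(n * SLOT.size)
    used, key, name = SLOT.unpack(f.read(SLOT.size))
    return used, key, name.rstrip(b"\x00").decode("ascii")


def write_record(f, key, name):
    # Linear probing on collision: try the next slot, wrapping around at the end.
    start = hash_key(key)
    for i in range(NUM_SLOTS):
        n = (start + i) % NUM_SLOTS
        used, existing, _ = read_slot(f, n)
        if not used or existing == key:   # empty slot, or rewrite the record in place
            f.seek(n * SLOT.size)
            f.write(SLOT.pack(True, key, name.encode("ascii")))
            return n
    raise RuntimeError("file is full")


def find_record(f, key):
    start = hash_key(key)
    for i in range(NUM_SLOTS):
        used, existing, name = read_slot(f, (start + i) % NUM_SLOTS)
        if not used:
            return None                   # reached an empty slot: key is not in the file
        if existing == key:
            return name
    return None


f = create_direct_file()
print(write_record(f, 101, "Asha"))   # 3  (101 % 7 == 3)
print(write_record(f, 108, "Ravi"))   # 4  (108 % 7 == 3 is taken, so the next slot is used)
print(find_record(f, 108))            # Ravi
```

No sorting is needed: each record's position comes from its key alone, and an update rewrites the record in the same slot.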
• Advantages :
– Direct access files suit online transaction processing
(OLTP) systems such as an online railway reservation system.
– In a direct access file, sorting of the records is not
required.
– It accesses the desired records immediately.
– It updates several files quickly.
– It has better control over record allocation.
• Disadvantages :
– It is expensive.
– Difficult to design and implement.
3. Indexed Sequential Access File Organization:
– Indexed sequential access file organization combines
sequential and direct access file organization.
– Records are stored on a direct access device such as a
magnetic disk, ordered by a primary key.
– The file can have multiple keys. These keys can be
alphanumeric. The key on which the records are
ordered is called the primary key.
– The data can be accessed either sequentially or
randomly using the index. The index is stored in a file
and read into memory when the file is opened.
– Supported operations are read next and read n.
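A minimal sketch of the combination, assuming hypothetical fixed-length records sorted on an integer primary key. An io.BytesIO object stands in for the data file, and the index is a sorted list of keys whose position equals the record number. A real system would keep the index in its own file and read it in on open; building it directly is a simplification.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<i10s")  # hypothetical record: primary key (4-byte int) + name (10 bytes)


class IndexedSequentialFile:
    """Records sorted on the primary key in a data file, plus an in-memory index."""

    def __init__(self, sorted_records):
        keys = [key for key, _ in sorted_records]
        if keys != sorted(set(keys)):
            raise ValueError("records must be sorted on a unique primary key")
        self.data = io.BytesIO()
        for key, name in sorted_records:
            self.data.write(RECORD.pack(key, name.encode("ascii")))
        # Simplification: the index is built here instead of being read from its own file.
        # The position of a key in this sorted list is the record number of that key.
        self.index = keys
        self.data.seek(0)

    def _read_current(self):
        raw = self.data.read(RECORD.size)
        if len(raw) < RECORD.size:
            return None                   # end of file
        key, name = RECORD.unpack(raw)
        return key, name.rstrip(b"\x00").decode("ascii")

    def read_next(self):
        """Sequential access: read the record after the current position."""
        return self._read_current()

    def read_by_key(self, key):
        """Random access: binary-search the index, then seek straight to the record."""
        n = bisect.bisect_left(self.index, key)
        if n == len(self.index) or self.index[n] != key:
            return None
        self.data.seek(n * RECORD.size)   # read n: jump to record number n
        return self._read_current()


isf = IndexedSequentialFile([(101, "Asha"), (104, "Ravi"), (109, "Meena")])
print(isf.read_next())        # (101, 'Asha')  sequential
print(isf.read_by_key(104))   # (104, 'Ravi')  direct, via the index
print(isf.read_next())        # (109, 'Meena') sequential reading continues from there
```

Because a direct read leaves the file pointer just after the record it found, the program can switch to read next and continue in key order from that point.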
• Advantages :
– Supports both sequential and direct access.
– It accesses the records very fast.
– The records can be inserted in the middle of the file.
• Disadvantages :
– Indexed sequential access file requires unique keys.
– It takes a longer time to access or retrieve data.
– It requires more storage space.
– It is expensive because it requires special software.
– It is less efficient in the use of storage space as
compared to other file organizations.
Indexing
• An index on a file is an auxiliary structure designed to
speed up operations that are not efficiently supported
by the basic organization of records in that file.
• Indexing in database systems is similar to what we see
in books.
• An index is usually defined on a single field of a file,
called the indexing field.
• An index typically stores each value of the indexing field
along with a list of pointers to all disk blocks that contain
records with that field value.
• The index values are ordered (sorted) so we can
perform binary search on the index.
• Indexing is defined based on its indexing
attributes.
• Indexing can be of the following types −
• Primary Index −
– A primary index involves two tables: the index table
and the main database table.
– Primary index is defined on an ordered data file. The
data file is ordered on a key field. The key field is
generally the primary key of the relation.
• Primary Index is of two types −
– Dense Index
– Sparse Index
• Dense Index
– In a dense index, there is an index record for every
search key value in the database.
– This makes searching faster, but the index records themselves
need more space, so a dense index is expensive in terms of
memory for a large database.
– Index records contain search key value and a pointer
to the actual record on the disk.
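A sketch of a dense index, assuming hypothetical (rollno, name) records with a unique key, where a record's position in a Python list stands in for its disk address. The index keeps one (key, pointer) entry per record, sorted so that it can be binary-searched with the bisect module.

```python
import bisect


def build_dense_index(records):
    """One (key, pointer) entry for every record, sorted by key.
    The record's position in `records` stands in for its disk address."""
    entries = sorted((key, ptr) for ptr, (key, _) in enumerate(records))
    keys = [key for key, _ in entries]
    pointers = [ptr for _, ptr in entries]
    return keys, pointers


def dense_lookup(index, records, key):
    keys, pointers = index
    i = bisect.bisect_left(keys, key)       # binary search on the sorted index
    if i < len(keys) and keys[i] == key:
        return records[pointers[i]]          # follow the pointer straight to the record
    return None


students = [(101, "Asha"), (104, "Ravi"), (109, "Meena"), (112, "Kiran")]
index = build_dense_index(students)
print(len(index[0]))                         # 4: one entry per record
print(dense_lookup(index, students, 109))    # (109, 'Meena')
```

The index grows with the number of records, which is the space cost the slide describes.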
• Sparse Index
– For large database tables, a dense index becomes very
large; the sparse index is the solution to this problem.
– In sparse index, index records are not created for every
search key but only for some search key values.
– An index record here contains a search key and an actual
pointer to the data on the disk.
– To locate a record, we find the index record with the
largest search key value less than or equal to the search
key value we are looking for.
– If the data we are looking for is not at the location
reached by following the index, the system searches
sequentially from there until the desired data is found.
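The lookup rule above (largest index key less than or equal to the search key, then a sequential search) can be sketched as follows. The block size of 2 records, the sample records and the function names are hypothetical; a list position stands in for a disk address.

```python
import bisect

BLOCK_SIZE = 2  # assumed number of records per disk block


def build_sparse_index(sorted_records):
    """One entry per block: the first key stored in the block, and the block number."""
    keys, blocks = [], []
    for start in range(0, len(sorted_records), BLOCK_SIZE):
        keys.append(sorted_records[start][0])
        blocks.append(start // BLOCK_SIZE)
    return keys, blocks


def sparse_lookup(index, sorted_records, key):
    """Return (record or None, number of records scanned sequentially)."""
    keys, blocks = index
    i = bisect.bisect_right(keys, key) - 1   # largest index key <= search key
    if i < 0:
        return None, 0                       # key is smaller than every key in the file
    pos = blocks[i] * BLOCK_SIZE             # follow the pointer to the start of that block
    scanned = 0
    while pos < len(sorted_records) and sorted_records[pos][0] <= key:
        scanned += 1
        if sorted_records[pos][0] == key:
            return sorted_records[pos], scanned
        pos += 1
    return None, scanned                     # passed the key's place: it is not in the file


students = [(101, "Asha"), (104, "Ravi"), (109, "Meena"), (112, "Kiran"), (120, "Dev")]
index = build_sparse_index(students)
print(index)                                 # ([101, 109, 120], [0, 1, 2])
print(sparse_lookup(index, students, 112))   # ((112, 'Kiran'), 2)
```

Here five records need only three index entries; the price is the short sequential scan inside the block.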
• Secondary Index − A secondary index may be
generated from a field that is a candidate key
and has a unique value in every record, or from a
non-key field with duplicate values.
• Secondary indexes are additional indexes created on
data stored in a database.
• The main purpose of secondary indexing is to
improve the performance of queries and to
simplify the search for specific records within a
database.
• Used when the data set is very large.
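A sketch of a secondary index on a non-key field with duplicate values, assuming a hypothetical Employee file stored in employee-id order and indexed on department. A Python dict stands in for the index structure (a database would typically use a sorted structure such as a B-tree), and list positions stand in for disk pointers.

```python
from collections import defaultdict


def build_secondary_index(records, field):
    """Map each value of `field` to the positions of every record holding it.
    The field may be a non-key with duplicates; positions stand in for disk pointers."""
    index = defaultdict(list)
    for ptr, rec in enumerate(records):
        index[rec[field]].append(ptr)
    return dict(index)


def lookup(index, records, value):
    # Follow every pointer stored for this value; no scan of the data file is needed.
    return [records[ptr] for ptr in index.get(value, [])]


# Hypothetical Employee file, stored in employee_id order (not department order).
employees = [
    {"employee_id": 1, "name": "Asha", "department": "Sales"},
    {"employee_id": 2, "name": "Ravi", "department": "HR"},
    {"employee_id": 3, "name": "Meena", "department": "Sales"},
    {"employee_id": 4, "name": "Kiran", "department": "Tech"},
]
by_department = build_secondary_index(employees, "department")
print(by_department["Sales"])                                          # [0, 2]
print([e["name"] for e in lookup(by_department, employees, "Sales")])  # ['Asha', 'Meena']
```

The data file itself is not reordered; the secondary index is an additional structure alongside it.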
• Clustered Index − A clustering index is defined on an
ordered data file. The data file is ordered on a non-key field.
• Rows in the table are stored on disk in the same
order as the clustered index key.
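A sketch of a clustering index, assuming a hypothetical Employee file already ordered on the non-key field department. The index holds one entry per distinct value, pointing at the first record with that value; a list position stands in for the disk block address. Because equal values are stored together, the lookup reads forward from that point until the value changes.

```python
import bisect


def build_clustering_index(records, field):
    """`records` must already be ordered on `field`, a non-key field.
    One entry per distinct value: (value, position of the first record with it)."""
    values, first_pos = [], []
    for pos, rec in enumerate(records):
        if not values or values[-1] != rec[field]:
            values.append(rec[field])
            first_pos.append(pos)
    return values, first_pos


def clustered_lookup(index, records, field, value):
    values, first_pos = index
    i = bisect.bisect_left(values, value)    # binary search over the distinct values
    if i == len(values) or values[i] != value:
        return []
    result, pos = [], first_pos[i]
    # Rows with equal values are stored together, so read forward until the value changes.
    while pos < len(records) and records[pos][field] == value:
        result.append(records[pos])
        pos += 1
    return result


employees = [  # hypothetical Employee file, physically ordered on department
    {"employee_id": 2, "name": "Ravi", "department": "HR"},
    {"employee_id": 1, "name": "Asha", "department": "Sales"},
    {"employee_id": 3, "name": "Meena", "department": "Sales"},
    {"employee_id": 4, "name": "Kiran", "department": "Tech"},
]
index = build_clustering_index(employees, "department")
print(index)  # (['HR', 'Sales', 'Tech'], [0, 1, 3])
print([e["name"] for e in clustered_lookup(index, employees, "department", "Sales")])  # ['Asha', 'Meena']
```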
• Advantages of indexing :
– Better performance of queries.
– Fast searching from the database.
– Fast retrieval of data.
– Improved performance of SELECT queries.
• Disadvantages of indexing :
– Indexing takes more space.
– Reduced performance of INSERT, DELETE and
UPDATE queries, because every index must also be updated.