File Organization

Data Storage and Basic File Structure
Ms. Amrit Kaur
4/29/2021 1:05 PM

• Databases consist of large amounts of data that
are stored persistently on magnetic disk.
• Database applications need only a small
portion of the database at a time for processing.
– Data is copied from disk to main memory for
processing and written back to disk if it is
changed.

Data Files
• The data on the disk is physically stored as
files of records.
• A data file is a sequence of records.

Records and Record Types
• A record is a collection of related data values
or items, where each value corresponds to a particular field of the record.
– A record usually describes an entity, its
attributes, and its relationships.
• Types of Records
– Fixed length records
– Variable length records

• Fixed length record
– All records in the file have exactly the same size in
bytes
– Every record has the same fields, and the field lengths are
fixed.
– Example:
• CREATE TABLE student
(rno char(3),
name char(15),
city char (15));
1 char occupies 1 byte (assuming a single-byte character set)
Total Record Size = 3 + 15 + 15 = 33 bytes
Each field is padded to its fixed width:
rno (3 bytes)   name (15 bytes)   city (15 bytes)   Record size
1               Amrit             Delhi             33 bytes
2               Dj                Chennai           33 bytes
12              Jaspreet          Goa               33 bytes
123             Jasmeet           Delhi             33 bytes
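A minimal sketch of why fixed-length records are convenient: every record occupies the same 33 bytes, so record i starts at byte i × 33 and can be read without scanning. It assumes ASCII text and CHAR-style space padding; the function names (pack_student, unpack_student, record_offset) are hypothetical and the byte layout is illustrative, not any particular DBMS's on-disk format.

```python
import struct

# Field widths from the CREATE TABLE example: rno CHAR(3), name CHAR(15), city CHAR(15).
# '<' disables native alignment, so the record size is exactly the sum of the widths.
RECORD_FORMAT = "<3s15s15s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 33 bytes


def pack_student(rno: str, name: str, city: str) -> bytes:
    """Encode one student as a fixed-length record, space-padding each field (assumed ASCII)."""
    return struct.pack(
        RECORD_FORMAT,
        rno.encode("ascii").ljust(3),
        name.encode("ascii").ljust(15),
        city.encode("ascii").ljust(15),
    )


def unpack_student(record: bytes) -> tuple:
    """Decode a 33-byte record and strip the padding."""
    return tuple(f.decode("ascii").rstrip() for f in struct.unpack(RECORD_FORMAT, record))


def record_offset(i: int) -> int:
    """Byte offset of the i-th record (0-based): a multiplication, no scanning."""
    return i * RECORD_SIZE


# Build an in-memory "file" from the example rows.
rows = [("1", "Amrit", "Delhi"), ("2", "Dj", "Chennai"),
        ("12", "Jaspreet", "Goa"), ("123", "Jasmeet", "Delhi")]
data = b"".join(pack_student(*r) for r in rows)

third = data[record_offset(2): record_offset(2) + RECORD_SIZE]
print(RECORD_SIZE, len(data), unpack_student(third))  # 33 132 ('12', 'Jaspreet', 'Goa')
```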

• Variable length record
– When different records in the file have different
sizes
– Example:
• CREATE TABLE student
(rno varchar(3),
name varchar(15),
city varchar (15));
rno   name       city      Data size
1     Amrit      Delhi     11 bytes
2     Dj         Chennai   10 bytes
12    Jaspreet   Goa       13 bytes
123   Jasmeet    Delhi     15 bytes
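The sizes above count only the characters actually stored. A variable-length record also needs some way to find where each field ends. The sketch below assumes one simple, hypothetical scheme, a 1-byte length prefix before each field, so each record costs its data size plus 3 bytes; real engines use their own layouts (e.g. length prefixes, offset arrays, or separator characters).

```python
def encode_record(fields):
    """Hypothetical layout: each field = 1-byte length followed by its ASCII bytes."""
    out = bytearray()
    for f in fields:
        b = f.encode("ascii")
        if len(b) > 255:
            raise ValueError("field too long for a 1-byte length prefix")
        out.append(len(b))  # length prefix tells the reader where the field ends
        out += b
    return bytes(out)


def decode_record(buf):
    """Walk the buffer, reading each length prefix and then that many bytes."""
    fields, pos = [], 0
    while pos < len(buf):
        n = buf[pos]
        fields.append(buf[pos + 1: pos + 1 + n].decode("ascii"))
        pos += 1 + n
    return fields


rows = [("1", "Amrit", "Delhi"), ("2", "Dj", "Chennai"),
        ("12", "Jaspreet", "Goa"), ("123", "Jasmeet", "Delhi")]
for r in rows:
    data_size = sum(len(f) for f in r)
    print(r, data_size, len(encode_record(r)))  # data bytes vs. bytes with prefixes
```

With this scheme, a record's size depends on its contents, so locating record i generally requires reading the records before it (or keeping a separate offset table).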

Records and Record Types
• Reasons for having variable length records
– One or more fields have variable length
– One or more fields are optional
– One or more fields have multiple values for
individual records (repeating fields)
– The file contains records of different record types

FILE ORGANIZATION

What is File Organization?
• File organization refers to the way records are
organized in files.
• A file organization is a technique that
determines
– how the file records are physically arranged on the
disk, and
– how the records can be accessed.

Need for File Organization
• Fast data retrieval
• Efficient use of storage space
• Protection from failure or data loss
• Minimizing the need for reorganization
• Security from unauthorized users

Types of File Organization
• Heap File Organization
• Sequential File Organization
• Indexed File Organization
• Hashing File Organization

Heap File Organization
• Records are stored in the file in the order in
which they are inserted.
217 Sita Delhi
101 Ramesh Chennai
215 Gita Chennai
102 Mina Mumbai
201 Suresh Delhi
218 Mina Chennai
222 Ram Chennai
305 Robin Mumbai
220 Amrit Delhi
Student (RollNumber, Name, City)

Heap File Organization
• Also called a pile file or non-sequential
organization.
• Operations
– Insertion is at the end of the file, so it is very efficient
– Retrieval in order of a field's values requires sorting the file (external sorting)
– Searching requires a linear search through the file, so it is
slow
– Deletion leaves unused space, so the file needs periodic
reorganization, which is time-consuming
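The heap-file operations above can be sketched with a Python list standing in for the file's record slots. This is a toy model under stated assumptions: the class and method names are hypothetical, deletion is modeled as marking a slot empty (one common approach), and sorted_by uses an in-memory sort where a real disk file would need external sorting.

```python
class HeapFile:
    """Toy heap (pile) file: a list stands in for the sequence of record slots on disk."""

    def __init__(self):
        self.slots = []  # each slot holds a record tuple, or None once deleted

    def insert(self, record):
        # New records simply go at the end of the file: cheap.
        self.slots.append(record)

    def search(self, roll_number):
        # No ordering to exploit, so every slot may have to be examined (linear search).
        for record in self.slots:
            if record is not None and record[0] == roll_number:
                return record
        return None

    def delete(self, roll_number):
        # Mark the slot as empty; the space is not reclaimed yet.
        for i, record in enumerate(self.slots):
            if record is not None and record[0] == roll_number:
                self.slots[i] = None
                return True
        return False

    def reorganize(self):
        # Periodic reorganization: rewrite the file without the empty slots.
        self.slots = [r for r in self.slots if r is not None]

    def sorted_by(self, field_index):
        # Ordered retrieval needs an explicit sort (an external sort for a real disk file).
        return sorted((r for r in self.slots if r is not None), key=lambda r: r[field_index])


heap = HeapFile()
for rec in [(217, "Sita", "Delhi"), (101, "Ramesh", "Chennai"), (215, "Gita", "Chennai"),
            (102, "Mina", "Mumbai"), (201, "Suresh", "Delhi"), (218, "Mina", "Chennai"),
            (222, "Ram", "Chennai"), (305, "Robin", "Mumbai"), (220, "Amrit", "Delhi")]:
    heap.insert(rec)

heap.delete(215)          # leaves an empty slot behind
print(len(heap.slots))    # 9: the deleted slot still occupies space
heap.reorganize()
print(len(heap.slots))    # 8 after compaction
print(heap.search(305))   # (305, 'Robin', 'Mumbai')
```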

Sequential Data File
• Records in the file are stored in sorted order
of the value of a search key and/or the primary
key of each record.
101 Ramesh Chennai
201 Suresh Delhi
210 Joy Mumbai
215 Gita Chennai
217 Sita Delhi
218 Mina Chennai
222 Ram Chennai
305 Robin Mumbai
Student (RollNumber, Name, City)

Sequential File Organization
• Operations
– Retrieval in order of the ordering key is efficient because no sorting is required
– Searching on the ordering key can use binary search through the file, so it is
moderately fast
– Insertion and deletion are expensive and time-consuming
because records must be reordered and the file
rewritten
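A short sketch of the two sides of a sequential file, using Python's standard bisect module on in-memory lists (a stand-in for the file's blocks): lookups on the ordering key are a binary search, while an insertion must shift every later record to keep the order. The helper names are hypothetical.

```python
import bisect

# Records kept sorted by RollNumber, as in the sequential file above.
records = [(101, "Ramesh", "Chennai"), (201, "Suresh", "Delhi"), (210, "Joy", "Mumbai"),
           (215, "Gita", "Chennai"), (217, "Sita", "Delhi"), (218, "Mina", "Chennai"),
           (222, "Ram", "Chennai"), (305, "Robin", "Mumbai")]
keys = [r[0] for r in records]  # parallel list of ordering-key values


def search(roll_number):
    """Binary search on the ordering key: about log2(n) comparisons."""
    i = bisect.bisect_left(keys, roll_number)
    if i < len(keys) and keys[i] == roll_number:
        return records[i]
    return None


def insert(record):
    """Insert in key order; every record after position i has to move down one place."""
    i = bisect.bisect_left(keys, record[0])
    keys.insert(i, record[0])
    records.insert(i, record)
    return i


print(search(217))                       # (217, 'Sita', 'Delhi')
pos = insert((102, "Mina", "Mumbai"))
print(pos, len(records) - pos - 1)       # 1 7  -> seven records had to shift
```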

Indexed File Organization
• Two files
– Data file: stores the table data (e.g. .MYD files in MySQL's MyISAM engine)
– Index file: stores the index on the data (e.g. .MYI files in MyISAM)

What is it?
• In the data file, records are stored either
sequentially or non-sequentially, and
• an index file is created that allows applications to
locate individual records.

What is an Index?
• An index is a table used to determine the location of
records in a file.
• An index speeds up the retrieval of records that satisfy search
conditions on the indexed field.
• Any field (column) of the file can be used to create an
index; that field is known as the indexing field.
• Multiple indexes on different fields can be constructed.

…. Contd…
• Types of Index
– Ordered indices
• Index file is sorted in order of index field
– Hash indices
• Based on distributing search-key values uniformly across
buckets, using a function called a hash function.

Indexing Methods Based on Ordering
• Primary Index
• Clustering Index
• Secondary Index
• Dense Index
• Sparse Index

How are Indexes Stored?
• As an ordered file with two fields (Key, Pointer)
– First field (Key): value of the field used for indexing
– Second field (Pointer): a block or record pointer

Primary Index
• When a file is ordered on a key field, i.e. a field that
has a unique value for each record, the index on that field
is known as a primary index.
• Primary Index can be characterized as
– Dense
– Sparse

Clustering Index
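• When a file is ordered on a non-key field, i.e. a field that does
not have a distinct value for each record, the index on that field
is known as a clustering index.
• It is also a nondense (sparse) index.
• In MySQL's InnoDB engine, when you create a table with a primary key,
a special index named PRIMARY is created automatically; this index is
the table's clustered index. If there is no primary key, InnoDB uses the
first UNIQUE index whose columns are all NOT NULL as the clustered index.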
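A minimal sketch of a clustering index, assuming the student records are ordered on the non-key field City and packed two records per block (a hypothetical blocking factor). The index holds one entry per distinct City value, pointing to the first block that contains it; a lookup follows that pointer and keeps reading consecutive blocks while the value still matches. The index is scanned linearly here only to keep the sketch short; a real index would be searched with binary search.

```python
# Data file ordered on City (records from the sequential-file example), 2 records per block.
blocks = [
    [("Chennai", 101), ("Chennai", 215)],
    [("Chennai", 218), ("Chennai", 222)],
    [("Delhi", 201), ("Delhi", 217)],
    [("Mumbai", 210), ("Mumbai", 305)],
]

# Clustering index: one (distinct City value, first block number) entry per value.
clustering_index = []
for block_no, block in enumerate(blocks):
    for city, _ in block:
        if not clustering_index or clustering_index[-1][0] != city:
            clustering_index.append((city, block_no))


def lookup(city):
    """Follow the index entry, then read consecutive blocks while the City value matches."""
    for key, block_no in clustering_index:
        if key == city:
            result = []
            for block in blocks[block_no:]:
                matches = [r for r in block if r[0] == city]
                if not matches:
                    break  # file is ordered on City, so no later block can match
                result.extend(matches)
            return result
    return []


print(clustering_index)   # [('Chennai', 0), ('Delhi', 2), ('Mumbai', 3)]
print(lookup("Delhi"))    # [('Delhi', 201), ('Delhi', 217)]
```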

Secondary Index
• May be on a non-ordering field that is a candidate key (unique values)
or a non-key field with duplicate values
• There can be many secondary indexes for the
same file.
• It is a dense index.
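A sketch of secondary indexes over the unordered heap file from earlier in the deck: because the data file is not ordered on the indexed field, the index needs an entry for every value, here mapping each value to the positions of all records that hold it (one way of handling duplicates; the representation is an assumption for illustration). Two secondary indexes are built on the same file.

```python
from collections import defaultdict

# Heap file (insertion order), not ordered on any field.
heap = [(217, "Sita", "Delhi"), (101, "Ramesh", "Chennai"), (215, "Gita", "Chennai"),
        (102, "Mina", "Mumbai"), (201, "Suresh", "Delhi"), (218, "Mina", "Chennai"),
        (222, "Ram", "Chennai"), (305, "Robin", "Mumbai"), (220, "Amrit", "Delhi")]


def build_secondary_index(records, field):
    """Map each value of a non-ordering field to the positions of all records holding it."""
    entries = defaultdict(list)
    for pos, record in enumerate(records):
        entries[record[field]].append(pos)
    # The index itself is kept ordered on the indexing field.
    return sorted(entries.items())


city_index = build_secondary_index(heap, 2)  # non-key field with duplicates
name_index = build_secondary_index(heap, 1)  # a second secondary index on the same file

print(city_index)  # [('Chennai', [1, 2, 5, 6]), ('Delhi', [0, 4, 8]), ('Mumbai', [3, 7])]
```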

Primary Index ….contd…
• A DENSE INDEX has an index entry for every
search key value (every record).
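A minimal sketch of a dense index over the ordered roll numbers from the sequential-file example: one (key, record pointer) entry per record, searched with binary search. The "pointer" is just the record's position in a Python list, an assumption standing in for a real disk address.

```python
import bisect

# Data file ordered on RollNumber.
records = [101, 201, 210, 215, 217, 218, 222, 305]

# Dense index: one (key, record pointer) entry for every record.
dense_index = [(key, pos) for pos, key in enumerate(records)]
dense_keys = [k for k, _ in dense_index]


def dense_lookup(key):
    """Binary-search the index; a hit gives the record pointer directly."""
    i = bisect.bisect_left(dense_keys, key)
    if i < len(dense_keys) and dense_keys[i] == key:
        return dense_index[i][1]
    return None  # absence is known from the index alone, without reading the data file


print(len(dense_index), dense_lookup(217), dense_lookup(216))  # 8 4 None
```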

Primary Index ….contd…
• A SPARSE INDEX (nondense) has entries for
only some of the search key values, typically one per data block.
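A minimal sketch of a sparse index over the same ordered roll numbers, assuming a hypothetical blocking factor of 3 records per block. Each index entry stores the first key of a block (the block anchor); a lookup finds the last anchor not greater than the key and then scans only that block.

```python
import bisect

records = [101, 201, 210, 215, 217, 218, 222, 305]  # ordered on RollNumber
BLOCK_SIZE = 3  # hypothetical blocking factor
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]
# blocks -> [[101, 201, 210], [215, 217, 218], [222, 305]]

# Sparse index: one (first key in block, block number) entry per block.
sparse_index = [(block[0], b) for b, block in enumerate(blocks)]
anchor_keys = [k for k, _ in sparse_index]


def sparse_lookup(key):
    """Find the last anchor <= key, then scan that one block for the key."""
    i = bisect.bisect_right(anchor_keys, key) - 1
    if i < 0:
        return None  # key is smaller than every key in the file
    block_no = sparse_index[i][1]
    return block_no if key in blocks[block_no] else None


print(sparse_index)                             # [(101, 0), (215, 1), (222, 2)]
print(sparse_lookup(217), sparse_lookup(216))   # 1 None
```

The index has 3 entries instead of 8, at the cost of reading a data block to confirm whether a key exists.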

Problems with simple ordered indexes kept on disk
• Searching the index is still not fast enough (binary
search needs about log2(b) block accesses for an index of b blocks):
– Ideally a search should need no more than 3 to 4 comparisons
(block accesses)
• Insertions into and deletions from the index are expensive
– because the index file is kept sorted

SOLUTION
• Multilevel Indexing

Multilevel Indexing
• Creating an index of an index file is called
multilevel indexing.
• How?
– Build a simple index for the file, as a sorted file with a
distinct value for each key (First or Base Level)
– Build a primary index for this index (second level)
– Build another index for the previous index
– Continue the index-building process until the top level fits in a
single block, called the top index level
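The steps above can be sketched in a few lines, assuming a hypothetical fan-out (number of entries per index block) and keeping each block as a Python list. The base level holds the sorted keys; each higher level stores the first key of every block of the level below, until a level fits in one block. A search then reads one block per level, starting from the top. This is a static illustration of the idea, not a B+ tree (it has no insertion or rebalancing).

```python
import bisect


def build_multilevel_index(sorted_keys, fan_out):
    """Return index levels, base level first; each level is a list of blocks."""
    level = [sorted_keys[i:i + fan_out] for i in range(0, len(sorted_keys), fan_out)]
    levels = [level]
    while len(level) > 1:
        anchors = [block[0] for block in level]  # first key of each block below
        level = [anchors[i:i + fan_out] for i in range(0, len(anchors), fan_out)]
        levels.append(level)
    return levels


def search(levels, key, fan_out):
    """Descend from the single top block, reading one block per level."""
    block_no = 0
    for depth in range(len(levels) - 1, 0, -1):
        block = levels[depth][block_no]
        j = bisect.bisect_right(block, key) - 1  # last anchor <= key
        if j < 0:
            return False
        block_no = block_no * fan_out + j  # entry j of this block -> block below
    return key in levels[0][block_no]


keys = list(range(0, 2000, 2))  # 1000 sorted keys (illustrative data)
levels = build_multilevel_index(keys, fan_out=10)
print([len(level) for level in levels])  # [100, 10, 1] -> 3 block reads per search
print(search(levels, 998, 10), search(levels, 999, 10))  # True False
```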

… contd…
• Multilevel indexing is commonly implemented using a
variation of the B-tree data structure, called a
B+ tree.

[Figure: Example B+ tree]

Hashed File Organization

What is it?
• In a hashed file organization, the address of each
record is determined using a hashing algorithm.
• A function h, called a hash function, is applied
to the hash field value (key) of a record and computes
the address of the disk block (bucket) in which the
record is stored.
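A minimal sketch of a hashed file, assuming a hypothetical file of 5 buckets and the division method h(k) = k mod 5 as the hash function. Each bucket is a Python list standing in for a disk block, so a search reads only the one bucket that h selects.

```python
M = 5  # hypothetical number of buckets


def h(key):
    """Hash function (division method, chosen for illustration): bucket = key mod M."""
    return key % M


buckets = [[] for _ in range(M)]  # each list stands in for one disk block (bucket)


def insert(record):
    buckets[h(record[0])].append(record)


def search(key):
    # Only the bucket computed by h is read; the other buckets are never touched.
    for record in buckets[h(key)]:
        if record[0] == key:
            return record
    return None


for rec in [(217, "Sita", "Delhi"), (101, "Ramesh", "Chennai"), (215, "Gita", "Chennai"),
            (102, "Mina", "Mumbai"), (201, "Suresh", "Delhi"), (218, "Mina", "Chennai"),
            (222, "Ram", "Chennai"), (305, "Robin", "Mumbai"), (220, "Amrit", "Delhi")]:
    insert(rec)

print(h(217), search(305))  # 2 (305, 'Robin', 'Mumbai')
```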

Types of Hashing
• Static Hashing
• Dynamic Hashing

Static Hashing
• Uses a hash function for which the set of bucket
addresses is fixed.
• Hashing Functions
– Division Method
– Mid-Square Method
– Folding Method, etc.
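The three listed hashing functions can be sketched as follows for integer keys. These are common textbook variants chosen as assumptions: the number of middle digits kept by mid-square, and "shift folding" with fixed-width digit groups, are illustrative parameter choices rather than fixed definitions.

```python
def division_hash(key: int, m: int) -> int:
    """Division method: remainder of key / m (m is often chosen to be prime)."""
    return key % m


def mid_square_hash(key: int, r: int) -> int:
    """Mid-square method: square the key and keep r digits from the middle.

    Assumes key * key has at least r digits.
    """
    digits = str(key * key)
    start = (len(digits) - r) // 2
    return int(digits[start:start + r])


def folding_hash(key: int, part_len: int, m: int) -> int:
    """Shift folding: split the key's digits into parts, add the parts, reduce mod m."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % m


print(division_hash(217, 11))            # 217 = 19*11 + 8 -> 8
print(mid_square_hash(1234, 2))          # 1234^2 = 1522756 -> middle "22" -> 22
print(folding_hash(123456789, 3, 1000))  # 123 + 456 + 789 = 1368 -> 368
```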

Collision Resolution
• A collision occurs when the hash field value of
a new record that is being inserted hashes to
an address that already contains a different
record.
• The process of finding another position is
called collision resolution.
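The slide does not name a specific resolution method; the sketch below uses open addressing with linear probing, one common choice, on a hypothetical table of 7 slots holding one key each. When the home slot (key mod 7) is taken, the next slots are tried in order, wrapping around; a search follows the same probe sequence and stops at an empty slot.

```python
TABLE_SIZE = 7  # hypothetical: each slot holds a single record key
table = [None] * TABLE_SIZE


def insert(key):
    """Open addressing with linear probing: try home, home+1, home+2, ... (wrapping)."""
    home = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        slot = (home + step) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


def search(key):
    """Follow the same probe sequence; an empty slot means the key is absent."""
    home = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        slot = (home + step) % TABLE_SIZE
        if table[slot] is None:
            return None
        if table[slot] == key:
            return slot
    return None


slots = [insert(k) for k in (101, 215, 108, 122)]
print(slots)  # [3, 5, 4, 6]: 108 and 122 collide with occupied slots and move on
```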

How is Hashing Done?

Dynamic Hashing
• Some hashing techniques allow the hash
function to be modified dynamically to
accommodate the growth or shrinkage of the
database.

Extendable Hashing
• We choose a hash function that is uniform and
random and generates values over a relatively
large range.
• The hash addresses in the address space (i.e.
the range) are represented by d-bit binary
integers (typically d = 32). As a result, we can
have a maximum of 2^32 (over 4 billion)
buckets.

• Buckets are not all created at once.
• They are created on demand, as the file grows.
• Depending on the actual number of buckets
created, we use the corresponding number of
bits to represent their addresses.
• For example, if there are four buckets at the
moment, we need just 2 bits for the addresses
(i.e. 00, 01, 10 and 11).
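A minimal sketch of the addressing idea: hash values are d = 32-bit integers, and only the first i bits are used as the bucket address, where i grows as buckets are added. Two assumptions to note: zlib.crc32 is used merely as a convenient 32-bit hash (it returns an unsigned 32-bit value in Python 3), and the high-order bits are taken, as in many textbook presentations (some implementations use the low-order bits instead). This shows only the address computation, not bucket splitting or directory maintenance.

```python
import zlib

D = 32  # hash values are d-bit integers


def hash32(key: str) -> int:
    """Stand-in 32-bit hash function (CRC-32), used only for illustration."""
    return zlib.crc32(key.encode("utf-8"))


def bucket_address(h: int, i: int) -> int:
    """Use only the leading i bits of the d-bit hash value as the bucket address."""
    return h >> (D - i)


h = 0b10 << 30  # a 32-bit value whose leading bits are 1, 0, 0, ...
print([bucket_address(h, i) for i in (1, 2, 3)])  # [1, 2, 4] -> binary 1, 10, 100

# With 4 buckets, 2 bits suffice: every address falls in 0..3 (00, 01, 10, 11).
for key in ("101", "217", "305"):
    print(key, format(bucket_address(hash32(key), 2), "02b"))
```

Using one more bit refines each address: the (i+1)-bit address of a value is its i-bit address followed by one extra bit, which is what lets the bucket space grow on demand.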
