Data Storage and Basic File Structure
Ms. Amrit Kaur
4/29/2021 1:05 PM
• Databases consist of large amounts of data that
are stored permanently on magnetic disk.
• Database applications need only a small
portion of the database at a time for processing.
– Data is copied from the disk to main memory for
processing and written back to the disk if the data is
changed.
Data Files
• The data on the disk is physically stored as
files of records.
• A data file is a sequence of records
Records and Record Types
• A record is a collection of related data values
or items, each of which corresponds to a particular field.
– A record describes a particular entity, its
attributes, and its relationships.
• Types of Records
– Fixed length records
– Variable length records
Records and Record Types
• Fixed length record
– ALL records in a file have exactly the same size in
bytes.
– Every record has the same fields, and field lengths are
fixed.
– Example:
• CREATE TABLE student
(rno char(3),
name char(15),
city char (15));
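1 char occupies 1 byte
Total Record Size = 3 + 15 + 15 = 33 bytes
rno (3 bytes)   name (15 bytes)   city (15 bytes)   size (bytes)
1..             Amrit..........   Delhi..........   33
2..             Dj.............   Chennai........   33
12.             Jaspreet.......   Goa............   33
123             Jasmeet........   Delhi..........   33
(Dots mark the padding that fills each field to its fixed length.)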
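A fixed-length layout like the one above can be sketched with Python's `struct` module. The format string `3s15s15s` mirrors the three `char` fields. Padding with NUL bytes (rather than the dots shown in the table) and the helper names `pack_student` and `read_record` are assumptions made for illustration.

```python
import struct

# One record = rno char(3) + name char(15) + city char(15), 1 byte per char.
RECORD = struct.Struct("3s15s15s")

def pack_student(rno: str, name: str, city: str) -> bytes:
    """Encode one student as a fixed-length record (short values are NUL-padded)."""
    return RECORD.pack(rno.encode("ascii"), name.encode("ascii"), city.encode("ascii"))

def read_record(data: bytes, i: int) -> tuple:
    """Return the i-th record; its byte offset is simply i * record size."""
    raw = RECORD.unpack_from(data, i * RECORD.size)
    return tuple(field.rstrip(b"\x00").decode("ascii") for field in raw)

students = [("1", "Amrit", "Delhi"), ("2", "Dj", "Chennai"),
            ("12", "Jaspreet", "Goa"), ("123", "Jasmeet", "Delhi")]
data_file = b"".join(pack_student(*s) for s in students)

print(RECORD.size)                 # bytes per record
print(read_record(data_file, 2))   # third record, located without scanning
```

Because every record has the same size, the position of any record can be computed directly from its number, which is the main practical benefit of fixed-length records.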
Records and Record Types
• Variable length record
– When different records in the file have different
sizes
– Example:
• CREATE TABLE student
(rno varchar(3),
name varchar(15),
city varchar (15));
rno   name       city      size (bytes)
1     Amrit      Delhi     11
2     Dj         Chennai   10
12    Jaspreet   Goa       13
123   Jasmeet    Delhi     15
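The sizes in the table count only the characters actually stored. A minimal sketch of how such records might be laid out follows. The length-prefixed layout (one length byte before each field) is an assumption for illustration, not something the slides specify, and the function names are hypothetical.

```python
def payload_size(rno: str, name: str, city: str) -> int:
    """Bytes of actual data, matching the sizes in the table (1 byte per char)."""
    return len(rno) + len(name) + len(city)

def encode_record(*fields: str) -> bytes:
    """Hypothetical layout: each field is stored as a 1-byte length plus its bytes."""
    out = bytearray()
    for f in fields:
        raw = f.encode("ascii")
        out.append(len(raw))   # length prefix (assumes fields shorter than 256 bytes)
        out += raw
    return bytes(out)

def decode_record(buf: bytes) -> list:
    """Walk the length prefixes to recover the fields."""
    fields, pos = [], 0
    while pos < len(buf):
        n = buf[pos]
        fields.append(buf[pos + 1:pos + 1 + n].decode("ascii"))
        pos += 1 + n
    return fields

students = [("1", "Amrit", "Delhi"), ("2", "Dj", "Chennai"),
            ("12", "Jaspreet", "Goa"), ("123", "Jasmeet", "Delhi")]
sizes = [payload_size(*s) for s in students]
print(sizes)
```

Unlike the fixed-length case, the start of the next record is only known after reading the current one, so some bookkeeping (length prefixes here) is needed.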
Records and Record Types
• Reasons for having variable length records
– The record type allows variable length for one or more
fields.
– One or more fields are optional.
– The file contains records of different record types.
– One or more fields have multiple values for
individual records.
FILE ORGANIZATION
What is File Organization?
• File organization simply means the organization
of records in files.
• A file organization is a technique that
determines
– how the file records are physically arranged on the
disk and
– how the records can be accessed.
Need for File Organization
• Fast data retrieval
• Efficient use of storage space
• Protection from failure or data loss
• Minimizing the need for reorganization
• Security against unauthorized users
Types of File Organization
• Heap File Organization
• Sequential File Organization
• Indexed File Organization
• Hashed File Organization
Heap File Organization
• Records (data) are stored in the file in the order in
which they are inserted.
217 Sita Delhi
101 Ramesh Chennai
215 Gita Chennai
102 Mina Mumbai
201 Suresh Delhi
218 Mina Chennai
222 Ram Chennai
305 Robin Mumbai
220 Amrit Delhi
Student (RollNumber, Name, City)
Heap File Organization
• Also called a pile file or non-sequential
organization.
• Operations
– Insertion is at the end of the file, so it is very efficient.
– Retrieval in order of the values of a field requires external sorting.
– Searching involves a linear search through the file, so searching is
slow.
– Deletion leaves unused space and requires periodic
reorganization, which is time-consuming and inefficient.
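The heap file operations listed above can be sketched as follows. The class `HeapFile` and its in-memory list of slots are illustrative assumptions; a real heap file would work on disk blocks and would sort externally.

```python
class HeapFile:
    """Records kept in insertion order; deleted slots are marked, not reclaimed."""

    def __init__(self):
        self.slots = []              # each slot holds a record tuple or None

    def insert(self, record):
        self.slots.append(record)    # always at the end of the file
        return len(self.slots) - 1

    def search(self, roll_number):
        """Linear search: examines slots one by one until the key is found."""
        for pos, rec in enumerate(self.slots):
            if rec is not None and rec[0] == roll_number:
                return pos
        return None

    def delete(self, roll_number):
        pos = self.search(roll_number)
        if pos is not None:
            self.slots[pos] = None   # leaves unused space behind
        return pos is not None

    def reorganize(self):
        """Periodic reorganization: rewrite the file without the empty slots."""
        self.slots = [rec for rec in self.slots if rec is not None]

    def sorted_by_key(self):
        """Ordered retrieval needs a sort (in memory here, external on disk)."""
        return sorted(rec for rec in self.slots if rec is not None)

f = HeapFile()
for rec in [(217, "Sita", "Delhi"), (101, "Ramesh", "Chennai"), (215, "Gita", "Chennai")]:
    f.insert(rec)
f.delete(101)
```

After the deletion the file still occupies three slots; only `reorganize()` gives the space back, at the cost of rewriting the whole file.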
Sequential Data File
• Records (data) in the file are stored in sequence
according to the value of the search key and/or primary
key of each record.
101 Ramesh Chennai
201 Suresh Delhi
210 Joy Mumbai
215 Gita Chennai
217 Sita Delhi
218 Mina Chennai
222 Ram Chennai
305 Robin Mumbai
Student (RollNumber, Name, City)
Sequential File Organization
• Operations
– Retrieval in key order is efficient because no sorting is required.
– Searching on the ordering key can use binary search, so it is
moderately fast.
– Insertion and deletion are expensive and time
consuming because they require reordering and
rewriting the file.
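A small sketch of these operations on a sorted in-memory list, using Python's standard `bisect` module. The function names `search` and `insert` are illustrative, and the list stands in for a sorted file on disk.

```python
import bisect

# Records kept sorted on RollNumber (the ordering key).
records = [(101, "Ramesh", "Chennai"), (201, "Suresh", "Delhi"),
           (215, "Gita", "Chennai"), (217, "Sita", "Delhi")]

def search(records, roll_number):
    """Binary search on the ordering key; returns the record or None."""
    keys = [r[0] for r in records]   # built per call only to keep the sketch short
    i = bisect.bisect_left(keys, roll_number)
    if i < len(records) and records[i][0] == roll_number:
        return records[i]
    return None

def insert(records, record):
    """Insert in key order; every later record shifts one position."""
    keys = [r[0] for r in records]
    i = bisect.bisect_left(keys, record[0])
    records.insert(i, record)
    return i

pos = insert(records, (210, "Joy", "Mumbai"))
```

The shift performed by `records.insert` is the in-memory counterpart of the reordering and rewriting that makes insertion expensive in a sequential file.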
Indexed File Organization
• Two files
– Data File: table data (e.g. .MYD in MySQL's MyISAM engine)
– Index File: index of data (e.g. .MYI)
What is it?
• In the data file, records are stored either
sequentially or non-sequentially.
• An index file is created that allows applications to
locate individual records.
What is an Index?
• An index is a table used to determine the location of
records in a file.
• Indexes speed up the retrieval of records that match given search
conditions.
• Any field (column) of the file can be used to create an
index; that field is known as the index field.
• Multiple indexes on different fields can be constructed for the same file.
… contd…
• Types of Index
– Ordered indices
• The index file is sorted in order of the index field.
– Hash indices
• Entries are distributed uniformly across buckets by a
function called a hash function.
Indexing Methods Based on Ordering
• Primary Index
• Clustering Index
• Secondary Index
• Dense Index
• Sparse Index
How are Indexes Stored?
• Ordered file with two fields (Key, Pointer)
– First field (Key): value of the field used for indexing
– Second field (Pointer): a block or record pointer
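The (Key, Pointer) layout can be sketched as a sorted list of pairs over an unordered data file. Using the list position as the record pointer and the name `lookup` are assumptions for illustration.

```python
import bisect

# Data file in heap (insertion) order; the list position stands in for a record pointer.
data_file = [(217, "Sita", "Delhi"), (101, "Ramesh", "Chennai"),
             (215, "Gita", "Chennai"), (102, "Mina", "Mumbai")]

# Index file: (key, pointer) pairs sorted by key.
index = sorted((rec[0], ptr) for ptr, rec in enumerate(data_file))

def lookup(roll_number):
    """Binary-search the index, then follow the pointer into the data file."""
    i = bisect.bisect_left(index, (roll_number,))
    if i < len(index) and index[i][0] == roll_number:
        return data_file[index[i][1]]
    return None

print(index)
print(lookup(215))
```

The data file itself stays unsorted; only the much smaller index is kept in key order.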
Primary Index
• When the file is ordered on a field that
has a unique value for each record, an index on that field
is known as a primary index.
• A primary index can be characterized as
– Dense
– Sparse
Clustering Index
• When the file is ordered on a field that does
not have a distinct value for each record, an index on that field
is known as a clustering index.
• It is also a nondense (sparse) index.
• In MySQL, creating a table with a primary key
automatically creates a special index
named PRIMARY. With the InnoDB storage engine, this index is the clustered
index; if there is no primary key, InnoDB uses the first unique index whose
columns are all NOT NULL.
Secondary Index
• It is defined on a non-ordering field, which may be a candidate key
or a non-key field with duplicate values.
• There can be many secondary indexes for the
same file.
• It is a dense index.
Primary Index … contd…
• A DENSE INDEX has an index entry for every
search key value (and hence every record) in the data file.
Primary Index … contd…
• A SPARSE INDEX (nondense) has index entries for
only some of the search key values, typically one per disk block.
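The difference between the two can be sketched over a sorted file split into blocks. The block size of 3 records and the names `dense`, `sparse`, and `sparse_lookup` are assumptions chosen for illustration.

```python
import bisect

BLOCK_SIZE = 3   # records per block (assumed)
records = [(101, "Ramesh"), (201, "Suresh"), (210, "Joy"), (215, "Gita"),
           (217, "Sita"), (218, "Mina"), (222, "Ram"), (305, "Robin")]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one (key, block number) entry per record.
dense = [(rec[0], b) for b, block in enumerate(blocks) for rec in block]
# Sparse index: one entry per block, keyed by the block's first record.
sparse = [(block[0][0], b) for b, block in enumerate(blocks)]

def sparse_lookup(key):
    """Find the last index entry with key <= target, then scan that block."""
    anchors = [k for k, _ in sparse]
    i = bisect.bisect_right(anchors, key) - 1
    if i < 0:
        return None                      # smaller than every key in the file
    for rec in blocks[sparse[i][1]]:
        if rec[0] == key:
            return rec
    return None

print(len(dense), len(sparse))
```

The sparse index is smaller, but it only works because the data file is sorted on the indexed field, so the target record must lie in the block its entry points to.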
Problems with simple ordered indexes
that are kept on disk
• Searching the index is still not fast (binary
search):
– We do not want more than 3 to 4 comparisons
for a search, but binary search needs about log2(n) of them for n index entries.
• Insertions into and deletions from the index are expensive
– The index file must be kept sorted.
SOLUTION
• Multilevel Indexing
Multilevel Indexing
• Creating an index of an index file is called
multilevel indexing.
• How?
– Build a simple index for the file, as a sorted file with a
distinct value for each key (first or base level).
– Build a primary index for this index (second level).
– Build another index for the previous index.
– Continue the index-building process until the index fits in a
single block, called the top index level.
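The index-building steps above can be sketched as a loop that keeps indexing the previous level until one block is enough. The parameter `fanout` (index entries per block) and the function name `build_levels` are assumptions for illustration.

```python
def build_levels(keys, fanout):
    """Return index levels from base to top; each entry is the first key of a block below."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = [sorted(keys)]                      # first (base) level: one entry per key
    while len(levels[-1]) > fanout:              # stop once a level fits in one block
        below = levels[-1]
        levels.append([below[i] for i in range(0, len(below), fanout)])
    return levels

levels = build_levels(range(1, 1001), fanout=10)
print([len(level) for level in levels])
```

Each level is smaller than the one below by a factor of `fanout`, so the number of levels grows only logarithmically with the number of keys.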
… contd…
• Multilevel indexing is commonly implemented using a
variation of the B-tree data structure called the
B+ tree.
Example B+Tree
Hashed File Organization
What is it?
• In a hashed file organization, the address of each
record is determined using a hashing algorithm.
• A function h, called a hash function,
is applied to the hash field value (key)
of a record and computes the address of the
disk block (BUCKET) in which the record is
stored.
Types of Hashing
• Static Hashing
• Dynamic Hashing
Static Hashing
• Uses a hash function for which the set of bucket
addresses is fixed.
• Hashing Functions
– Division Method
– Mid Square Method
– Folding Method, etc.
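The three methods named above can be sketched for non-negative integer keys as follows. The details (two middle digits, two-digit folding parts, and a final reduction modulo the number of buckets) are common textbook choices assumed here, not taken from the slides.

```python
def division_hash(key: int, num_buckets: int) -> int:
    """Division method: remainder of the key divided by the number of buckets."""
    return key % num_buckets

def mid_square_hash(key: int, num_buckets: int, digits: int = 2) -> int:
    """Mid-square method: square the key and use its middle digits."""
    squared = str(key * key)
    start = max(0, (len(squared) - digits) // 2)
    return int(squared[start:start + digits]) % num_buckets

def folding_hash(key: int, num_buckets: int, part_size: int = 2) -> int:
    """Folding method: split the key's digits into parts and add the parts."""
    text = str(key)
    parts = [int(text[i:i + part_size]) for i in range(0, len(text), part_size)]
    return sum(parts) % num_buckets

print(division_hash(217, 10))
print(mid_square_hash(217, 100))
print(folding_hash(123456, 100))
```

In every case the result falls in the fixed range 0 to `num_buckets - 1`, which is what makes the scheme static.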
Collision Resolution
• A collision occurs when the hash field value of
a new record that is being inserted hashes to
an address that already contains a different
record.
• The process of finding another position for the new record is
called collision resolution.
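One common way to find another position is linear probing (open addressing): try the next address, wrapping around, until a free slot turns up. The slides do not name a technique, so this choice, the class `StaticHashFile`, and the sample keys are assumptions; chaining with overflow buckets is an equally valid alternative.

```python
class StaticHashFile:
    """Fixed number of single-record slots; collisions resolved by linear probing."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots

    def _probe(self, key):
        """Yield slot addresses starting at h(key), wrapping around once."""
        home = key % len(self.slots)          # division-method hash
        for step in range(len(self.slots)):
            yield (home + step) % len(self.slots)

    def insert(self, key, value):
        for addr in self._probe(key):
            if self.slots[addr] is None or self.slots[addr][0] == key:
                self.slots[addr] = (key, value)
                return addr
        raise RuntimeError("file is full")

    def search(self, key):
        for addr in self._probe(key):
            if self.slots[addr] is None:
                return None                    # empty slot ends the probe sequence
            if self.slots[addr][0] == key:
                return self.slots[addr][1]
        return None

f = StaticHashFile(7)
a = f.insert(217, "Sita")    # 217 % 7 == 0
b = f.insert(105, "Mina")    # 105 % 7 == 0 as well: collision, placed in the next slot
```

Deletion is left out on purpose: with linear probing, emptying a slot would cut probe sequences short, so a real implementation needs deletion markers.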
How is Hashing Done?
Dynamic Hashing
• Some hashing techniques allow the hash
function to be modified dynamically to
accommodate the growth or shrinkage of the
database.
Extendable Hashing
• We choose a hash function that is uniform and
random. It generates values over a relatively
large range.
• The hash addresses in the address space (i.e.
the range) are represented by d-bit binary
integers (typically d = 32). As a result, we can
have a maximum of 2^32 (over 4 billion)
buckets.
• Buckets are not all created at once.
• They are created on demand, depending on the size
of the file.
• According to the actual number of buckets
created, we use the corresponding number of
bits to represent their addresses.
• For example, if there are four buckets at the moment, we just
need 2 bits for the addresses (i.e. 00, 01, 10 and
11).
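The relationship between the number of buckets and the number of address bits can be sketched as below. Taking the leading bits of the 32-bit hash value is an assumption (some presentations use the trailing bits), and the function names are hypothetical.

```python
D = 32   # width of the full hash value in bits, as in the text

def bits_needed(num_buckets: int) -> int:
    """Smallest number of bits that can address num_buckets (>= 1) distinct buckets."""
    return (num_buckets - 1).bit_length()

def bucket_address(hash_value: int, i: int) -> int:
    """Use only the first i bits of the D-bit hash value as the bucket address."""
    return hash_value >> (D - i) if i > 0 else 0

i = bits_needed(4)
addresses = [format(a, f"0{i}b") for a in range(4)]
print(i, addresses)
```

When the file grows past what `i` bits can address, one more bit of the same hash value is brought into use, so existing hash values never have to be recomputed.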
