SlideShare a Scribd company logo
RELATIONAL DATABASE MANAGEMENT SYSTEM
UNIT – III
DATA STORAGE AND INDEXING
1
Storage And File Structure
Why do we need to know about storage/file structure
Many database technologies are developed to utilize the
Storage architecture/hierarchy
Data in the database needs to be organized and
stored/retrieved efficiently
Storage Hierarchy
Magnetic Tape
Optical Disk
Magnetic Disk
Flash Memory
Cache
unit priceMemory
Volatile
primary storage
Non-volatile speed
Secondary
storage
Tertiary
storage
Primary Storage (Volatile)
Cache
Speed: 7 to 20 ns (1 nanosecond = 10–9 seconds)
Capacity:
A typical PC level 2 cache 64KB-2 MB.
Within processors, level 1 cache usually ranges in size from 8
KB to 64 KB.
Main memory
Speed: 10s to 100s of nanoseconds;
Capacity:
Up to a few Gigabytes widely used currently
per-byte costs have decreased roughly factor of 2 every 2 3
years)
Secondary Storage (Non-volatile)
Flash memory
Speed: Read speed similar to main memory, write is much slower
Capacity: 32M to 512M currently
Forms: SmartMedia, memory stick, secure digital, BIOS
Cost: roughly same as main memory
Magnetic-disk
Capacities: up to roughly 100 GB currently Growing constantly
and rapidly with technology improvements.
1/14/2005
Yan Huang - CSCI5330 Database Implementation –
Storage and File Structure
Tertiary Storage (Non-volatile)
 Optical storage
 CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular
forms
 CD-RW, DVD-RW, and DVD-RAM
 Reads and writes are slower than with magnetic disk
 Juke-box systems, with large numbers of removable disks,
a few drives, and a mechanism for automatic
loading/unloading of disks available for storing large
volumes of data
Indexing:
*Indexing in database systems is similar to what we see
in books.
* Indexing is a data structure technique to efficiently
retrieve records from the database files based on some
attributes on which the indexing has been done.
*Indexing is defined based on its indexing attributes.
Indexing can be of the following types
*Primary Index - Primary index is defined on an
ordered data file. The data file is ordered on a key field. The
key field is generally the primary key of the relation.
Indexing Types:
Secondary Index - Secondary index may be
generated from a field which is a candidate key and has a
unique value in every record, or a non-key with duplicate
values.
Clustering Index - Clustering index is defined on an
ordered data file. The data file is ordered on a non-key field.
Ordered Indexing.
Dense Index - In dense index, there is an index
record for every search key value in the database. This makes
searching faster but requires more space to store index
records itself. Index records contain search key value and a
pointer to the actual record on the disk.
Sparse Index:
In sparse index, index records are not created for
every search key. An index record here contains a search key
and an actual pointer to the data on the disk.
To search a record, we first proceed by index record
and reach at the actual location of the data.
If the data we are looking for is not where we
directly reach by following the index, then the system starts
sequential search until the desired data is found.
Multilevels Indexing:
Index records comprise search-key values and data pointers.
Multilevel index is stored on the disk along with the actual
database files.
As the size of the database grows, so does the size of the
indices.
If single-level index is used, then a large size index cannot be
kept in memory which leads to multiple disk accesses .
Multi-level Index helps in breaking down the index into
several smaller indices in order to make the outermost level so small
that it can be saved in a single disk block,
Disk:
Hard disk drives are the most common secondary storage
devices in present computer systems. These are called magnetic disks
because they use the concept of magnetization to store information.
Hard disks are formatted in a well-defined order to store data
efficiently. A hard disk plate has many concentric circles on it,
called tracks. Every track is further divided into sectors. A sector on
a hard disk typically stores 512 bytes of data.
Disk
Disk Subsystem
Disk interface standards families
• ATA (AT adaptor) range of standards
• SCSI (Small Computer System Interconnect) range
of standards.
Disk Speed
Seek time
(milliseconds)
Rotation time/latency
milliseconds
Data-transfer rate
(4-8MB/sec)
Typical numbers:
 16,000 tracks per platter
 sectors per track: 200 – 400
 512 bytes per sector
 4-16KB per block
 5,400 - 15,000 r p m
Access time = seek time + latency
Discuss ways to improve disk
reading speed
Redundant Array of Independent Disks
RAID or Redundant Array of Independent Disks,
is a technology to connect multiple secondary storage
devices and use them as a single storage media.
RAID consists of an array of disks in which
multiple disks are connected together to achieve different
goals. RAID levels define the use of disk arrays.
RAID 0
In this level, a striped array of disks is implemented. The
data is broken down into blocks and the blocks are distributed
among disks. Each disk receives a block of data to write/read in
parallel. It enhances the speed and performance of the storage
device. There is no parity and backup in Level 0.
RAID1
RAID 1 uses mirroring techniques. When data is sent to
a RAID controller, it sends a copy of data to all the disks in the
array. RAID level 1 is also called mirroring and provides
100% redundancy in case of a failure.
RAID 2
RAID 2 records Error Correction Code using Hamming
distance for its data, striped on different disks. Like level 0, each
data bit in a word is recorded on a separate disk and ECC codes of
the data words are stored on a different set disks. Due to its
complex structure and high cost, RAID 2 is not commercially
available.
RAID3
RAID 3 stripes the data onto multiple disks. The parity
bit generated for data word is stored on a different disk. This
technique makes it to overcome single disk failures.
RAID 4
In this level, an entire block of data is written onto data
disks and then the parity is generated and stored on a different
disk. Note that level 3 uses byte-level striping, whereas level 4
uses block-level striping. Both level 3 and level 4 require at
least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but
the parity bits generated for data block stripe are distributed
among all the data disks rather than storing them on a different
dedicated disk.
RAID 6
RAID 6 is an extension of level 5. In this level, two
independent parities are generated and stored in distributed fashion
among multiple disks. Two parities provide additional fault
tolerance. This level requires at least four disk drives to implement
RAID.
File Organization
File Organization defines how file records are mapped onto
disk blocks. We have four types of File Organization to organize
file records
Heap File Organization
When a file is created using Heap File Organization, the
Operating System allocates memory area to that file without
any further accounting details.
File records can be placed anywhere in that memory
area.
It is the responsibility of the software to manage the
records.
Heap File does not support any ordering, sequencing, or
indexing on its own.
Sequential File Organization
Every file record contains a data field (attribute) to uniquely
identify that record.
In sequential file organization, records are placed in the file
in some sequential order based on the unique key field or search
key.
Practically, it is not possible to store all the records
sequentially in physical form.
Hash File Organization
Hash File Organization uses Hash function computation
on some fields of the records.
The output of the hash function determines the location
of disk block where the records are to be placed.
Clustered File Organization
Clustered file organization is not considered good
for large databases. In this mechanism, related records from
one or more relations are kept in the same disk block, that
is, the ordering of records is not based on primary key or
search key.
Hashing:
Hashing uses hash functions with search keys as
parameters to generate the address of a data record.
Bucket A hash file stores data in bucket
format. Bucket is considered a unit of storage. A bucket
typically stores one complete disk block, which in turn
can store one or more records.
A hash function, h, is a mapping function that
maps all the set of search-keys K to the address where
actual records are placed. It is a function from search
keys to bucket addresses.
B+ Tree:
B+ tree is a (key, value) storage method in a tree like
structure. B+ tree has one root, any number of intermediary
nodes (usually one) and a leaf node. Here all leaf nodes will
have the actual records stored. Intermediary nodes will have
only pointers to the leaf nodes; it not has any data. Any node
will have only two leaves. This is the basic of any B+ tree.
STRUCTURE OF B+ TREE
 A B+ tree index is a multilevel indexes , but it has a structure that differs from than of
the multilevel index-sequential file.
 The bucket structure is used only if the search key does not from a primary key and if
the file is not sorted in the search key value in the order.
QUERIES ON B+ TREE
 Process queries using a b+ tree . To find all the records with a search-key
value of k.
 Leaf nodes must have between 2 and 4 values([(n-1)/2)] and n-1 , with
n=5).
 Non-leaf nodes other than root must have between 3 and 5
children([(n/2)]and n with n=5).
 Root must have at least 2 children.
UPADATES ON B+ TREES
INSERTION
If the search key value already appears in the leaf node , we add the new
record to the file and , if necessary , a pointer to the bucket.
DELETION
Using same technique as for lookup , we find the record to be deleted and
remove it from the file . The search key value is removed from the leaf node
if there is no bucket associated with that search key value or if the bucket
becomes empty as a result of the deletion.
B+TREE FILE ORGANIZATION
In a B+ tree file organization , the leaf nodes of the tree store records
instead of storing pointers to records . An example of a B+ tree file
organization . Since records are usually larger than pointers , the maximum
number of records that can be stored in the leaf nodes is less than the
number of pointers in a non leaf node.
MAIN GOAL OF B+ TREE IS:
 Sorted Intermediary and leaf nodes
Since it is a balanced tree, all nodes should be sorted.
 Fast traversal and Quick Search
Any record should be fetched very quickly. This is made by maintaining the
balance in the tree and keeping all the nodes at same distance
 No overflow pages
B+ tree allows all the intermediary and leaf nodes to be partially filled – it will have
some percentage defined while designing a B+ tree. In our example above,
intermediary node with 108 is underflow. And leaf nodes are not partially filled,
hence it is an overflow. In ideal B+ tree, it should not have overflow or underflow
except root node.
Definition of a B-tree
A B-tree of order m is an m-way tree (i.e., a tree where each node
may have up to m children) in which:
The number of keys in each non-leaf node is one less than the
number of its children and these keys partition the keys in the
children in the fashion of a search tree.
All leaves are on the same level.
All non-leaf nodes except the root have at least m / 2 children.
The root is either a leaf node, or it has from two to m children
a leaf node contains no more than m – 1 keys.
The number m should always be odd.
An example B-Tree
B-Trees 41
51 6242
6 12
26
55 60 7064 9045
1 2 4 7 8 13 15 18 25
27 29 46 48 53
A B-tree of order 5
containing 26 items
Note that all the leaves are at the same level
Constructing a B-tree
 Suppose we start with an empty B-tree and keys arrive in the
following order:1 12 8 2 25 5 14 28 17 7 52 16 48 68
3 26 29 53 55 45
 We want to construct a B-tree of order 5
 The first four items go into the root:
 To put the fifth item in the root would violate condition 5
 Therefore, when 25 arrives, pick the middle key to make a new
root
B-Trees 42
1 2 8 12
Inserting into a B-Tree
 Attempt to insert the new key into a leaf
 If this would result in that leaf becoming too big, split the leaf
into two, promoting the middle key to the leaf’s parent
 If this would result in the parent becoming too big, split the
parent into two, promoting the middle key
 This strategy might have to be repeated all the way to the top
 If necessary, the root is split in two and the middle key is
promoted to a new root, making the tree one level higher
B-Trees 43
Removal from a B-tree
 During insertion, the key always goes into a leaf. For deletion
we wish to remove from a leaf. There are three possible ways
we can do this:
 1 - If the key is already in a leaf node, and removing it doesn’t
cause that leaf node to have too few keys, then simply remove
the key to be deleted.
 2 - If the key is not in a leaf then it is guaranteed (by the nature
of a B-tree) that its predecessor or successor will be in a leaf --
in this case we can delete the key and promote the predecessor
or successor key to the non-leaf deleted key’s position.
B-Trees 44
Analysis of B-Trees
 The maximum number of items in a B-tree of order m and
height h:
root m – 1
level 1 m(m – 1)
level 2 m2(m – 1)
. . .
level h mh(m – 1)
 So, the total number of items is
(1 + m + m2 + m3 + … + mh)(m – 1) =
[(mh+1 – 1)/ (m – 1)] (m – 1) = mh+1 – 1
 When m = 5 and h = 2 this gives 53 – 1 = 124
B-Trees 45
Static Hashing:
In static hashing, when a search-key value is provided, the
hash function always computes the same address.
For example, if mod-4 hash function is used, then it shall
generate only 5 values.
The output address shall always be same for that function.
The number of buckets provided remains unchanged at all times
Operation
When a record is required to be entered using static hash,
the hash function h computes the bucket address for search key K,
where the record will be stored.
Bucket address = h(K)
Search − When a record needs to be
retrieved, the same hash function can be used to
retrieve the address of the bucket where the data is
stored.
Delete − This is simply a search followed by a
deletion operation.
Dynamic Hashing
The problem with static hashing is that it does not
expand or shrink dynamically as the size of the database grows
or shrinks.
Dynamic hashing provides a mechanism in which data
buckets are added and removed dynamically and ondemand.
Dynamic hashing is also known as extended hashing.
Hash function, in dynamic hashing, is made to produce
a large number of values and only a few are used initially.
Multiple-Key Access
Use multiple indices for certain types of queries.
Example:
select ID
from instructor
where dept_name = “Finance” and salary = 80000
Possible strategies for processing query using indices on
single attributes:
Multiple Key Access
1. Use index on dept_name to find instructors with
department name Finance; test salary = 80000
2. Use index on salary to find instructors with a salary
of $80000; test dept_name = “Finance”.
3. Use dept_name index to find pointers to all records
pertaining to the “Finance” department.
Similarly use index on salary. Take
intersection of both sets of pointers obtained.
Data storage and indexing

More Related Content

What's hot

Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query OptimizationAli Usman
 
Overview of Storage and Indexing ...
Overview of Storage and Indexing                                             ...Overview of Storage and Indexing                                             ...
Overview of Storage and Indexing ...
Javed Khan
 
Directory structure
Directory structureDirectory structure
Directory structure
sangrampatil81
 
RAID
RAIDRAID
Chapter 11 - File System Implementation
Chapter 11 - File System ImplementationChapter 11 - File System Implementation
Chapter 11 - File System Implementation
Wayne Jones Jnr
 
RAID LEVELS
RAID LEVELSRAID LEVELS
RAID LEVELS
Uzair Khan
 
2 phase locking protocol DBMS
2 phase locking protocol DBMS2 phase locking protocol DBMS
2 phase locking protocol DBMS
Dhananjaysinh Jhala
 
Distributed DBMS - Unit 1 - Introduction
Distributed DBMS - Unit 1 - IntroductionDistributed DBMS - Unit 1 - Introduction
Distributed DBMS - Unit 1 - Introduction
Gyanmanjari Institute Of Technology
 
Shadow paging
Shadow pagingShadow paging
Shadow paging
GowriLatha1
 
The Relational Database Model
The Relational Database ModelThe Relational Database Model
The Relational Database Model
Shishir Aryal
 
Indexing and Hashing
Indexing and HashingIndexing and Hashing
Indexing and Hashing
sathish sak
 
Isam
IsamIsam
File system Os
File system OsFile system Os
File system Os
Nehal Naik
 
DBMS Unit - 6 - Transaction Management
DBMS Unit - 6 - Transaction ManagementDBMS Unit - 6 - Transaction Management
DBMS Unit - 6 - Transaction Management
Gyanmanjari Institute Of Technology
 
trees in data structure
trees in data structure trees in data structure
trees in data structure
shameen khan
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMSkoolkampus
 
Data partitioning
Data partitioningData partitioning
Data partitioning
Vinod Wilson
 

What's hot (20)

Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query Optimization
 
Overview of Storage and Indexing ...
Overview of Storage and Indexing                                             ...Overview of Storage and Indexing                                             ...
Overview of Storage and Indexing ...
 
Directory structure
Directory structureDirectory structure
Directory structure
 
RAID
RAIDRAID
RAID
 
Chapter 11 - File System Implementation
Chapter 11 - File System ImplementationChapter 11 - File System Implementation
Chapter 11 - File System Implementation
 
RAID LEVELS
RAID LEVELSRAID LEVELS
RAID LEVELS
 
2 phase locking protocol DBMS
2 phase locking protocol DBMS2 phase locking protocol DBMS
2 phase locking protocol DBMS
 
File organization
File organizationFile organization
File organization
 
Distributed DBMS - Unit 1 - Introduction
Distributed DBMS - Unit 1 - IntroductionDistributed DBMS - Unit 1 - Introduction
Distributed DBMS - Unit 1 - Introduction
 
Shadow paging
Shadow pagingShadow paging
Shadow paging
 
The Relational Database Model
The Relational Database ModelThe Relational Database Model
The Relational Database Model
 
Indexing and Hashing
Indexing and HashingIndexing and Hashing
Indexing and Hashing
 
Isam
IsamIsam
Isam
 
File system Os
File system OsFile system Os
File system Os
 
DBMS Unit - 6 - Transaction Management
DBMS Unit - 6 - Transaction ManagementDBMS Unit - 6 - Transaction Management
DBMS Unit - 6 - Transaction Management
 
trees in data structure
trees in data structure trees in data structure
trees in data structure
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS
 
File organisation
File organisationFile organisation
File organisation
 
Data partitioning
Data partitioningData partitioning
Data partitioning
 
Databases: Normalisation
Databases: NormalisationDatabases: Normalisation
Databases: Normalisation
 

Similar to Data storage and indexing

Unit 4 data storage and querying
Unit 4   data storage and queryingUnit 4   data storage and querying
Unit 4 data storage and querying
Ravindran Kannan
 
DBMS-Unit5-PPT.pptx important for revision
DBMS-Unit5-PPT.pptx important for revisionDBMS-Unit5-PPT.pptx important for revision
DBMS-Unit5-PPT.pptx important for revision
yuvivarmaa
 
3620121datastructures.ppt
3620121datastructures.ppt3620121datastructures.ppt
3620121datastructures.ppt
SheejamolMathew
 
UNIT III.pptx
UNIT III.pptxUNIT III.pptx
UNIT III.pptx
NIVETHA37590
 
CS 2212- UNIT -4.pptx
CS 2212-  UNIT -4.pptxCS 2212-  UNIT -4.pptx
CS 2212- UNIT -4.pptx
LilyMkayula
 
lecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxlecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptx
peter1097
 
Chapter13
Chapter13Chapter13
Chapter13
gourab87
 
Csci12 report aug18
Csci12 report aug18Csci12 report aug18
Csci12 report aug18
karenostil
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
Abdul mannan Karim
 
overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam
pratikkadam78
 
File organization and introduction of DBMS
File organization and introduction of DBMSFile organization and introduction of DBMS
File organization and introduction of DBMS
VrushaliSolanke
 
fileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdffileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdf
FraolUmeta
 
DBMS (UNIT 5)
DBMS (UNIT 5)DBMS (UNIT 5)
DBMS (UNIT 5)
SURBHI SAROHA
 
What is Object storage ?
What is Object storage ?What is Object storage ?
What is Object storage ?
Nabil Kassi
 
Database management system session 6
Database management system session 6Database management system session 6
Database management system session 6
Infinity Tech Solutions
 
File Structure.pptx
File Structure.pptxFile Structure.pptx
File Structure.pptx
zedd15
 

Similar to Data storage and indexing (20)

Unit 4 data storage and querying
Unit 4   data storage and queryingUnit 4   data storage and querying
Unit 4 data storage and querying
 
DBMS-Unit5-PPT.pptx important for revision
DBMS-Unit5-PPT.pptx important for revisionDBMS-Unit5-PPT.pptx important for revision
DBMS-Unit5-PPT.pptx important for revision
 
3620121datastructures.ppt
3620121datastructures.ppt3620121datastructures.ppt
3620121datastructures.ppt
 
UNIT III.pptx
UNIT III.pptxUNIT III.pptx
UNIT III.pptx
 
Storage struct
Storage structStorage struct
Storage struct
 
CS 2212- UNIT -4.pptx
CS 2212-  UNIT -4.pptxCS 2212-  UNIT -4.pptx
CS 2212- UNIT -4.pptx
 
lecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxlecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptx
 
Chapter13
Chapter13Chapter13
Chapter13
 
Csci12 report aug18
Csci12 report aug18Csci12 report aug18
Csci12 report aug18
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
 
Unit 08 dbms
Unit 08 dbmsUnit 08 dbms
Unit 08 dbms
 
overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam
 
File organization and introduction of DBMS
File organization and introduction of DBMSFile organization and introduction of DBMS
File organization and introduction of DBMS
 
fileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdffileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdf
 
DBMS (UNIT 5)
DBMS (UNIT 5)DBMS (UNIT 5)
DBMS (UNIT 5)
 
Unit08 dbms
Unit08 dbmsUnit08 dbms
Unit08 dbms
 
What is Object storage ?
What is Object storage ?What is Object storage ?
What is Object storage ?
 
Database management system session 6
Database management system session 6Database management system session 6
Database management system session 6
 
Ardbms
ArdbmsArdbms
Ardbms
 
File Structure.pptx
File Structure.pptxFile Structure.pptx
File Structure.pptx
 

More from pradeepa velmurugan

Multimedia compression
Multimedia compressionMultimedia compression
Multimedia compression
pradeepa velmurugan
 
software design
software designsoftware design
software design
pradeepa velmurugan
 
File handling in input and output
File handling in input and outputFile handling in input and output
File handling in input and output
pradeepa velmurugan
 
Analysis Of Attribute Revelance
Analysis Of Attribute RevelanceAnalysis Of Attribute Revelance
Analysis Of Attribute Revelance
pradeepa velmurugan
 
Scheduling
SchedulingScheduling
Instruction codes
Instruction codesInstruction codes
Instruction codes
pradeepa velmurugan
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
pradeepa velmurugan
 

More from pradeepa velmurugan (10)

FIREWALL
FIREWALLFIREWALL
FIREWALL
 
Multimedia compression
Multimedia compressionMultimedia compression
Multimedia compression
 
software design
software designsoftware design
software design
 
DIVIDE AND CONQUER
DIVIDE AND CONQUERDIVIDE AND CONQUER
DIVIDE AND CONQUER
 
IMAGE COMPRESSION
IMAGE COMPRESSIONIMAGE COMPRESSION
IMAGE COMPRESSION
 
File handling in input and output
File handling in input and outputFile handling in input and output
File handling in input and output
 
Analysis Of Attribute Revelance
Analysis Of Attribute RevelanceAnalysis Of Attribute Revelance
Analysis Of Attribute Revelance
 
Scheduling
SchedulingScheduling
Scheduling
 
Instruction codes
Instruction codesInstruction codes
Instruction codes
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 

Recently uploaded

Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
Vivekanand Anglo Vedic Academy
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Col Mukteshwar Prasad
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 

Recently uploaded (20)

Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 

Data storage and indexing

  • 1. RELATIONAL DATABASE MANAGEMENT SYSTEM UNIT – III DATA STORAGE AND INDEXING
  • 2. 1 Storage And File Structure Why do we need to know about storage/file structure Many database technologies are developed to utilize the Storage architecture/hierarchy Data in the database needs to be organized and stored/retrieved efficiently
  • 3. Storage Hierarchy Magnetic Tape Optical Disk Magnetic Disk Flash Memory Cache unit priceMemory Volatile primary storage Non-volatile speed Secondary storage Tertiary storage
  • 4. Primary Storage (Volatile) Cache Speed: 7 to 20 ns (1 nanosecond = 10–9 seconds) Capacity: A typical PC level 2 cache 64KB-2 MB. Within processors, level 1 cache usually ranges in size from 8 KB to 64 KB. Main memory Speed: 10s to 100s of nanoseconds; Capacity: Up to a few Gigabytes widely used currently per-byte costs have decreased roughly factor of 2 every 2 3 years)
  • 5. Secondary Storage (Non-volatile) Flash memory Speed: Read speed similar to main memory, write is much slower Capacity: 32M to 512M currently Forms: SmartMedia, memory stick, secure digital, BIOS Cost: roughly same as main memory Magnetic-disk Capacities: up to roughly 100 GB currently Growing constantly and rapidly with technology improvements.
  • 6. 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Tertiary Storage (Non-volatile)  Optical storage  CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms  CD-RW, DVD-RW, and DVD-RAM  Reads and writes are slower than with magnetic disk  Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks available for storing large volumes of data
  • 7. Indexing: *Indexing in database systems is similar to what we see in books. * Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on which the indexing has been done. *Indexing is defined based on its indexing attributes. Indexing can be of the following types *Primary Index - Primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally the primary key of the relation.
  • 8. Indexing Types: Secondary Index - Secondary index may be generated from a field which is a candidate key and has a unique value in every record, or a non-key with duplicate values. Clustering Index - Clustering index is defined on an ordered data file. The data file is ordered on a non-key field. Ordered Indexing. Dense Index - In dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store index records itself. Index records contain search key value and a pointer to the actual record on the disk.
  • 9.
  • 10. Sparse Index: In sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by index record and reach at the actual location of the data. If the data we are looking for is not where we directly reach by following the index, then the system starts sequential search until the desired data is found.
  • 11.
  • 12. Multilevels Indexing: Index records comprise search-key values and data pointers. Multilevel index is stored on the disk along with the actual database files. As the size of the database grows, so does the size of the indices. If single-level index is used, then a large size index cannot be kept in memory which leads to multiple disk accesses . Multi-level Index helps in breaking down the index into several smaller indices in order to make the outermost level so small that it can be saved in a single disk block,
  • 13.
  • 14. Disk: Hard disk drives are the most common secondary storage devices in present computer systems. These are called magnetic disks because they use the concept of magnetization to store information. Hard disks are formatted in a well-defined order to store data efficiently. A hard disk plate has many concentric circles on it, called tracks. Every track is further divided into sectors. A sector on a hard disk typically stores 512 bytes of data.
  • 15. Disk
  • 16. Disk Subsystem Disk interface standards families • ATA (AT adaptor) range of standards • SCSI (Small Computer System Interconnect) range of standards.
  • 17. Disk Speed Seek time (milliseconds) Rotation time/latency milliseconds Data-transfer rate (4-8MB/sec) Typical numbers:  16,000 tracks per platter  sectors per track: 200 – 400  512 bytes per sector  4-16KB per block  5,400 - 15,000 r p m Access time = seek time + latency Discuss ways to improve disk reading speed
  • 18. Redundant Array of Independent Disks RAID or Redundant Array of Independent Disks, is a technology to connect multiple secondary storage devices and use them as a single storage media. RAID consists of an array of disks in which multiple disks are connected together to achieve different goals. RAID levels define the use of disk arrays.
  • 19. RAID 0 In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among disks. Each disk receives a block of data to write/read in parallel. It enhances the speed and performance of the storage device. There is no parity and backup in Level 0.
  • 20. RAID1 RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.
  • 21. RAID 2 RAID 2 records Error Correction Code using Hamming distance for its data, striped on different disks. Like level 0, each data bit in a word is recorded on a separate disk and ECC codes of the data words are stored on a different set disks. Due to its complex structure and high cost, RAID 2 is not commercially available.
  • 22. RAID3 RAID 3 stripes the data onto multiple disks. The parity bit generated for data word is stored on a different disk. This technique makes it to overcome single disk failures.
  • 23. RAID 4 In this level, an entire block of data is written onto data disks and then the parity is generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
  • 24. RAID 5 RAID 5 writes whole data blocks onto different disks, but the parity bits generated for data block stripe are distributed among all the data disks rather than storing them on a different dedicated disk.
  • 25. RAID 6 RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored in distributed fashion among multiple disks. Two parities provide additional fault tolerance. This level requires at least four disk drives to implement RAID.
  • 26. File Organization File Organization defines how file records are mapped onto disk blocks. We have four types of File Organization to organize file records
  • 27. Heap File Organization When a file is created using Heap File Organization, the Operating System allocates memory area to that file without any further accounting details. File records can be placed anywhere in that memory area. It is the responsibility of the software to manage the records. Heap File does not support any ordering, sequencing, or indexing on its own.
  • 28. Sequential File Organization Every file record contains a data field (attribute) to uniquely identify that record. In sequential file organization, records are placed in the file in some sequential order based on the unique key field or search key. Practically, it is not possible to store all the records sequentially in physical form.
  • 29. Hash File Organization Hash File Organization uses Hash function computation on some fields of the records. The output of the hash function determines the location of disk block where the records are to be placed.
  • 30. Clustered File Organization Clustered file organization is not considered good for large databases. In this mechanism, related records from one or more relations are kept in the same disk block, that is, the ordering of records is not based on primary key or search key.
  • 31. Hashing: Hashing uses hash functions with search keys as parameters to generate the address of a data record. Bucket A hash file stores data in bucket format. Bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records. A hash function, h, is a mapping function that maps all the set of search-keys K to the address where actual records are placed. It is a function from search keys to bucket addresses.
  • 32. B+ Tree: B+ tree is a (key, value) storage method in a tree like structure. B+ tree has one root, any number of intermediary nodes (usually one) and a leaf node. Here all leaf nodes will have the actual records stored. Intermediary nodes will have only pointers to the leaf nodes; it not has any data. Any node will have only two leaves. This is the basic of any B+ tree.
  • 33. STRUCTURE OF B+ TREE  A B+ tree index is a multilevel indexes , but it has a structure that differs from than of the multilevel index-sequential file.  The bucket structure is used only if the search key does not from a primary key and if the file is not sorted in the search key value in the order.
  • 34. QUERIES ON B+ TREE  Process queries using a b+ tree . To find all the records with a search-key value of k.
  • 35.  Leaf nodes must have between 2 and 4 values([(n-1)/2)] and n-1 , with n=5).  Non-leaf nodes other than root must have between 3 and 5 children([(n/2)]and n with n=5).  Root must have at least 2 children.
  • 36. UPADATES ON B+ TREES INSERTION If the search key value already appears in the leaf node , we add the new record to the file and , if necessary , a pointer to the bucket.
  • 37. DELETION Using same technique as for lookup , we find the record to be deleted and remove it from the file . The search key value is removed from the leaf node if there is no bucket associated with that search key value or if the bucket becomes empty as a result of the deletion.
  • 38. B+TREE FILE ORGANIZATION In a B+ tree file organization , the leaf nodes of the tree store records instead of storing pointers to records . An example of a B+ tree file organization . Since records are usually larger than pointers , the maximum number of records that can be stored in the leaf nodes is less than the number of pointers in a non leaf node.
  • 39. MAIN GOAL OF B+ TREE IS:  Sorted Intermediary and leaf nodes Since it is a balanced tree, all nodes should be sorted.  Fast traversal and Quick Search Any record should be fetched very quickly. This is made by maintaining the balance in the tree and keeping all the nodes at same distance  No overflow pages B+ tree allows all the intermediary and leaf nodes to be partially filled – it will have some percentage defined while designing a B+ tree. In our example above, intermediary node with 108 is underflow. And leaf nodes are not partially filled, hence it is an overflow. In ideal B+ tree, it should not have overflow or underflow except root node.
  • 40. Definition of a B-tree A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which: The number of keys in each non-leaf node is one less than the number of its children and these keys partition the keys in the children in the fashion of a search tree. All leaves are on the same level. All non-leaf nodes except the root have at least m / 2 children. The root is either a leaf node, or it has from two to m children a leaf node contains no more than m – 1 keys. The number m should always be odd.
  • 41. An example B-Tree B-Trees 41 51 6242 6 12 26 55 60 7064 9045 1 2 4 7 8 13 15 18 25 27 29 46 48 53 A B-tree of order 5 containing 26 items Note that all the leaves are at the same level
  • 42. Constructing a B-tree  Suppose we start with an empty B-tree and keys arrive in the following order:1 12 8 2 25 5 14 28 17 7 52 16 48 68 3 26 29 53 55 45  We want to construct a B-tree of order 5  The first four items go into the root:  To put the fifth item in the root would violate condition 5  Therefore, when 25 arrives, pick the middle key to make a new root B-Trees 42 1 2 8 12
  • 43. Inserting into a B-Tree  Attempt to insert the new key into a leaf  If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf’s parent  If this would result in the parent becoming too big, split the parent into two, promoting the middle key  This strategy might have to be repeated all the way to the top  If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher B-Trees 43
  • 44. Removal from a B-tree  During insertion, the key always goes into a leaf. For deletion we wish to remove from a leaf. There are three possible ways we can do this:  1 - If the key is already in a leaf node, and removing it doesn’t cause that leaf node to have too few keys, then simply remove the key to be deleted.  2 - If the key is not in a leaf then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf -- in this case we can delete the key and promote the predecessor or successor key to the non-leaf deleted key’s position. B-Trees 44
  • 45. Analysis of B-Trees  The maximum number of items in a B-tree of order m and height h: root m – 1 level 1 m(m – 1) level 2 m2(m – 1) . . . level h mh(m – 1)  So, the total number of items is (1 + m + m2 + m3 + … + mh)(m – 1) = [(mh+1 – 1)/ (m – 1)] (m – 1) = mh+1 – 1  When m = 5 and h = 2 this gives 53 – 1 = 124 B-Trees 45
  • 46. Static Hashing: In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if mod-4 hash function is used, then it shall generate only 5 values. The output address shall always be same for that function. The number of buckets provided remains unchanged at all times Operation When a record is required to be entered using static hash, the hash function h computes the bucket address for search key K, where the record will be stored.
  • 47. Bucket address = h(K) Search − When a record needs to be retrieved, the same hash function can be used to retrieve the address of the bucket where the data is stored. Delete − This is simply a search followed by a deletion operation.
  • 48.
  • 49. Dynamic Hashing The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and ondemand. Dynamic hashing is also known as extended hashing. Hash function, in dynamic hashing, is made to produce a large number of values and only a few are used initially.
  • 50.
  • 51. Multiple-Key Access Use multiple indices for certain types of queries. Example: select ID from instructor where dept_name = “Finance” and salary = 80000 Possible strategies for processing query using indices on single attributes:
  • 52. Multiple Key Access 1. Use index on dept_name to find instructors with department name Finance; test salary = 80000 2. Use index on salary to find instructors with a salary of $80000; test dept_name = “Finance”. 3. Use dept_name index to find pointers to all records pertaining to the “Finance” department. Similarly use index on salary. Take intersection of both sets of pointers obtained.