Indexing Structures
9-Apr-24 Christalin Nelson | SOCS
At a Glance
• Introduction
• Types of Single-Level Ordered Indexes
– Primary Indexes, Clustering Indexes, Secondary Indexes
• Multilevel Indexes
• Dynamic Multilevel Indexes Using B-Trees and B+ Trees
• Indexes on Multiple Keys
Introduction
• Assuming a data file already exists with some primary organization, additional
secondary organizations can be introduced to speed up retrieval in response to
search conditions involving the index fields
– i.e. auxiliary access structures called indexes provide secondary access paths
• Variety of indexes can be constructed on the same file
– Multiple indexes on different fields (OR) Indexes on multiple fields
• Searching record(s)
– The index is searched --> Pointers to one or more disk blocks are obtained --> Required records are
located
• Prevalent types of indexes based on
– Ordered files (single-level indexes), Tree data structures (multilevel indexes, B+-trees),
Hashing or other search data structures, Indexes that are vectors of bits (Bitmap
indexes)
Single-Level Ordered Indexes
• An ordered index is analogous to the index at the back of a book
– Index
• An access structure constructed with values of indexing field & list of pointers to all disk blocks
that contain records with that field value
– Ordered Indexes
• Values in the access structure are ordered. Hence, Binary search can be done on index.
• Types
– Primary index is specified on the ordering key field of an ordered data file (Indexed Sequential file)
– Clustering index is specified on the non-key ordering field of an ordered data file (Clustered file)
– Secondary index is specified on any non-ordering field of a file
– Note: A data file can have
• At most one Primary index or Clustered index (not both)
• Several Secondary indexes in addition to its primary access method
Primary Indexes (1/4)
• The data file (Indexed-Sequential File) is ordered based on key field (Indexing field)
• An access structure (ordered file) whose records are of fixed length with 2 fields
– ith Index entry (or index record) of Index file is given as <K(i), P(i)>
• 1st field: Same data type as ordering key field (Primary key of data file)
• 2nd field: Pointer to first record in Disk block (block address)
– Block Anchor or Anchor Record of Block: 1st record in each block of the Data file
– Total no. of entries in Index file = No. of Disk blocks in ordered Data file
– Binary search is performed on Index file. Search requires fewer block accesses (⌈log2(bi)⌉ + 1).
• Indexes can also be characterized as dense or sparse
– Dense index: Has 1 index entry for each search key value (i.e. every record) in Data file
– Sparse (or non-Dense) index: Has index entries for only some of the search key values
• Primary index is a Sparse index
– Size of Index file (bi blocks) < Size of Data file (b blocks)
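The lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the in-memory lists standing in for disk blocks, the sample names, and the function name `lookup` are all hypothetical and not part of the slides.

```python
from bisect import bisect_right

# Hypothetical data file: each inner list is one disk block,
# and records are sorted by the ordering key (the name).
data_blocks = [
    [("Aaron", 1), ("Adams", 2), ("Akers", 3)],
    [("Allen", 4), ("Alvarez", 5), ("Bell", 6)],
    [("Chen", 7), ("Diaz", 8), ("Evans", 9)],
]

# Sparse primary index: one <K(i), P(i)> entry per data block, where
# K(i) is the block anchor's key and P(i) is the block number.
index = [(block[0][0], block_no) for block_no, block in enumerate(data_blocks)]
anchors = [key for key, _ in index]

def lookup(key):
    """Return the record with the given key, or None if it is absent."""
    # Binary search on the index for the last anchor <= key.
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None  # key sorts before the first block anchor
    block_no = index[pos][1]
    # One additional block access: scan the data block itself.
    for record in data_blocks[block_no]:
        if record[0] == key:
            return record
    return None
```

The index holds three entries for nine records, which is what makes it sparse: the binary search only narrows the search to one block, and the final scan happens inside that block.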
Primary Indexes (2/4)
[Figure: primary index on the ordering key field Name of the data file. Assumption: each value of Name is unique.]
Primary Indexes (3/4)
• Example: An ordered data file of r = 30,000 unspanned, fixed-length records, each of length R = 100B, with an
ordering key field of size V = 9B, is stored on a disk with block size B = 1024B. A block pointer of size P =
6B is used to construct a primary index for the file. Show that a search through the index file needs fewer
block accesses than a binary search on the data file.
– For Data File
• Blocking factor = bfr = ⌊B/R⌋ = ⌊1024/100⌋ = 10 records per block
• No. of records = r = 30000
• No. of blocks needed = b = ⌈r/bfr⌉ = ⌈30000/10⌉ = 3000 blocks
• No. of block accesses that Binary search would need = ⌈log2(b)⌉ = ⌈log2(3000)⌉ = 12 block accesses
– For Index File
• Size of each index entry = Ri = V + P = (9 + 6) = 15B
• Blocking factor = bfri = ⌊B/Ri⌋ = ⌊1024/15⌋ = 68 entries per block
• No. of index records = ri = No. of blocks in the data file = 3000
• No. of blocks needed = bi = ⌈ri/bfri⌉ = ⌈3000/68⌉ = 45 blocks
• No. of block accesses that Binary search would need = ⌈log2(bi)⌉ = ⌈log2(45)⌉ = 6 block accesses
– To search for a record using the index, we need one additional block access to the data file.
• Hence, total block accesses = 6 + 1 = 7 block accesses
– This is an improvement over binary search on the data file, which requires 12 block accesses.
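The arithmetic of this example can be checked with a short script. The variable names mirror the symbols on the slide; the script itself is an added sketch, not part of the original deck.

```python
from math import ceil, log2

B, R, r = 1024, 100, 30000   # block size (bytes), record size (bytes), record count
V, P = 9, 6                  # ordering key size and block pointer size (bytes)

# Data file
bfr = B // R                         # blocking factor: floor(B / R)
b = ceil(r / bfr)                    # blocks in the data file
data_accesses = ceil(log2(b))        # binary search directly on the data file

# Index file
Ri = V + P                           # size of one index entry
bfri = B // Ri                       # index blocking factor
ri = b                               # one index entry per data block
bi = ceil(ri / bfri)                 # blocks in the index file
index_accesses = ceil(log2(bi)) + 1  # binary search on index + 1 data block

print(bfr, b, data_accesses)         # 10 3000 12
print(Ri, bfri, bi, index_accesses)  # 15 68 45 7
```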
Primary Indexes (4/4)
• Insertion/deletion of records is costly: records must be moved to keep the file ordered, and
block anchors (hence index entries) may change
– Improvements
• (1) An unordered overflow file can reduce this problem
• (2) A linked list of overflow records can be kept for each block in the data file
– This is similar to the method of dealing with overflow records in hashing
– Records within each block & its overflow linked list can be sorted to improve retrieval time
• (3) Record deletion can be handled using deletion markers
Clustering Indexes (1/4)
• The data file (Clustered File) is ordered based on non-key field (clustering field)
• Clustering index speeds up retrieval of all records having same value for clustering
field.
• An access structure (ordered file) whose records are of fixed length with 2 fields
– ith Clustered Index entry/record of Clustered Index file is given as <K(i), P(i)>
• 1st field: Same data type as clustering field (non-key field of data file)
• 2nd field: Pointer to the first Disk block that contains a record with that clustering field value (block address)
– Block Anchor or Anchor Record of Block: 1st record in each block of the Data file
– Total no. of entries in Index file = No. of distinct values of clustering field in Data File
– Binary search is performed on Index file.
• Clustered Index is a Sparse index
– Size of Index file (bi blocks) < Size of Data file (b blocks)
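A clustering index can be sketched in the same style. The data below is hypothetical (a tiny EMPLOYEE file ordered by a department number), and `records_for` is an illustrative name. The point is that there is one index entry per distinct value, and records with the same value may spill into the following blocks.

```python
from bisect import bisect_left

# Hypothetical EMPLOYEE blocks, physically ordered by the non-key
# clustering field (department number); each tuple is (dept, name).
data_blocks = [
    [(1, "A"), (1, "B"), (2, "C")],
    [(2, "D"), (3, "E"), (3, "F")],
    [(3, "G"), (5, "H"), (5, "I")],
]

# Clustering index: one entry per DISTINCT clustering value, pointing
# to the first block that contains a record with that value.
index = []
for block_no, block in enumerate(data_blocks):
    for dept, _ in block:
        if not index or index[-1][0] != dept:
            index.append((dept, block_no))
keys = [k for k, _ in index]

def records_for(dept):
    """Return all records whose clustering field equals dept."""
    pos = bisect_left(keys, dept)        # binary search on the index
    if pos == len(keys) or keys[pos] != dept:
        return []
    result = []
    # Read forward from the first block; stop once a larger value appears.
    for block in data_blocks[index[pos][1]:]:
        for rec in block:
            if rec[0] == dept:
                result.append(rec)
            elif rec[0] > dept:
                return result
    return result
```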
Clustering Indexes (2/4)
[Figure: clustering index on Dept_number, the ordering non-key field of the EMPLOYEE file]
Clustering Indexes (3/4)
[Figure: clustering index with a separate block cluster for each group of records that share the same value for the clustering field]
Clustering Indexes (4/4)
• Insertion/deletion is costly because the data records are physically ordered
– Solution: Reserve a whole block (or a cluster of contiguous blocks) for each value of
clustering field --> All records with that value are placed in the block (or block cluster)
• Note
– An index is similar to the directory structures used for dynamic and extendible hashing
• Both are searched to find a pointer to Data block containing the desired record
• Difference: An index search uses values of the search field itself. Hash directory search uses
the binary hash value calculated by applying hash function to the search field
Secondary Index (1/4)
• Provides secondary means of accessing a data file for which some primary access
already exists
– Many secondary indexes can be created for the same file. Each represents an
additional means of accessing that file based on some specific field
• Data file records could be ordered, unordered, or hashed
• Indexing field: Non-ordering Candidate key (PK or Secondary key) or non-key field
(can have duplicate values)
• An access structure (ordered file) whose records are of fixed length with 2 fields
– ith Index entry/record of Secondary Index file is given as <K(i), P(i)>
• 1st field: Same data type as Index field
• 2nd field: Block pointer or Record pointer
– Binary search is performed on Index file
Secondary Index (2/4)
[Figure: dense secondary index (with block pointers) on a non-ordering key field of a file]
Secondary Index (3/4)
[Figure: secondary index (with record pointers) on a non-key field, implemented using one level of indirection so that index entries are of fixed length and have unique field values]
Secondary Index (4/4)
• A Secondary Index on a key field is a Dense index
– One index entry for each record in the data file (Index field is a non-ordering field)
– No block anchors can be used (data file is not physically ordered by the indexing field)
– Needs more storage space & longer search time than a primary index, because it has more entries
• Yet the improvement in search time for an arbitrary record is greater than with a primary index
– Without the Secondary index, a linear search on the data file would be required
• Note
– Secondary index provides logical ordering on records by Indexing field
• i.e. Record access in order of secondary index records with indexing field
– Primary & Clustering index assumes that field used for physical ordering of records in
the file is same as Indexing field
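A dense secondary index with record pointers might look like the sketch below. The file contents, the field names `ssn` and `name`, and the helper names are assumptions made for illustration. The data file is unordered, yet the index still allows both a binary search and a traversal in index-field order.

```python
from bisect import bisect_left

# Hypothetical UNORDERED data file; a record pointer is (block_no, slot).
data_blocks = [
    [{"ssn": 9, "name": "Ng"}, {"ssn": 5, "name": "Roy"}],
    [{"ssn": 13, "name": "Lee"}, {"ssn": 2, "name": "Kim"}],
]

# Dense secondary index on the key field ssn: one <K(i), P(i)> entry
# per record, kept sorted by K(i).
index = sorted(
    (rec["ssn"], (block_no, slot))
    for block_no, block in enumerate(data_blocks)
    for slot, rec in enumerate(block)
)
keys = [k for k, _ in index]

def find(ssn):
    """Binary search on the index, then follow the record pointer."""
    pos = bisect_left(keys, ssn)
    if pos == len(keys) or keys[pos] != ssn:
        return None
    block_no, slot = index[pos][1]
    return data_blocks[block_no][slot]

def names_in_key_order():
    """Logical ordering: visit records in index order, not file order."""
    return [data_blocks[b][s]["name"] for _, (b, s) in index]
```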
Summary of Single-Level Ordered Indexes
• Note
– Secondary index provides logical ordering on records by Indexing field
• i.e. Record access in order of secondary index records with indexing field
– Primary & Clustering index assumes that field used for physical ordering of records in
file is same as Indexing field
Multilevel Index (1/4)
• Data File --> Create First (Base)-Level Index File (acts as the data file for the 2nd level) -->
Create Second-level Primary Index file on the first level
– Multi-level scheme can be used on any index (Primary, Clustering, or Secondary) when
1st level index has distinct values for K(i) and fixed-length entries
• Indexing schemes discussed thus far had one ordered index file
– Binary search on Index file with bi blocks required approx. log2(bi) block accesses
• Multilevel index reduces the part of the Index file still to be searched by a factor of bfri
(No. of entries per block) at each step
– Considering bfri = Fan-out of Multilevel Index (fo), the search space is split in ‘fo’ ways at
each step
• Hence, search requires approx. logfo(bi) block accesses, fewer than Binary search when fan-out > 2
• 2nd level File has one entry for each block of 1st level File --> Block anchors are used
Multilevel Index (2/4)
• Index entries are of same size at all levels
– 2 fields: Index field value & Block address
• Blocking factor of Index Files (bfri) is same at all levels
– 1st level: r1 entries, bfri = fo (fan-out), then 1st level needs ⎡(r1/fo)⎤ blocks
– No. of 2nd level entries = r2 = ⎡(r1/fo)⎤ = No. of blocks in 1st level
– No. of 3rd level entries = r3 = ⎡(r2/fo)⎤ = No. of blocks in 2nd level
– Note: Each level reduces no. of entries at previous level by a factor of fo
• Top index level
– Multilevel Index files can be created until the last level (t) has only 1 block to fit all
entries. Here, No. of Index levels = t = ⌈logfo(r1)⌉
• i.e. t is the smallest integer for which (r1/fo^t) ≤ 1
– For an index search, ‘t’ index blocks are accessed (one per level), plus one data block
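The level-by-level reduction can be traced with a small helper. The function name `index_levels` is hypothetical; the inputs reuse r1 = 3000 and fo = 68 from the primary index example, plus a dense first level of 30000 entries as a second, assumed case.

```python
def index_levels(r1, fo):
    """Return the number of blocks at each index level, first level first.

    r1 is the number of first-level entries and fo the fan-out (bfri).
    Levels are added until one level fits in a single block.
    """
    if r1 < 1 or fo < 2:
        raise ValueError("need r1 >= 1 and fo >= 2")
    blocks_per_level = []
    entries = r1
    while True:
        blocks = -(-entries // fo)      # ceiling division
        blocks_per_level.append(blocks)
        if blocks == 1:                 # top index level reached
            return blocks_per_level
        entries = blocks                # next level: one entry per block

print(index_levels(3000, 68))    # [45, 1]     -> t = 2 levels
print(index_levels(30000, 68))   # [442, 7, 1] -> t = 3 levels
```

With t = 2, a search on the primary index of the earlier example would touch 2 index blocks plus 1 data block, compared with 7 block accesses for the single-level index.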
Multilevel Index (3/4)
[Figure: 2-level Primary Index resembling IBM’s Indexed Sequential Access Method (ISAM) organization]
• Level-1: Cylinder Index
• Level-2: Track Index
• Track is searched sequentially for the desired
record or block
Multilevel Index (4/4)
• Multilevel index reduces no. of blocks accessed during record search when index
field value is given
• Note: Index insertion and deletion are inefficient as all Index levels are physically
ordered files
– Solution: Dynamic multilevel index (Implemented with B trees & B+ trees)
Indexes on Multiple Keys (1/2)
• Considering index field as a combination of attributes
• Example:
– EMPLOYEE file contains attributes Ssn (Key), Age, Street, City, Zip_code, Salary,
Skill_code, Dno
– Query: Find employees with Dno = 4 and Age = 59.
• Dno & Age are non-key attributes. Hence, each search value may match multiple records.
• Alternative search strategies
– Dno has an index, but Age does not --> Use index to access records with Dno = 4 --> Select
records with Age = 59
– Age has an index, but Dno does not --> Use index to access records with Age = 59 --> Select
records that satisfy Dno = 4
– Both Dno and Age have indexes --> Use each index to find a set of records (with Dno = 4 and Age = 59 respectively)
--> Intersection of these sets would satisfy both conditions
• Note: None of the above strategies is efficient when many records satisfy each condition
individually but only a few satisfy both; an index on the composite field <Dno, Age> avoids this
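The third strategy (two separate indexes, then an intersection) can be illustrated as follows. The record ids and the dictionaries standing in for the two indexes are invented for the example.

```python
# Hypothetical single-attribute indexes: value -> set of record ids.
dno_index = {4: {101, 102, 103, 104}, 5: {105, 106}}
age_index = {59: {103, 106, 107}, 30: {101}}

def employees_with(dno, age):
    """Intersect the two candidate sets to satisfy both conditions."""
    by_dno = dno_index.get(dno, set())
    by_age = age_index.get(age, set())
    return by_dno & by_age

print(employees_with(4, 59))   # {103}
```

Both candidate sets are materialised in full even though only one record survives the intersection, which is the inefficiency noted above.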
Indexes on Multiple Keys (2/2)
• Types
– Ordered Index on Multiple Attributes
– Partitioned Hashing
– Grid Files
Ordered Index on Multiple Attributes
• Index on a composite key of n attributes with lexicographic ordering
– If an index is created on attributes <A1, A2, ..., An> --> Search key values are tuples
with n values: <v1, v2, ..., vn> --> Lexicographic ordering of these tuple values
establishes an order on this composite search key
• Lexicographic ordering works similarly to ordering of character strings
• Example (EMPLOYEE query with Dno = 4 and Age = 59):
– Create an index on a composite search key field <Dno, Age>
– The Search key is then a pair of values such as <4, 59>
• All composite keys with Dno = 3 precede those with Dno = 4
– Thus <3, n> precedes <4, m> for any values of m and n
• Ascending key order for keys with Dno = 4 would be <4, 18>, <4, 19>, <4, 20>, etc.
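Python tuples happen to compare lexicographically, so the ordering above can be demonstrated directly. The key values are made up, and the entries carry no record pointers to keep the sketch short. `with_dno` assumes integer department numbers.

```python
from bisect import bisect_left

# Hypothetical composite keys <Dno, Age>, sorted lexicographically.
index = sorted([(4, 18), (3, 45), (4, 59), (5, 22), (4, 19), (3, 59), (4, 20)])

def contains(dno, age):
    """Exact-match search on the full composite key."""
    pos = bisect_left(index, (dno, age))
    return pos < len(index) and index[pos] == (dno, age)

def with_dno(dno):
    """Prefix search: all keys whose first attribute equals dno."""
    lo = bisect_left(index, (dno,))       # (dno,) sorts before every (dno, x)
    hi = bisect_left(index, (dno + 1,))   # first key of the next department
    return index[lo:hi]

print(index[:3])       # [(3, 45), (3, 59), (4, 18)]
print(with_dno(4))     # [(4, 18), (4, 19), (4, 20), (4, 59)]
```

A search on Age alone gets no help from this ordering, since entries with the same Age are scattered across departments.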
Partitioned Hashing (1/2)
• Extension of static external hashing that allows access on multiple keys
• Key consists of n components --> Hash function produces a result with ‘n’ separate
hash addresses --> Bucket address = Concatenation of 'n' addresses
– Search for required composite search key --> Look up appropriate buckets that match
parts of address
• Advantages
– Suitable for equality comparisons (note: range queries are not supported)
– Can easily extend to any number of attributes
• The bucket addresses can be designed so that high-order bits in the addresses correspond
to more frequently accessed attributes.
– No separate access structure needs to be maintained for the individual attributes
Partitioned Hashing (2/2)
• Example (EMPLOYEE query with Dno = 4 and Age = 59)
– Consider composite search key <Dno, Age>. If Dno and Age are hashed into a 3-bit and
5-bit address respectively, we get an 8-bit bucket address.
• <4, 59> can be stored in bucket 10010101
– Assume: Dno = 4 has hash address ‘100’, Age = 59 has hash address ‘10101’
– Search for <4, 59> in bucket address 10010101
• Search for all employees with Age = 59 --> All 8 buckets whose address ends in ‘10101’ will be searched
– i.e. ‘00010101’, ‘00110101’, ..., etc.
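The address concatenation can be sketched with bit operations. The hash functions themselves are not given in the slides, so the sketch takes the per-attribute hash addresses as inputs (using the assumed values '100' and '10101' from the example). The function names are hypothetical.

```python
DNO_BITS, AGE_BITS = 3, 5   # widths of the two partial hash addresses

def bucket_address(dno_hash, age_hash):
    """Concatenate the partial addresses: Dno bits high, Age bits low."""
    return (dno_hash << AGE_BITS) | age_hash

def buckets_for_age(age_hash):
    """Partial-match search on Age only: try every possible Dno prefix."""
    return [bucket_address(d, age_hash) for d in range(2 ** DNO_BITS)]

# Assumed hash addresses from the example: Dno = 4 -> 100, Age = 59 -> 10101
addr = bucket_address(0b100, 0b10101)
print(format(addr, "08b"))                                   # 10010101
print([format(a, "08b") for a in buckets_for_age(0b10101)][:2])
# ['00010101', '00110101']
```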
Grid Files (1/2)
• For a file (relation), construct a grid array based on search attributes with one linear
scale (or dimension) for each search attribute
– For ‘n’ search keys, the grid array would have n dimensions
• Each linear scale is chosen to achieve a uniform distribution of records over that attribute
• Each cell in the grid points to a bucket address where the corresponding records are stored
• Advantages
– Useful for range queries
– Can be applied to any number of search keys
– Reduction in time for multiple key access
• Drawbacks
– Space overhead in terms of grid array structure
– Frequent file reorganization adds to maintenance cost with dynamic files
Grid Files (2/2)
• Example: Employee File
– To access a file on two keys, say Dno and Age --> Construct grid array as per the Linear
scales of Dno and Age
– In the Grid array with 36 cells
• Dno = 4 and Age = 59 corresponds to the data in cell (1, 5)
• A range query such as Dno ≤ 5 and Age > 40 corresponds to the data in a set of cells (and their buckets)
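The cell lookup can be sketched with two linear scales. The scale boundaries below are assumptions, chosen only so that the grid has 6 × 6 = 36 cells and <4, 59> falls in cell (1, 5). The actual scales belong to the figure, which is not reproduced in the text. Ages are assumed to be integers.

```python
from bisect import bisect_left

# ASSUMED linear scales (upper bound of each interval, inclusive).
DNO_UPPER = [2, 4, 5, 7, 8, 10]      # cells 0..5: 1-2, 3-4, 5, 6-7, 8, 9-10
AGE_UPPER = [20, 25, 30, 40, 50]     # cells 0..4; cell 5 holds ages above 50

def cell(dno, age):
    """Map a <Dno, Age> pair to its grid cell (row, column)."""
    return bisect_left(DNO_UPPER, dno), bisect_left(AGE_UPPER, age)

def cells_for_range(max_dno, min_age_exclusive):
    """Cells touched by the range query Dno <= max_dno and Age > min_age."""
    d_hi, a_lo = cell(max_dno, min_age_exclusive + 1)
    return [(d, a)
            for d in range(d_hi + 1)
            for a in range(a_lo, len(AGE_UPPER) + 1)]

print(cell(4, 59))              # (1, 5)
print(cells_for_range(5, 40))   # six cells: rows 0-2, columns 4-5
```

An exact-match query reads one bucket, while the range query reads the buckets of a rectangular group of cells, which is why grid files suit range queries.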
Christalin Nelson | SOCS
28 of 29
Thank You
Christalin Nelson | SOCS
9-Apr-24
9-Apr-24
29 of 29

More Related Content

What's hot

Relational_Algebra_Calculus Operations.pdf
Relational_Algebra_Calculus Operations.pdfRelational_Algebra_Calculus Operations.pdf
Relational_Algebra_Calculus Operations.pdf
Christalin Nelson
 
Using SQL Queries to Insert, Update, Delete, and View Data.ppt
Using SQL Queries to Insert, Update, Delete, and View Data.pptUsing SQL Queries to Insert, Update, Delete, and View Data.ppt
Using SQL Queries to Insert, Update, Delete, and View Data.ppt
MohammedJifar1
 
Data mining query languages
Data mining query languagesData mining query languages
Data mining query languages
Marcy Morales
 

What's hot (20)

Relational_Algebra_Calculus Operations.pdf
Relational_Algebra_Calculus Operations.pdfRelational_Algebra_Calculus Operations.pdf
Relational_Algebra_Calculus Operations.pdf
 
Sql commands
Sql commandsSql commands
Sql commands
 
Database overview
Database overviewDatabase overview
Database overview
 
IBM DB2 for z/OS Administration Basics
IBM DB2 for z/OS Administration BasicsIBM DB2 for z/OS Administration Basics
IBM DB2 for z/OS Administration Basics
 
Transaction & Concurrency Control
Transaction & Concurrency ControlTransaction & Concurrency Control
Transaction & Concurrency Control
 
Troubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversTroubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device Drivers
 
DB2 10 & 11 for z/OS System Performance Monitoring and Optimisation
DB2 10 & 11 for z/OS System Performance Monitoring and OptimisationDB2 10 & 11 for z/OS System Performance Monitoring and Optimisation
DB2 10 & 11 for z/OS System Performance Monitoring and Optimisation
 
Transaction management and concurrency control
Transaction management and concurrency controlTransaction management and concurrency control
Transaction management and concurrency control
 
Data warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaData warehouse 21 snowflake schema
Data warehouse 21 snowflake schema
 
Database Management Systems
Database Management SystemsDatabase Management Systems
Database Management Systems
 
Using SQL Queries to Insert, Update, Delete, and View Data.ppt
Using SQL Queries to Insert, Update, Delete, and View Data.pptUsing SQL Queries to Insert, Update, Delete, and View Data.ppt
Using SQL Queries to Insert, Update, Delete, and View Data.ppt
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
 
OER Unit 4 Virtual Private Database
OER Unit 4 Virtual Private DatabaseOER Unit 4 Virtual Private Database
OER Unit 4 Virtual Private Database
 
IBM Utilities
IBM UtilitiesIBM Utilities
IBM Utilities
 
DB2 utilities
DB2 utilitiesDB2 utilities
DB2 utilities
 
2 db2 instance creation
2 db2 instance creation2 db2 instance creation
2 db2 instance creation
 
Data Mining And Data Warehousing Laboratory File Manual
Data Mining And Data Warehousing Laboratory File ManualData Mining And Data Warehousing Laboratory File Manual
Data Mining And Data Warehousing Laboratory File Manual
 
Architecture of exadata database machine – Part II
Architecture of exadata database machine – Part IIArchitecture of exadata database machine – Part II
Architecture of exadata database machine – Part II
 
12. oracle database architecture
12. oracle database architecture12. oracle database architecture
12. oracle database architecture
 
Data mining query languages
Data mining query languagesData mining query languages
Data mining query languages
 

Similar to Indexing Structures in Database Management system.pdf

Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
Jeet Poria
 
chapter5-file system implementation.ppt
chapter5-file system implementation.pptchapter5-file system implementation.ppt
chapter5-file system implementation.ppt
BUSHRASHAIKH804312
 

Similar to Indexing Structures in Database Management system.pdf (20)

files,indexing,hashing,linear and non linear hashing
files,indexing,hashing,linear and non linear hashingfiles,indexing,hashing,linear and non linear hashing
files,indexing,hashing,linear and non linear hashing
 
File organization 1
File organization 1File organization 1
File organization 1
 
Database Management System-Module-IV(part-1).pptx
Database Management System-Module-IV(part-1).pptxDatabase Management System-Module-IV(part-1).pptx
Database Management System-Module-IV(part-1).pptx
 
Indexing and Hashing
Indexing and HashingIndexing and Hashing
Indexing and Hashing
 
5 data storage_and_indexing
5 data storage_and_indexing5 data storage_and_indexing
5 data storage_and_indexing
 
Database management system File organization-sk.pdf
Database management system File organization-sk.pdfDatabase management system File organization-sk.pdf
Database management system File organization-sk.pdf
 
OS Unit5.pptx
OS Unit5.pptxOS Unit5.pptx
OS Unit5.pptx
 
Unit 08 dbms
Unit 08 dbmsUnit 08 dbms
Unit 08 dbms
 
File organization
File organizationFile organization
File organization
 
Mba admission in india
Mba admission in indiaMba admission in india
Mba admission in india
 
5. indexing
5. indexing5. indexing
5. indexing
 
16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
 
File Organization
File OrganizationFile Organization
File Organization
 
Allocation and free space management
Allocation and free space managementAllocation and free space management
Allocation and free space management
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
 
Storage struct
Storage structStorage struct
Storage struct
 
chapter5-file system implementation.ppt
chapter5-file system implementation.pptchapter5-file system implementation.ppt
chapter5-file system implementation.ppt
 
Adbms 22 dynamic multi level index using b and b+ tree
Adbms 22 dynamic multi level index using b  and b+ treeAdbms 22 dynamic multi level index using b  and b+ tree
Adbms 22 dynamic multi level index using b and b+ tree
 
Data base
Data baseData base
Data base
 

More from Christalin Nelson

More from Christalin Nelson (19)

Packages and Subpackages in Java
Packages and Subpackages in JavaPackages and Subpackages in Java
Packages and Subpackages in Java
 
Bitwise complement operator
Bitwise complement operatorBitwise complement operator
Bitwise complement operator
 
Advanced Data Structures - Vol.2
Advanced Data Structures - Vol.2Advanced Data Structures - Vol.2
Advanced Data Structures - Vol.2
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
 
CPU Scheduling
CPU SchedulingCPU Scheduling
CPU Scheduling
 
Process Synchronization
Process SynchronizationProcess Synchronization
Process Synchronization
 
Process Management
Process ManagementProcess Management
Process Management
 
Applications of Stack
Applications of StackApplications of Stack
Applications of Stack
 
Storage system architecture
Storage system architectureStorage system architecture
Storage system architecture
 
Data Storage and Information Management
Data Storage and Information ManagementData Storage and Information Management
Data Storage and Information Management
 
Application Middleware Overview
Application Middleware OverviewApplication Middleware Overview
Application Middleware Overview
 
Network security
Network securityNetwork security
Network security
 
Directory services
Directory servicesDirectory services
Directory services
 
System overview
System overviewSystem overview
System overview
 
Storage overview
Storage overviewStorage overview
Storage overview
 
Computer Fundamentals-2
Computer Fundamentals-2Computer Fundamentals-2
Computer Fundamentals-2
 
Computer Fundamentals - 1
Computer Fundamentals - 1Computer Fundamentals - 1
Computer Fundamentals - 1
 
Advanced data structures vol. 1
Advanced data structures   vol. 1Advanced data structures   vol. 1
Advanced data structures vol. 1
 
Programming in c++
Programming in c++Programming in c++
Programming in c++
 

Recently uploaded

會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
中 央社
 
The basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptxThe basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptx
heathfieldcps1
 
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
中 央社
 
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdfFinancial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
MinawBelay
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
CaitlinCummins3
 

Recently uploaded (20)

An overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismAn overview of the various scriptures in Hinduism
An overview of the various scriptures in Hinduism
 
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
 
Navigating the Misinformation Minefield: The Role of Higher Education in the ...
Navigating the Misinformation Minefield: The Role of Higher Education in the ...Navigating the Misinformation Minefield: The Role of Higher Education in the ...
Navigating the Misinformation Minefield: The Role of Higher Education in the ...
 
Software testing for project report .pdf
Software testing for project report .pdfSoftware testing for project report .pdf
Software testing for project report .pdf
 
Features of Video Calls in the Discuss Module in Odoo 17
Features of Video Calls in the Discuss Module in Odoo 17Features of Video Calls in the Discuss Module in Odoo 17
Features of Video Calls in the Discuss Module in Odoo 17
 
Championnat de France de Tennis de table/
Championnat de France de Tennis de table/Championnat de France de Tennis de table/
Championnat de France de Tennis de table/
 
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
 
philosophy and it's principles based on the life
philosophy and it's principles based on the lifephilosophy and it's principles based on the life
philosophy and it's principles based on the life
 
The basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptxThe basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptx
 
PSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptxPSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptx
 
The Ball Poem- John Berryman_20240518_001617_0000.pptx
The Ball Poem- John Berryman_20240518_001617_0000.pptxThe Ball Poem- John Berryman_20240518_001617_0000.pptx
The Ball Poem- John Berryman_20240518_001617_0000.pptx
 
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
 
Capitol Tech Univ Doctoral Presentation -May 2024
Capitol Tech Univ Doctoral Presentation -May 2024Capitol Tech Univ Doctoral Presentation -May 2024
Capitol Tech Univ Doctoral Presentation -May 2024
 
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
 
An Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptxAn Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptx
 
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
 
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdfFinancial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
 
Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
 
Dementia (Alzheimer & vasular dementia).
Dementia (Alzheimer & vasular dementia).Dementia (Alzheimer & vasular dementia).
Dementia (Alzheimer & vasular dementia).
 

Indexing Structures in Database Management system.pdf

  • 2. At a Glance • Introduction • Types of Single-Level Ordered Indexes – Primary Indexes, Clustering Indexes, Secondary Indexes • Multilevel Indexes • Dynamic Multilevel Indexes Using B-Trees and B+ Trees • Indexes on Multiple Keys 9-Apr-24 Christalin Nelson | SOCS 2 of 29
  • 3. 9-Apr-24 Introduction • Assume data file already exists with some primary organization, additional secondary organizations can be introduced to speed up retrieval in response to search conditions involving index fields – i.e. Auxiliary access structures called Indexes provide secondary access paths • Variety of indexes can be constructed on the same file – Multiple indexes on different fields (OR) Indexes on multiple fields • Searching record(s) – The index is searched  Pointers to one or more disk blocks --> Required records are located • Prevalent types of indexes based on – Ordered files (single-level indexes), Tree data structures (multilevel indexes, B+-trees), Hashing or other search data structures, Indexes that are vectors of bits (Bitmap indexes) Christalin Nelson | SOCS 3 of 29
  • 4. 9-Apr-24 Single-Level Ordered Indexes • Ordered Indexes are synonymous with Book index – Index • An access structure constructed with values of indexing field & list of pointers to all disk blocks that contain records with that field value – Ordered Indexes • Values in the access structure are ordered. Hence, Binary search can be done on index. • Types – Primary index is specified on ordering key field of ordered data file (Indexed Sequential file) – Clustered index is specified on ordering field of ordered data file (Clustered file) – Secondary index is specified on any non-ordering field of a file – Note: A data file can have • At most one Primary index or Clustered index (not both) • Several Secondary indexes in addition to its primary access method Christalin Nelson | SOCS 4 of 29
  • 5. 9-Apr-24 Primary Indexes (1/4) • The data file (Indexed-Sequential File) is ordered based on key field (Indexing field) • An access structure (ordered file) whose records are of fixed length with 2 fields – ith Index entry (or index record) of Index file is given as <K(i), P(i)> • 1st field: Same data type as ordering key field (Primary key of data file) • 2nd field: Pointer to first record in Disk block (block address) – Block Anchor or Anchor Record of Block: 1st record in each block of the Data file – Total no. of entries in Index file = No. of Disk blocks in ordered Data file – Binary search is performed on Index file. Search requires fewer block accesses (log2bi + 1). • Indexes can also be characterized as dense or sparse – Dense index: Has 1 index entry for each search key value (i.e. every record) in Data file – Sparse (or non-Dense) index: Has index entries for only some of the search key values • Primary index is a Sparse index – Size of Index file (bi blocks) < Size of Data file (b blocks) Christalin Nelson | SOCS 5 of 29
  • 6. Assumption: Each value of Name is unique. Size of Index file (bi blocks) < Size of Data file (b blocks). Binary search is performed on the Index file. Search requires fewer block accesses (log_2(bi) + 1).
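The lookup path through a sparse primary index can be sketched in a few lines. This is only an illustration under stated assumptions: the data blocks, the names in them, and the function `primary_index_lookup` are hypothetical, and in-memory lists stand in for disk blocks.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of records ordered on a unique key (Name).
data_blocks = [
    ["Aaron", "Abbot", "Acosta"],
    ["Adams", "Akers", "Alexander"],
    ["Alfred", "Allen", "Anders"],
]

# Sparse primary index: one <K(i), P(i)> entry per data block.
# K(i) is the key of the block anchor, P(i) is the block number.
index_keys = [block[0] for block in data_blocks]
index_ptrs = list(range(len(data_blocks)))


def primary_index_lookup(key):
    """Return (block_no, record) for key, or None if it is absent."""
    # Binary search on the index for the last anchor <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first block anchor
    block_no = index_ptrs[pos]
    # One additional block access: scan the data block itself.
    for record in data_blocks[block_no]:
        if record == key:
            return block_no, record
    return None
```

The index holds three entries for nine records, which is what makes it sparse; the final scan corresponds to the "+ 1" data block access in the cost formula.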
  • 7. Primary Indexes (3/4) • Example: An ordered data file with r = 30,000 unspanned, fixed-length records, each of length R = 100B and with an ordering key field of size V = 9B, is stored on a disk with block size B = 1024B. If a block pointer of size P = 6B is used to construct a primary index for the file, show that a search through the Index file needs fewer block accesses than binary search on the Data file. – For Data File • Blocking factor = bfr = ⎣(B/R)⎦ = ⎣(1024/100)⎦ = 10 records per block • No. of records = r = 30000 • No. of blocks needed = b = ⎡(r/bfr)⎤ = ⎡(30000/10)⎤ = 3000 blocks • No. of block accesses that binary search would need = ⎡log_2(b)⎤ = ⎡log_2(3000)⎤ = 12 block accesses – For Index File • Size of each index entry = Ri = V + P = (9 + 6) = 15B • Blocking factor = bfri = ⎣(B/Ri)⎦ = ⎣(1024/15)⎦ = 68 entries per block • No. of index records = ri = No. of blocks in the data file = 3000 • No. of blocks needed = bi = ⎡(ri/bfri)⎤ = ⎡(3000/68)⎤ = 45 blocks • No. of block accesses that binary search would need = ⎡log_2(bi)⎤ = ⎡log_2(45)⎤ = 6 block accesses – To search for a record using the index, we need one additional block access to the data file. • Hence, total block accesses = 6 + 1 = 7 block accesses – Hence there is an improvement over binary search on the data file, which required 12 disk block accesses.
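The arithmetic of this example can be re-checked with a short script. The variable names follow the symbols on the slide; nothing else is assumed.

```python
from math import ceil, log2

B = 1024    # block size in bytes
R = 100     # record length in bytes
V = 9       # ordering key field size in bytes
P = 6       # block pointer size in bytes
r = 30000   # number of records in the data file

# Data file
bfr = B // R                          # blocking factor (records per block)
b = ceil(r / bfr)                     # blocks in the data file
data_file_accesses = ceil(log2(b))    # binary search on the data file

# Index file
Ri = V + P                            # size of one index entry
bfri = B // Ri                        # index entries per block
ri = b                                # one index entry per data block
bi = ceil(ri / bfri)                  # blocks in the index file
index_accesses = ceil(log2(bi)) + 1   # binary search on index + 1 data block
```

With these inputs `data_file_accesses` evaluates to 12 and `index_accesses` to 7, matching the figures above.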
  • 8. Primary Indexes (4/4) • Insertion/deletion of records is costly: records in the ordered data file must be moved to make room, and moving records can change block anchors, so index entries must be updated as well – Improvements • (1) An unordered overflow file can reduce this problem • (2) A linked list of overflow records for each block in the data file – This is similar to the method of dealing with overflow records in hashing – Records within each block & its overflow linked list can be sorted to improve retrieval time • (3) Record deletion is handled using deletion markers
  • 9. Clustering Indexes (1/4) • The data file (Clustered File) is ordered on a non-key field (the clustering field) • A clustering index speeds up retrieval of all records that have the same value for the clustering field. • The index is an access structure (ordered file) whose records are of fixed length with 2 fields – ith index entry/record of the Clustering Index file is given as <K(i), P(i)> • 1st field: Same data type as the clustering field (non-key field of data file) • 2nd field: Pointer to the first disk block that contains a record with that clustering field value (block address) – Block Anchor or Anchor Record of a block: 1st record in each block of the Data file – Total no. of entries in Index file = No. of distinct values of the clustering field in the Data file – Binary search is performed on the Index file. • Clustering index is a Sparse index – It has one entry per distinct value of the clustering field, not one per record – Size of Index file (bi blocks) < Size of Data file (b blocks)
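A minimal sketch of how a clustering index could be built and used, assuming a small hypothetical EMPLOYEE file ordered on Dept_number. The records, `build_clustering_index`, and `records_with_value` are illustrative names, and in-memory lists stand in for disk blocks.

```python
from bisect import bisect_left

# Hypothetical data file ordered on the non-key field Dept_number.
# Each record is (Dept_number, Name); each inner list is one block.
data_blocks = [
    [(1, "A"), (1, "B"), (2, "C")],
    [(2, "D"), (3, "E"), (3, "F")],
    [(3, "G"), (4, "H"), (5, "I")],
]


def build_clustering_index(blocks):
    """One <K(i), P(i)> entry per distinct clustering value.

    P(i) is the number of the first block holding a record with value K(i).
    """
    index = []
    for block_no, block in enumerate(blocks):
        for dept, _ in block:
            # The file is ordered, so a new value always differs from the last entry.
            if not index or index[-1][0] != dept:
                index.append((dept, block_no))
    return index


def records_with_value(index, blocks, value):
    """Return the names of all records whose clustering field equals value."""
    keys = [k for k, _ in index]
    pos = bisect_left(keys, value)  # binary search on the index
    if pos == len(keys) or keys[pos] != value:
        return []
    names = []
    # Read forward from the first block that holds the value.
    for block in blocks[index[pos][1]:]:
        for dept, name in block:
            if dept == value:
                names.append(name)
            elif dept > value:
                return names  # ordered file: no later record can match
    return names


clustering_index = build_clustering_index(data_blocks)
```

Because matching records are stored contiguously, the search reads only the blocks from the pointed-to block onward until the value changes.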
  • 10. Clustering index on “Dept_number”, an ordering non-key field of the EMPLOYEE file
  • 11. Clustering index with a separate block cluster for each group of records that share the same value for the clustering field
  • 12. Clustering Indexes (4/4) • Insertion/deletion is costly because the data file is physically ordered – Solution: Reserve a whole block (or a cluster of contiguous blocks) for each value of the clustering field --> All records with that value are placed in the block (or block cluster) • Note – An index is similar to dynamic hashing and to the directory structures used for extendible hashing • Both are searched to find a pointer to the Data block containing the desired record • Difference: An index search uses values of the search field itself. A hash directory search uses the binary hash value calculated by applying the hash function to the search field
  • 13. Secondary Index (1/4) • Provides a secondary means of accessing a data file for which some primary access already exists – Many secondary indexes can be created for the same file. Each represents an additional means of accessing that file based on some specific field • Data file records could be ordered, unordered, or hashed • Indexing field: A non-ordering candidate key (PK or secondary key) or a non-key field (can have duplicate values) • The index is an access structure (ordered file) whose records are of fixed length with 2 fields – ith index entry/record of the Secondary Index file is given as <K(i), P(i)> • 1st field: Same data type as the indexing field • 2nd field: Block pointer or record pointer – Binary search is performed on the Index file
  • 14. Dense Secondary index (with block pointers) on a non-ordering key field of a file
  • 15. Secondary index (with record pointers) on a non-key field, implemented using one level of indirection so that index entries are of fixed length and have unique field values
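The one-level-of-indirection scheme can be sketched as follows. The records, the record ids used as "record pointers", and the function names are hypothetical; a Python list per value stands in for the block of record pointers.

```python
from bisect import bisect_left

# Hypothetical data file, not ordered on Dept_number.
# Record id -> (Ssn, Dept_number); the id plays the role of a record pointer.
records = {
    0: ("S5", 3),
    1: ("S1", 5),
    2: ("S9", 3),
    3: ("S2", 1),
    4: ("S7", 5),
    5: ("S4", 3),
}


def build_secondary_index(recs):
    """Return an ordered list of (value, pointer_block) with unique values.

    Each pointer_block is the extra level of indirection: the list of
    record pointers for every record that has that field value.
    """
    pointer_blocks = {}
    for rid, (_, dept) in recs.items():
        pointer_blocks.setdefault(dept, []).append(rid)
    return sorted(pointer_blocks.items())


def lookup(index, value):
    """Binary-search the index, then follow the record pointers."""
    keys = [k for k, _ in index]
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return [records[rid] for rid in index[pos][1]]
    return []


secondary_index = build_secondary_index(records)
```

Each index entry stays fixed-length and unique (a value plus one pointer to its pointer block), at the price of one extra access to read the pointer block.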
  • 16. Secondary Index (4/4) • A secondary index on a key field is a Dense index – One index entry for each record in the data file (the indexing field is a non-ordering field) – Needs more storage space & longer search time than a primary index, because it has more entries • No block anchors can be used (the data file is not physically ordered on the indexing field) • Gives a greater improvement in search time for an arbitrary record in the Data file – Linear search on the data file is required if the secondary index does not exist • Note – A secondary index provides a logical ordering of the records by the indexing field • i.e. Records can be accessed in the order of the secondary index entries – Primary & Clustering indexes assume that the field used for physical ordering of records in the file is the same as the indexing field
  • 17. Summary of Single-Level Ordered Indexes
  • 18. Multilevel Index (1/4) • Data File --> Create First (Base)-Level Primary Index file (which acts as the data file for the 2nd level) --> Create Second-Level Primary Index file – The multilevel scheme can be used on any index (Primary, Clustering, or Secondary) when the 1st-level index has distinct values for K(i) and fixed-length entries • Indexing schemes discussed thus far had one ordered index file – Binary search on an Index file with bi blocks required approx. log_2(bi) block accesses • A multilevel index reduces the part of the index that remains to be searched by a factor of bfri (no. of entries per block) at each step – Considering bfri = fan-out of the multilevel index (fo), the search space is divided ‘fo’ ways at each step • Hence, search requires approx. log_fo(bi) block accesses, which is fewer than binary search when fan-out > 2 • The 2nd-level file has one entry for each block of the 1st-level file --> Block anchors are used
  • 19. Multilevel Index (2/4) • Index entries are of the same size at all levels – 2 fields: Index field value & Block address • Blocking factor of Index files (bfri) is the same at all levels – 1st level: r1 entries, bfri = fo (fan-out), so the 1st level needs ⎡(r1/fo)⎤ blocks – No. of 2nd-level entries = r2 = ⎡(r1/fo)⎤ = No. of blocks in 1st level – No. of 3rd-level entries = r3 = ⎡(r2/fo)⎤ = No. of blocks in 2nd level – Note: Each level has fewer entries than the previous level by a factor of fo • Top index level – Index levels are added until the last level (t) fits all its entries in 1 block. Here, No. of index levels = t = ⎡log_fo(r1)⎤ • i.e. t is the smallest integer with (r1/fo^t) ≤ 1 – For an index search, ‘t’ index blocks are accessed (one per level), plus 1 data block
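The level-by-level reduction can be traced with a small helper. As an assumption for illustration, it reuses the first-level figures from the primary index example on slide 7 (r1 = 3000 entries, fo = 68); `index_levels` is a hypothetical name.

```python
from math import ceil


def index_levels(r1, fo):
    """Return [(entries, blocks), ...] for each index level.

    Levels are added until the top level fits in a single block.
    """
    if r1 < 1 or fo < 2:
        raise ValueError("need r1 >= 1 and fan-out fo >= 2")
    levels = []
    entries = r1
    while True:
        blocks = ceil(entries / fo)
        levels.append((entries, blocks))
        if blocks == 1:
            return levels
        # The next level has one entry per block of this level.
        entries = blocks


levels = index_levels(3000, 68)
t = len(levels)              # number of index levels
search_accesses = t + 1      # one block per level plus one data block
```

Here the first level needs 45 blocks and the second level holds 45 entries in one block, so t = 2 and a search costs 3 block accesses, compared with 7 for the single-level index in the earlier example.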
  • 20. 2-level Primary Index resembling ISAM organization • IBM’s Indexed Sequential Access Method (ISAM) organization • Level-1: Cylinder Index • Level-2: Track Index • Track is searched sequentially for the desired record or block
  • 21. Multilevel Index (4/4) • A multilevel index reduces the no. of blocks accessed during a record search when the index field value is given • Note: Index insertion and deletion are inefficient as all index levels are physically ordered files – Solution: Dynamic multilevel index (implemented with B-trees & B+-trees)
  • 22. Indexes on Multiple Keys (1/2) • The index field is considered as a combination of attributes • Example: – EMPLOYEE file contains attributes Ssn (Key), Age, Street, City, Zip_code, Salary, Skill_code, Dno – Query: Find employees with Dno = 4 and Age = 59. • Dno & Age are non-key attributes. Hence, each search value could point to multiple records. • Alternative search strategies – Dno has an index, but Age does not --> Use the index to access records with Dno = 4 --> Select records with Age = 59 – Age has an index, but Dno does not --> Use the index to access records with Age = 59 --> Select records that satisfy Dno = 4 – Both Dno and Age have indexes --> Use each index to find a set of records (with Dno = 4 and Age = 59 respectively) --> The intersection of these sets satisfies both conditions • Note: All of these strategies give the correct result, but none is efficient when each condition alone matches many records and only a few satisfy both; an index on the composite field <Dno, Age> avoids this
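The first and third strategies can be compared on a toy file. Everything here is a hypothetical illustration: the records, the positions used as record pointers, and `build_index`, which uses a dictionary as a stand-in for a real single-attribute index.

```python
# Hypothetical EMPLOYEE records: (Ssn, Dno, Age).
employees = [
    ("111", 4, 59),
    ("222", 4, 30),
    ("333", 5, 59),
    ("444", 4, 59),
    ("555", 3, 41),
]


def build_index(records, field):
    """Map each value of one attribute to the set of record positions."""
    index = {}
    for rid, rec in enumerate(records):
        index.setdefault(rec[field], set()).add(rid)
    return index


dno_index = build_index(employees, 1)
age_index = build_index(employees, 2)

# Strategy 1: use the Dno index, then filter the fetched records on Age.
fetched = sorted(dno_index.get(4, set()))
via_dno_index = [rid for rid in fetched if employees[rid][2] == 59]

# Strategy 3: use both indexes and intersect the two pointer sets.
via_intersection = sorted(dno_index.get(4, set()) & age_index.get(59, set()))
```

Both strategies return the same records, but strategy 1 fetches every Dno = 4 record (three here) to keep two, which is the inefficiency the note refers to when the individual sets are large.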
  • 23. Indexes on Multiple Keys (2/2) • Types – Ordered Index on Multiple Attributes – Partitioned Hashing – Grid Files
  • 24. Ordered Index on Multiple Attributes • Index on a composite key of n attributes with lexicographic ordering – If an index is created on attributes <A1, A2, ..., An> --> Search key values are tuples with n values: <v1, v2, ..., vn> --> Lexicographic ordering of these tuple values establishes an order on this composite search key • Lexicographic ordering works similarly to ordering of character strings • Example from Slide 22: – Create an index on the composite search key field <Dno, Age> – Suppose the search key is the pair of values <4, 59> • All composite keys with Dno = 3 precede those with Dno = 4 – Thus <3, n> precedes <4, m> for any values of m and n • Ascending key order for keys with Dno = 4 would be <4, 18>, <4, 19>, <4, 20>, ..., etc.
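Python tuples already compare lexicographically, so the ordering on <Dno, Age> can be demonstrated directly. The key values below are hypothetical, and a sorted list stands in for the ordered index file.

```python
from bisect import bisect_left

# Hypothetical composite search keys <Dno, Age>, kept in lexicographic order.
keys = sorted([(4, 20), (3, 45), (4, 18), (5, 22), (3, 61), (4, 59), (4, 19)])

# Exact match on the full composite key <4, 59> by binary search.
pos = bisect_left(keys, (4, 59))
found = pos < len(keys) and keys[pos] == (4, 59)

# Prefix search on Dno = 4 alone: the 1-tuple (4,) sorts before every
# (4, age) and (5,) sorts after them, so they bound the matching run.
lo = bisect_left(keys, (4,))
hi = bisect_left(keys, (5,))
dno_4_keys = keys[lo:hi]
```

The prefix search works because all keys sharing the leading attribute are contiguous; the same index gives no such contiguous run for a condition on Age alone.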
  • 25. Partitioned Hashing (1/2) • Extension of static external hashing that allows access on multiple keys • Key consists of n components --> Hash function produces a result with ‘n’ separate hash addresses --> Bucket address = Concatenation of the ‘n’ addresses – Search for the required composite search key --> Look up the buckets whose addresses match the known parts of the address • Advantages – Suitable for equality comparisons – Can easily be extended to any number of attributes • The bucket addresses can be designed so that high-order bits in the addresses correspond to more frequently accessed attributes. – No separate access structure needs to be maintained for the individual attributes • Drawback – Range queries on the component attributes are not supported
  • 26. Partitioned Hashing (2/2) • Example from Slide 22 – Consider the composite search key <Dno, Age>. If Dno and Age are hashed into a 3-bit and a 5-bit address respectively, we get an 8-bit bucket address. • Assume Dno = 4 has hash address ‘100’ and Age = 59 has hash address ‘10101’ • <4, 59> is then stored in bucket 10010101, and a search for <4, 59> looks in bucket address 10010101 • Search for all employees with Age = 59 --> All 8 buckets (in this example) whose addresses end in ‘10101’ will be searched – i.e. ‘00010101’, ‘00110101’, ..., etc.
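A sketch of how the 8-bit bucket address could be formed. The slide only states the two hash addresses it assumes, not the hash functions, so `hash_dno` and `hash_age` below are hypothetical functions chosen solely so that Dno = 4 maps to 100 and Age = 59 maps to 10101.

```python
DNO_BITS = 3
AGE_BITS = 5


def hash_dno(dno):
    # Hypothetical 3-bit hash; gives 0b100 for Dno = 4.
    return dno % (1 << DNO_BITS)


def hash_age(age):
    # Hypothetical 5-bit hash; the multiplier 15 is picked only so that
    # Age = 59 yields 0b10101, the address assumed on the slide.
    return (15 * age) % (1 << AGE_BITS)


def bucket_address(dno, age):
    """Concatenate the two hash addresses into one 8-bit bucket address."""
    address = (hash_dno(dno) << AGE_BITS) | hash_age(age)
    return format(address, "08b")


def buckets_for_age(age):
    """All buckets to search when only Age is known: every Dno prefix."""
    suffix = hash_age(age)
    return [
        format((prefix << AGE_BITS) | suffix, "08b")
        for prefix in range(1 << DNO_BITS)
    ]
```

An equality search on both attributes touches one bucket, while a search on Age alone must try all 2^3 = 8 possible Dno prefixes.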
  • 27. Grid Files (1/2) • For a relation (file), construct a grid array based on the search attributes, with one linear scale (or dimension) for each search attribute – For ‘n’ search keys, the grid array would have n dimensions • The linear scales are chosen to achieve a uniform distribution of each attribute • Each cell in the grid holds a bucket address where the corresponding records are stored • Advantages – Useful for range queries – Can be applied to any number of search keys – Reduction in time for multiple key access • Drawbacks – Space overhead in terms of the grid array structure – Frequent file reorganization adds to maintenance cost with dynamic files
  • 28. Grid Files (2/2) • Example: EMPLOYEE file – To access a file on two keys, say Dno and Age --> Construct the grid array as per the linear scales of Dno and Age – In the grid array with 36 cells • Dno = 4 and Age = 59 corresponds to the data in cell (1, 5) • The range condition Dno ≤ 5 and Age > 40 corresponds to a group of cells, whose buckets are searched
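The mapping from attribute values to grid cells can be sketched as below. The slide's figure with the actual linear scales is not part of this transcript, so the scale boundaries here are assumptions, chosen only to give a 6 x 6 grid in which Dno = 4, Age = 59 falls in cell (1, 5). Integer ages and Dno values between 1 and 10 are also assumed.

```python
from bisect import bisect_left

# Assumed linear scales (upper bound of each interval).
DNO_UPPER = [2, 4, 5, 7, 8, 10]    # cell i holds Dno <= DNO_UPPER[i]
AGE_UPPER = [20, 25, 30, 40, 50]   # cells 0..4; cell 5 holds Age > 50

N_DNO_CELLS = len(DNO_UPPER)
N_AGE_CELLS = len(AGE_UPPER) + 1


def cell(dno, age):
    """Map one (Dno, Age) pair to its grid cell coordinates."""
    return bisect_left(DNO_UPPER, dno), bisect_left(AGE_UPPER, age)


def cells_for_range(dno_max, age_above):
    """Cells covering the range query Dno <= dno_max and Age > age_above."""
    last_dno_cell = bisect_left(DNO_UPPER, dno_max)
    first_age_cell = bisect_left(AGE_UPPER, age_above + 1)
    return [
        (i, j)
        for i in range(last_dno_cell + 1)
        for j in range(first_age_cell, N_AGE_CELLS)
    ]


range_cells = cells_for_range(5, 40)
```

Under these assumed scales the range query touches 6 of the 36 cells, and only the buckets those cells point to need to be read.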
  • 29. Thank You