SlideShare a Scribd company logo
1 of 38
Department of Information Technology 1Data base Technologies (ITB4201)
Physical Database Design: Indexing
Dr. C.V. Suresh Babu
Professor
Department of IT
Hindustan Institute of Science & Technology
Department of Information Technology 2Data base Technologies (ITB4201)
Action Plan
• Physical Database Design Phase
• Steps in Physical Database Design
• Files and Records
• Index Classification
• Hash-Based Indexes
• Tree Indexes
• B+ Tree
• Ordered Indexes
• Primary Indexes
• Clustering Indexes
• Secondary Indexes
• Indexed Sequential
• Quiz
Department of Information Technology 3Data base Technologies (ITB4201)
Physical Database Design Phase
• Inputs into the Physical Design Phase
– Logical (implementation) model
– Documentation/Definitions
– DBMS Characteristics
– Response time requirements
– Security, Backup, Recovery, Retention, Integrity requirements
• Logical Design Phase focuses on What; Physical Design
Phases focuses on How.
Department of Information Technology 4Data base Technologies (ITB4201)
Physical Database Design Phase
• Outputs:
– Produces a Description of the Implementation of the Database on
Secondary Storage.
– Describes the storage structures and access methods used to achieve
efficient access to the data.
• Focus on Data Processing Efficiency
Department of Information Technology 5Data base Technologies (ITB4201)
Steps in Physical Database Design
• Translate Logical (Implementation) Data Model for Target
DBMS
– Design base relations and constraints
Department of Information Technology 6Data base Technologies (ITB4201)
Steps in Physical Database Design
• Design Physical Representation
– Analyze transactions
– Choose file organizations
– Choose secondary indexes
– Introduction of controlled redundancy (Denormalization)
– Estimate disk space requirements
Department of Information Technology 7Data base Technologies (ITB4201)
Files and Records
• Records
– Contain fields (fixed and variable length fields)
– Fixed-length
– Variable-length
– Spanned
– Unspanned
Department of Information Technology 8Data base Technologies (ITB4201)
Files and Records
• Block (page): The amount of data read or written in one I/O
operation.
• Blocking Factor: The number of physical records per block.
• File Organization: How the physical records in a file are
arranged on the disk.
• Access Method: How the data can be retrieved based on the
file organization.
Department of Information Technology 9Data base Technologies (ITB4201)
Index Classification
• Primary vs. secondary: If search key contains primary key, then called primary index.
– Unique index: Search key contains a candidate key.
• Clustered vs. unclustered: If order of data records is the same as, or `close to’, order
of data entries, then called clustered index.
– A file can be clustered on at most one search key.
– Cost of retrieving data records through index varies greatly based on whether index is clustered
or not!
Department of Information Technology 10Data base Technologies (ITB4201)
Hash-Based Indexes
• Hash function calculates the address of the page on which the
record is stored.
• The field that’s used in the hash function is called the hash field.
– Called a hash key if the hash field is also a key field.
• Good for equality searches
• Not good for range searches
Department of Information Technology 11Data base Technologies (ITB4201)
Hash File
Department of Information Technology 12Data base Technologies (ITB4201)
Tree Indexes
• Many DBMS use tree data structures to hold data/indexes.
• Tree contains a hierarchy of nodes
• Root node has no parent
• Other nodes have one parent
• Nodes can (but don’t have to) have child nodes.
• A node with no children is called a leaf node.
• Balanced tree: depth (number of levels) between root and each leaf node is the
same.
Department of Information Technology 13Data base Technologies (ITB4201)
B+ Tree
• A B-tree that follows certain rules:
– If the root is not a leaf node, it must have at least two children.
– For a tree of order n, each node (other than root/leaf nodes) must
have between n/2 and n pointers and children.
– … (other rules)
– Tree must always be balanced
• Every path from the root to a leaf must have same length.  This is what
matters.
• This means that it always takes about the same time to access any record.
Department of Information Technology 14Data base Technologies (ITB4201)
B+ Tree Indexes
 Leaf pages contain data entries, and are chained (prev & next)
 Non-leaf pages contain index entries and direct searches:
P0 K 1 P 1 K 2 P 2 K m P m
index entry
Non-leaf
Pages
Pages
Leaf
Department of Information Technology 15Data base Technologies (ITB4201)
Example B+ Tree
• Insert/delete: Find data entry in leaf, then change it. Need to
adjust parent sometimes.
– And change sometimes bubbles up the tree
2* 3*
Root
17
30
14* 16* 33* 34* 38* 39*
135
7*5* 8* 22* 24*
27
27* 29*
Entries <= 17 Entries > 17
Department of Information Technology 16Data base Technologies (ITB4201)
Ordered Indexes
• An ordered file that stores the index field and a pointer.
• Requires a binary search on the index file.
• Types of Ordered Indexes
– Primary
– Clustering
– Secondary
Department of Information Technology 17Data base Technologies (ITB4201)
Primary Indexes
• An ordered, fixed-length file that stores the primary key and a
pointer.
• Each record in the index file corresponds to each block in the
ordered data file.
• Sparse / nondense index
Department of Information Technology 18Data base Technologies (ITB4201)
A Primary Index
Department of Information Technology 19Data base Technologies (ITB4201)
Clustering Indexes
• When the records of a data file are ordered on a non-key field,
this field is called a clustering field.
• The clustering field does not have a distinct value for each
record in the data file.
• A clustering index is an ordered file that stores the clustering
field and a pointer.
Department of Information Technology 20Data base Technologies (ITB4201)
Clustering Indexes
• There is one record in the index file for each distinct value of
the clustering field.
• Sparse / nondense index
Department of Information Technology 21Data base Technologies (ITB4201)
Department of Information Technology 22Data base Technologies (ITB4201)
Secondary Indexes
• A secondary index is an ordered file that stores the
nonordering field of a data file and a pointer.
• If the indexing field is a secondary or candidate key, then the
index is dense.
• If the indexing field is a nonkey field, then the index is sparse.
• Introduces the idea of adding a level of indirection
Department of Information Technology 23Data base Technologies (ITB4201)
Secondary Index on a
Secondary Key
Department of Information Technology 24Data base Technologies (ITB4201)
Secondary Index on
a Nonkey field
Department of Information Technology 25Data base Technologies (ITB4201)
Indexed Sequential
Data File Organization
• Ordered data file with a multilevel primary index file based on
the ordering key field.
• IBM’s ISAM is an indexed sequential file organization with a
two-level index.
Department of Information Technology 26Data base Technologies (ITB4201)
Understanding the Workload
• For each query in the workload:
– Which relations does it access?
– Which attributes are retrieved?
– Which attributes are involved in selection/join conditions? How selective
are these conditions likely to be?
• For each update in the workload:
– Which attributes are involved in selection/join conditions? How selective
are these conditions likely to be?
– The type of update (INSERT/DELETE/UPDATE), and the attributes that are
affected.
Department of Information Technology 27Data base Technologies (ITB4201)
Choice of Indexes
• What indexes should we create?
– Which relations should have indexes? What field(s) should be the
search key? Should we build several indexes?
• For each index, what kind of an index should it be?
– Clustered? Hash/tree?
Department of Information Technology 28Data base Technologies (ITB4201)
Choice of Indexes (Contd.)
• One approach: Consider the most important queries in turn. Consider the best plan
using the current indexes, and see if a better plan is possible with an additional
index. If so, create it.
– Obviously, this implies that we must understand how a DBMS evaluates queries and creates
query evaluation plans!
– For now, we discuss simple 1-table queries.
• Before creating an index, must also consider the impact on updates in the workload!
– Trade-off: Indexes can make queries go faster, updates slower. Require disk space, too.
Department of Information Technology 29Data base Technologies (ITB4201)
Index Selection Guidelines
• Attributes in WHERE clause are candidates for index keys.
– Exact match condition suggests hash index.
– Range query suggests tree index.
• Clustering is especially useful for range queries; can also help on equality
queries if there are many duplicates.
Department of Information Technology 30Data base Technologies (ITB4201)
Index Selection Guidelines (2)
• Multi-attribute search keys should be considered when a WHERE clause contains
several conditions.
– Order of attributes is important for range queries.
– Such indexes can sometimes enable index-only strategies for important queries.
• For index-only strategies, clustering is not important!
• Try to choose indexes that benefit as many queries as possible. Since only one index
can be clustered per relation, choose it based on important queries that would
benefit the most from clustering.
Department of Information Technology 31Data base Technologies (ITB4201)
Rules-of-Thumb for Indexes
• Index primary key fields
– DBMS may do this automatically
– Composite keys require composite indexes
• Index foreign key fields
• Index other fields frequently used in:
– WHERE clauses
– GROUP BY clauses
– ORDER BY clauses
• Consider composite indexes when fields are used together in
conditions
Department of Information Technology 32Data base Technologies (ITB4201)
Index Method Benefits/Costs
• Heap file:
– Efficient storage; fast scanning; fast inserts
– Slow searches; slow deletes
• Sorted file:
– Efficient storage; faster searches than heap
– Slow inserts and deletes
Department of Information Technology 33Data base Technologies (ITB4201)
Index Method Benefits/Costs (2)
• Clustered file:
– Efficient storage; faster searches than heap
– Faster inserts/deletes than sorted file
– Only one clustering factor per file
• Unclustered tree:
– Fast searches, fast inserts/deletes
– Slow scans and range searches with many matches
• Hash:
– Same as unclustered, but faster on equality searches
– Does not support range searches
Department of Information Technology 34Data base Technologies (ITB4201)
Summary
• Many alternative file organizations exist, each appropriate in some situation.
• If selection queries are frequent, sorting the file or building an index is important.
– Hash-based indexes only good for equality search.
– Sorted files and tree-based indexes best for range search; also good for equality search. (Files
rarely kept sorted in practice; B+ tree index is better.)
• Index is a collection of data entries plus a way to quickly find entries with given key
values.
Department of Information Technology 35Data base Technologies (ITB4201)
Summary (Contd.)
• Data entries can be actual data records, <key, rid> pairs, or
<key, rid-list> pairs.
• Can have several indexes on a given file of data records, each
with a different search key.
• Indexes can be classified as clustered vs. unclustered, primary
vs. secondary, and dense vs. sparse. Differences have
important consequences for utility/performance.
Department of Information Technology 36Data base Technologies (ITB4201)
Summary (Contd.)
• Understanding the nature of the workload for the application, and the performance
goals, is essential to developing a good design.
– What are the important queries and updates? What attributes/relations are involved?
• Indexes must be chosen to speed up important queries (and perhaps some
updates!).
– Index maintenance overhead on updates to key fields.
– Choose indexes that can help many queries, if possible.
– Build indexes to support index-only strategies.
– Clustering is an important decision; only one index on a given relation can be clustered!
– Order of fields in composite index key can be important.
Department of Information Technology 37Data base Technologies (ITB4201)
Test Yourself
1. What is true about indexes?
A. Indexes enhance the performance even if the table is updated frequently
B. It makes harder for sql server engines to work to work on index which have large keys
C. It doesn’t make harder for sql server engines to work to work on index which have large keys
D. None of the mentioned
2. Does index take space in the disk ?
A. It stores memory as and when required
B. Yes, Indexes are stored on disk
C. Indexes are never stored on disk
D. Indexes take no space
3. What are composite indexes ?
A. Are those which are composed by database for its internal use
B. A composite index is a combination of index on 2 or more columns
C. Composite index can never be created
D. None of the mentioned
4. An index is clustered, if
A. A it is on a set of fields that form a candidate key.
B. it is on a set of fields that include the primary key.
C. the data records of the file are organized in the same order as the data entries of the index.
D. the data records of the file are organized not in the same order as the data entries of the index.
5. A _________________ index is the one which satisfies all the columns requested in the query without performing further lookup into the clustered index.
A. Clustered
B. Non Clustered
C. Covering
D. B-Tree
Department of Information Technology 38Data base Technologies (ITB4201)
Answers
1. What is true about indexes?
A. Indexes enhance the performance even if the table is updated frequently
B. It makes harder for sql server engines to work to work on index which have large keys
C. It doesn’t make harder for sql server engines to work to work on index which have large keys
D. None of the mentioned
2. Does index take space in the disk ?
A. It stores memory as and when required
B. Yes, Indexes are stored on disk
C. Indexes are never stored on disk
D. Indexes take no space
3. What are composite indexes ?
A. Are those which are composed by database for its internal use
B. A composite index is a combination of index on 2 or more columns
C. Composite index can never be created
D. None of the mentioned
4. An index is clustered, if
A. A it is on a set of fields that form a candidate key.
B. it is on a set of fields that include the primary key.
C. the data records of the file are organized in the same order as the data entries of the index.
D. the data records of the file are organized not in the same order as the data entries of the index.
5. A _________________ index is the one which satisfies all the columns requested in the query without performing further lookup into the clustered index.
A. Clustered
B. Non Clustered
C. Covering
D. B-Tree

More Related Content

What's hot

Constraints In Sql
Constraints In SqlConstraints In Sql
Constraints In Sql
Anurag
 

What's hot (20)

Component and Deployment Diagram - Brief Overview
Component and Deployment Diagram - Brief OverviewComponent and Deployment Diagram - Brief Overview
Component and Deployment Diagram - Brief Overview
 
Slide 5 keys
Slide 5 keysSlide 5 keys
Slide 5 keys
 
introduction to visual basic PPT.pptx
introduction to visual basic PPT.pptxintroduction to visual basic PPT.pptx
introduction to visual basic PPT.pptx
 
Types Of Keys in DBMS
Types Of Keys in DBMSTypes Of Keys in DBMS
Types Of Keys in DBMS
 
File organization
File organizationFile organization
File organization
 
SQL - Data Definition Language
SQL - Data Definition Language SQL - Data Definition Language
SQL - Data Definition Language
 
File organization and indexing
File organization and indexingFile organization and indexing
File organization and indexing
 
Introduction to RDBMS
Introduction to RDBMSIntroduction to RDBMS
Introduction to RDBMS
 
Relational database- Fundamentals
Relational database- FundamentalsRelational database- Fundamentals
Relational database- Fundamentals
 
Sql server basics
Sql server basicsSql server basics
Sql server basics
 
File system
File systemFile system
File system
 
The relational database model
The relational database modelThe relational database model
The relational database model
 
Input output in linux
Input output in linuxInput output in linux
Input output in linux
 
Unit 2 oracle9i
Unit 2  oracle9i Unit 2  oracle9i
Unit 2 oracle9i
 
File organization 1
File organization 1File organization 1
File organization 1
 
Constraints In Sql
Constraints In SqlConstraints In Sql
Constraints In Sql
 
Visual programming lecture
Visual programming lecture Visual programming lecture
Visual programming lecture
 
OPERATING SYSTEMS DESIGN AND IMPLEMENTATION
OPERATING SYSTEMSDESIGN AND IMPLEMENTATIONOPERATING SYSTEMSDESIGN AND IMPLEMENTATION
OPERATING SYSTEMS DESIGN AND IMPLEMENTATION
 
What is a DATA DICTIONARY?
What is a DATA DICTIONARY?What is a DATA DICTIONARY?
What is a DATA DICTIONARY?
 
Normalization in DBMS
Normalization in DBMSNormalization in DBMS
Normalization in DBMS
 

Similar to Indexing

Design logic&sistem pengkodean
Design logic&sistem pengkodeanDesign logic&sistem pengkodean
Design logic&sistem pengkodean
khaerul azmi
 
chapter09-120827115409-phpapp01.pdf
chapter09-120827115409-phpapp01.pdfchapter09-120827115409-phpapp01.pdf
chapter09-120827115409-phpapp01.pdf
AxmedMaxamuud6
 
Page 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docxPage 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docx
smile790243
 

Similar to Indexing (20)

Data models
Data modelsData models
Data models
 
Database File operation
Database File operationDatabase File operation
Database File operation
 
Database system structure
Database system structureDatabase system structure
Database system structure
 
Relational database design
Relational database designRelational database design
Relational database design
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
 
Distributed databases
Distributed  databasesDistributed  databases
Distributed databases
 
Introduction to Database Management Systems
Introduction to Database Management SystemsIntroduction to Database Management Systems
Introduction to Database Management Systems
 
Unit 08 dbms
Unit 08 dbmsUnit 08 dbms
Unit 08 dbms
 
Computing 7
Computing 7Computing 7
Computing 7
 
Design logic&sistem pengkodean
Design logic&sistem pengkodeanDesign logic&sistem pengkodean
Design logic&sistem pengkodean
 
chapter09-120827115409-phpapp01.pdf
chapter09-120827115409-phpapp01.pdfchapter09-120827115409-phpapp01.pdf
chapter09-120827115409-phpapp01.pdf
 
Normalization
NormalizationNormalization
Normalization
 
Chapter 9 Data Design .pptxInformation Technology Project Management
Chapter 9 Data Design .pptxInformation Technology Project ManagementChapter 9 Data Design .pptxInformation Technology Project Management
Chapter 9 Data Design .pptxInformation Technology Project Management
 
Data structure unitfirst part1
Data structure unitfirst part1Data structure unitfirst part1
Data structure unitfirst part1
 
Data structure unitfirst part1
Data structure unitfirst part1Data structure unitfirst part1
Data structure unitfirst part1
 
Page 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docxPage 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docx
 
PPT-UEU-Basis-Data-Pertemuan-1.pptx
PPT-UEU-Basis-Data-Pertemuan-1.pptxPPT-UEU-Basis-Data-Pertemuan-1.pptx
PPT-UEU-Basis-Data-Pertemuan-1.pptx
 
5. indexing
5. indexing5. indexing
5. indexing
 
Lecture 2 Data Structure Introduction
Lecture 2 Data Structure IntroductionLecture 2 Data Structure Introduction
Lecture 2 Data Structure Introduction
 
Data structures
Data structuresData structures
Data structures
 

More from Dr. C.V. Suresh Babu

More from Dr. C.V. Suresh Babu (20)

Data analytics with R
Data analytics with RData analytics with R
Data analytics with R
 
Association rules
Association rulesAssociation rules
Association rules
 
Clustering
ClusteringClustering
Clustering
 
Classification
ClassificationClassification
Classification
 
Blue property assumptions.
Blue property assumptions.Blue property assumptions.
Blue property assumptions.
 
Introduction to regression
Introduction to regressionIntroduction to regression
Introduction to regression
 
DART
DARTDART
DART
 
Mycin
MycinMycin
Mycin
 
Expert systems
Expert systemsExpert systems
Expert systems
 
Dempster shafer theory
Dempster shafer theoryDempster shafer theory
Dempster shafer theory
 
Bayes network
Bayes networkBayes network
Bayes network
 
Bayes' theorem
Bayes' theoremBayes' theorem
Bayes' theorem
 
Knowledge based agents
Knowledge based agentsKnowledge based agents
Knowledge based agents
 
Rule based system
Rule based systemRule based system
Rule based system
 
Formal Logic in AI
Formal Logic in AIFormal Logic in AI
Formal Logic in AI
 
Production based system
Production based systemProduction based system
Production based system
 
Game playing in AI
Game playing in AIGame playing in AI
Game playing in AI
 
Diagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AIDiagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AI
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 

Recently uploaded

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Krashi Coaching
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
fonyou31
 

Recently uploaded (20)

Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 

Indexing

  • 1. Department of Information Technology 1Data base Technologies (ITB4201) Physical Database Design: Indexing Dr. C.V. Suresh Babu Professor Department of IT Hindustan Institute of Science & Technology
  • 2. Department of Information Technology 2Data base Technologies (ITB4201) Action Plan • Physical Database Design Phase • Steps in Physical Database Design • Files and Records • Index Classification • Hash-Based Indexes • Tree Indexes • B+ Tree • Ordered Indexes • Primary Indexes • Clustering Indexes • Secondary Indexes • Indexed Sequential • Quiz
  • 3. Department of Information Technology 3Data base Technologies (ITB4201) Physical Database Design Phase • Inputs into the Physical Design Phase – Logical (implementation) model – Documentation/Definitions – DBMS Characteristics – Response time requirements – Security, Backup, Recovery, Retention, Integrity requirements • Logical Design Phase focuses on What; Physical Design Phases focuses on How.
  • 4. Department of Information Technology 4Data base Technologies (ITB4201) Physical Database Design Phase • Outputs: – Produces a Description of the Implementation of the Database on Secondary Storage. – Describes the storage structures and access methods used to achieve efficient access to the data. • Focus on Data Processing Efficiency
  • 5. Department of Information Technology 5Data base Technologies (ITB4201) Steps in Physical Database Design • Translate Logical (Implementation) Data Model for Target DBMS – Design base relations and constraints
  • 6. Department of Information Technology 6Data base Technologies (ITB4201) Steps in Physical Database Design • Design Physical Representation – Analyze transactions – Choose file organizations – Choose secondary indexes – Introduction of controlled redundancy (Denormalization) – Estimate disk space requirements
  • 7. Department of Information Technology 7Data base Technologies (ITB4201) Files and Records • Records – Contain fields (fixed and variable length fields) – Fixed-length – Variable-length – Spanned – Unspanned
  • 8. Department of Information Technology 8Data base Technologies (ITB4201) Files and Records • Block (page): The amount of data read or written in one I/O operation. • Blocking Factor: The number of physical records per block. • File Organization: How the physical records in a file are arranged on the disk. • Access Method: How the data can be retrieved based on the file organization.
  • 9. Department of Information Technology 9Data base Technologies (ITB4201) Index Classification • Primary vs. secondary: If search key contains primary key, then called primary index. – Unique index: Search key contains a candidate key. • Clustered vs. unclustered: If order of data records is the same as, or `close to’, order of data entries, then called clustered index. – A file can be clustered on at most one search key. – Cost of retrieving data records through index varies greatly based on whether index is clustered or not!
  • 10. Department of Information Technology 10Data base Technologies (ITB4201) Hash-Based Indexes • Hash function calculates the address of the page on which the record is stored. • The field that’s used in the hash function is called the hash field. – Called a hash key if the hash field is also a key field. • Good for equality searches • Not good for range searches
  • 11. Department of Information Technology 11Data base Technologies (ITB4201) Hash File
  • 12. Department of Information Technology 12Data base Technologies (ITB4201) Tree Indexes • Many DBMS use tree data structures to hold data/indexes. • Tree contains a hierarchy of nodes • Root node has no parent • Other nodes have one parent • Nodes can (but don’t have to) have child nodes. • A node with no children is called a leaf node. • Balanced tree: depth (number of levels) between root and each leaf node is the same.
  • 13. Department of Information Technology 13Data base Technologies (ITB4201) B+ Tree • A B-tree that follows certain rules: – If the root is not a leaf node, it must have at least two children. – For a tree of order n, each node (other than root/leaf nodes) must have between n/2 and n pointers and children. – … (other rules) – Tree must always be balanced • Every path from the root to a leaf must have same length.  This is what matters. • This means that it always takes about the same time to access any record.
  • 14. Department of Information Technology 14Data base Technologies (ITB4201) B+ Tree Indexes  Leaf pages contain data entries, and are chained (prev & next)  Non-leaf pages contain index entries and direct searches: P0 K 1 P 1 K 2 P 2 K m P m index entry Non-leaf Pages Pages Leaf
  • 15. Department of Information Technology 15Data base Technologies (ITB4201) Example B+ Tree • Insert/delete: Find data entry in leaf, then change it. Need to adjust parent sometimes. – And change sometimes bubbles up the tree 2* 3* Root 17 30 14* 16* 33* 34* 38* 39* 135 7*5* 8* 22* 24* 27 27* 29* Entries <= 17 Entries > 17
  • 16. Department of Information Technology 16Data base Technologies (ITB4201) Ordered Indexes • An ordered file that stores the index field and a pointer. • Requires a binary search on the index file. • Types of Ordered Indexes – Primary – Clustering – Secondary
  • 17. Department of Information Technology 17Data base Technologies (ITB4201) Primary Indexes • An ordered, fixed-length file that stores the primary key and a pointer. • Each record in the index file corresponds to each block in the ordered data file. • Sparse / nondense index
  • 18. Department of Information Technology 18Data base Technologies (ITB4201) A Primary Index
  • 19. Department of Information Technology 19Data base Technologies (ITB4201) Clustering Indexes • When the records of a data file are ordered on a non-key field, this field is called a clustering field. • The clustering field does not have a distinct value for each record in the data file. • A clustering index is an ordered file that stores the clustering field and a pointer.
  • 20. Department of Information Technology 20Data base Technologies (ITB4201) Clustering Indexes • There is one record in the index file for each distinct value of the clustering field. • Sparse / nondense index
  • 21. Department of Information Technology 21Data base Technologies (ITB4201)
  • 22. Department of Information Technology 22Data base Technologies (ITB4201) Secondary Indexes • A secondary index is an ordered file that stores the nonordering field of a data file and a pointer. • If the indexing field is a secondary or candidate key, then the index is dense. • If the indexing field is a nonkey field, then the index is sparse. • Introduces the idea of adding a level of indirection
  • 23. Department of Information Technology 23Data base Technologies (ITB4201) Secondary Index on a Secondary Key
  • 24. Department of Information Technology 24Data base Technologies (ITB4201) Secondary Index on a Nonkey field
  • 25. Department of Information Technology 25Data base Technologies (ITB4201) Indexed Sequential Data File Organization • Ordered data file with a multilevel primary index file based on the ordering key field. • IBM’s ISAM is an indexed sequential file organization with a two-level index.
  • 26. Department of Information Technology 26Data base Technologies (ITB4201) Understanding the Workload • For each query in the workload: – Which relations does it access? – Which attributes are retrieved? – Which attributes are involved in selection/join conditions? How selective are these conditions likely to be? • For each update in the workload: – Which attributes are involved in selection/join conditions? How selective are these conditions likely to be? – The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.
  • 27. Department of Information Technology 27Data base Technologies (ITB4201) Choice of Indexes • What indexes should we create? – Which relations should have indexes? What field(s) should be the search key? Should we build several indexes? • For each index, what kind of an index should it be? – Clustered? Hash/tree?
  • 28. Department of Information Technology 28Data base Technologies (ITB4201) Choice of Indexes (Contd.) • One approach: Consider the most important queries in turn. Consider the best plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it. – Obviously, this implies that we must understand how a DBMS evaluates queries and creates query evaluation plans! – For now, we discuss simple 1-table queries. • Before creating an index, must also consider the impact on updates in the workload! – Trade-off: Indexes can make queries go faster, updates slower. Require disk space, too.
  • 29. Department of Information Technology 29Data base Technologies (ITB4201) Index Selection Guidelines • Attributes in WHERE clause are candidates for index keys. – Exact match condition suggests hash index. – Range query suggests tree index. • Clustering is especially useful for range queries; can also help on equality queries if there are many duplicates.
  • 30. Department of Information Technology 30Data base Technologies (ITB4201) Index Selection Guidelines (2) • Multi-attribute search keys should be considered when a WHERE clause contains several conditions. – Order of attributes is important for range queries. – Such indexes can sometimes enable index-only strategies for important queries. • For index-only strategies, clustering is not important! • Try to choose indexes that benefit as many queries as possible. Since only one index can be clustered per relation, choose it based on important queries that would benefit the most from clustering.
  • 31. Department of Information Technology 31Data base Technologies (ITB4201) Rules-of-Thumb for Indexes • Index primary key fields – DBMS may do this automatically – Composite keys require composite indexes • Index foreign key fields • Index other fields frequently used in: – WHERE clauses – GROUP BY clauses – ORDER BY clauses • Consider composite indexes when fields are used together in conditions
  • 32. Department of Information Technology 32Data base Technologies (ITB4201) Index Method Benefits/Costs • Heap file: – Efficient storage; fast scanning; fast inserts – Slow searches; slow deletes • Sorted file: – Efficient storage; faster searches than heap – Slow inserts and deletes
  • 33. Department of Information Technology 33Data base Technologies (ITB4201) Index Method Benefits/Costs (2) • Clustered file: – Efficient storage; faster searches than heap – Faster inserts/deletes than sorted file – Only one clustering factor per file • Unclustered tree: – Fast searches, fast inserts/deletes – Slow scans and range searches with many matches • Hash: – Same as unclustered, but faster on equality searches – Does not support range searches
  • 34. Department of Information Technology 34Data base Technologies (ITB4201) Summary • Many alternative file organizations exist, each appropriate in some situation. • If selection queries are frequent, sorting the file or building an index is important. – Hash-based indexes only good for equality search. – Sorted files and tree-based indexes best for range search; also good for equality search. (Files rarely kept sorted in practice; B+ tree index is better.) • Index is a collection of data entries plus a way to quickly find entries with given key values.
  • 35. Department of Information Technology 35Data base Technologies (ITB4201) Summary (Contd.) • Data entries can be actual data records, <key, rid> pairs, or <key, rid-list> pairs. • Can have several indexes on a given file of data records, each with a different search key. • Indexes can be classified as clustered vs. unclustered, primary vs. secondary, and dense vs. sparse. Differences have important consequences for utility/performance.
  • 36. Department of Information Technology 36Data base Technologies (ITB4201) Summary (Contd.) • Understanding the nature of the workload for the application, and the performance goals, is essential to developing a good design. – What are the important queries and updates? What attributes/relations are involved? • Indexes must be chosen to speed up important queries (and perhaps some updates!). – Index maintenance overhead on updates to key fields. – Choose indexes that can help many queries, if possible. – Build indexes to support index-only strategies. – Clustering is an important decision; only one index on a given relation can be clustered! – Order of fields in composite index key can be important.
  • 37. Department of Information Technology 37Data base Technologies (ITB4201) Test Yourself 1. What is true about indexes? A. Indexes enhance the performance even if the table is updated frequently B. It makes harder for sql server engines to work to work on index which have large keys C. It doesn’t make harder for sql server engines to work to work on index which have large keys D. None of the mentioned 2. Does index take space in the disk ? A. It stores memory as and when required B. Yes, Indexes are stored on disk C. Indexes are never stored on disk D. Indexes take no space 3. What are composite indexes ? A. Are those which are composed by database for its internal use B. A composite index is a combination of index on 2 or more columns C. Composite index can never be created D. None of the mentioned 4. An index is clustered, if A. A it is on a set of fields that form a candidate key. B. it is on a set of fields that include the primary key. C. the data records of the file are organized in the same order as the data entries of the index. D. the data records of the file are organized not in the same order as the data entries of the index. 5. A _________________ index is the one which satisfies all the columns requested in the query without performing further lookup into the clustered index. A. Clustered B. Non Clustered C. Covering D. B-Tree
  • 38. Department of Information Technology 38Data base Technologies (ITB4201) Answers 1. What is true about indexes? A. Indexes enhance the performance even if the table is updated frequently B. It makes harder for sql server engines to work to work on index which have large keys C. It doesn’t make harder for sql server engines to work to work on index which have large keys D. None of the mentioned 2. Does index take space in the disk ? A. It stores memory as and when required B. Yes, Indexes are stored on disk C. Indexes are never stored on disk D. Indexes take no space 3. What are composite indexes ? A. Are those which are composed by database for its internal use B. A composite index is a combination of index on 2 or more columns C. Composite index can never be created D. None of the mentioned 4. An index is clustered, if A. A it is on a set of fields that form a candidate key. B. it is on a set of fields that include the primary key. C. the data records of the file are organized in the same order as the data entries of the index. D. the data records of the file are organized not in the same order as the data entries of the index. 5. A _________________ index is the one which satisfies all the columns requested in the query without performing further lookup into the clustered index. A. Clustered B. Non Clustered C. Covering D. B-Tree