2. OUTLINE
Indexing: Basic Concepts
Evaluation Factors
Ordered Indices: Primary and Secondary
Dense and Sparse Indices
Multilevel Indexing
B+ Tree Index Files
B-Tree Index Files
Hashing
Hash File Organization
Handling of Bucket Overflows
Open and Closed Hashing
Hash Indices
7/24/2017 Md. Golam Moazzam, Dept. of CSE, JU
3. Indexing and Hashing
Database Index
A data structure that improves the speed of data retrieval operations on a
database table at the cost of slower writes and the use of more storage
space.
Basic Concept
An index for a file in a database system works in much the same way as the
index in this textbook. If we want to learn about a particular topic, we can
search for the topic in the index at the back of the book, find the pages
where it occurs, and then read the pages to find the information we are
looking for. The words in the index are in sorted order, making it easy to
find the word we are looking for. Moreover, the index is much smaller than
the book, further reducing the effort needed to find the words we are
looking for.
4. Indexing and Hashing
Types of Indices
There are two basic types of indices:
– Ordered Indices
– Hash Indices
Ordered Indices: Based on a sorted ordering of the values.
Hash Indices: Based on a uniform distribution of values across a range of buckets.
The bucket to which a value is assigned is determined by a function, called a hash
function.
Evaluation Factors
There are several techniques for both ordered indexing and hashing. No one
technique is the best. Rather, each technique is best suited to particular database
applications. Each technique must be evaluated on the basis of the following
factors:
5. Indexing and Hashing
Evaluation Factors
Access Types: Access types can include finding records with a
specified attribute value and finding records whose attribute values fall
in a specified range.
Access Time: The time it takes to find a particular data item, or set of
items, using the technique in question.
Insertion Time: The time it takes to insert a new data item. This value
includes the time it takes to find the correct place to insert the new data
item, as well as the time it takes to update the index structure.
6. Indexing and Hashing
Evaluation Factors
Deletion Time: The time it takes to delete a data item. This value
includes the time it takes to find the item to be deleted, as well as the
time it takes to update the index structure.
Space Overhead: The additional space occupied by an index structure.
Provided that the amount of additional space is moderate, it is usually
worthwhile to sacrifice the space to achieve improved performance.
7. Indexing and Hashing
Search Key
An attribute or set of attributes used to look up records in a file is called a
search key.
Ordered Indices
To gain fast random access to records in a file, we can use an index
structure.
Each index structure is associated with a particular search key.
An ordered index stores the values of the search keys in sorted order,
and associates with each search key the records that contain it.
A file may have several indices, on different search keys.
8. Indexing and Hashing
Ordered Indices
Primary Index
Secondary Index
Primary Index: If the file containing the records is sequentially
ordered, a primary index is an index whose search key also defines the
sequential order of the file.
Primary indices are also called clustering indices.
Types: Dense and Sparse
9. Indexing and Hashing
Dense Index
A dense index is a file of key-pointer pairs in which an index record
appears for every search-key value in the data file. Each key in the index
is associated with a pointer to a record in the sorted data file.
In a dense primary index, the index record contains the search-key
value and a pointer to the first data record with that search-key value.
The remaining records with the same search-key value are stored
sequentially after the first record: because the index is a primary one,
the records are sorted on that search key.
11. Indexing and Hashing
Sparse Index
An index record appears for only some of the search-key values.
Each index record contains a search-key value and a pointer to the first
data record with that search-key value.
To locate a record, we find the index entry with the largest search-key
value that is less than or equal to the search-key value for which we are
looking.
We start at the record pointed to by that index entry, and follow the
pointers in the file until we find the desired record.
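The lookup rule above can be sketched in Python. This is only an illustration: the data file, the choice of one index entry per three records, and the names data_file, sparse_index and lookup are assumptions made for the example, not part of the slides.

```python
from bisect import bisect_right

# Hypothetical sorted data file: (search_key, payload) records in key order.
data_file = [(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e"), (60, "f")]

# Sparse index: one entry for every third record (an arbitrary choice here).
# Each entry is (search_key, position of the first record with that key).
sparse_index = [(data_file[i][0], i) for i in range(0, len(data_file), 3)]


def lookup(key):
    """Return the payload stored under key, or None if no record has it."""
    index_keys = [k for k, _ in sparse_index]
    # Find the index entry with the largest search key <= the key we want.
    slot = bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key is smaller than every indexed key
    pos = sparse_index[slot][1]
    # Follow the file sequentially from the record the index entry points to.
    while pos < len(data_file) and data_file[pos][0] <= key:
        if data_file[pos][0] == key:
            return data_file[pos][1]
        pos += 1
    return None
```

With this data the index holds only the entries for keys 10 and 40, so a lookup of key 30 starts at the record for key 10 and scans forward two records before it finds a match.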
13. Indexing and Hashing
Dense VS Sparse Indices
It is generally faster to locate a record if we have a dense index rather
than a sparse index.
However, sparse indices have advantages over dense indices in that
they require less space and they impose less maintenance overhead for
insertions and deletions.
There is a trade-off that the system designer must make between access
time and space overhead.
14. Indexing and Hashing
Multi-Level Indices
If the primary index does not fit in memory, access becomes expensive.
Solution: treat the primary index kept on disk as a sequential file and
construct a sparse index on it.
- Outer index – a sparse index of the primary index
- Inner index – the primary index file
If even the outer index is too large to fit in main memory, yet another level
of index can be created, and so on.
Indices at all levels must be updated on insertion into or deletion from the
file.
15. Indexing and Hashing
Multi-Level Indices: An Example
Consider a file of 100,000 records stored 10 per block, which occupies
10,000 blocks. With one index record per block, the index has 10,000
index records. Even if we can fit 100 index records per block, the index
occupies 100 blocks. If the index is too large to be kept in main
memory, a search results in several disk reads.
For very large files, additional levels of indexing may be required.
Indices must be updated at all levels when insertions or deletions
require it.
Frequently, each level of index corresponds to a unit of physical
storage.
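The arithmetic of this example can be checked with a few lines of Python. The last step, an outer sparse index over the 100 inner index blocks, is an extension of the slide's numbers, on the assumption that the outer index also holds 100 entries per block; the variable names are illustrative.

```python
def blocks_needed(entries, entries_per_block):
    """Number of blocks needed to hold the entries (integer ceiling division)."""
    return -(-entries // entries_per_block)


records = 100_000
records_per_block = 10
index_entries_per_block = 100

# The data file itself.
data_blocks = blocks_needed(records, records_per_block)

# Inner (primary) index: one index record per data block.
inner_entries = data_blocks
inner_blocks = blocks_needed(inner_entries, index_entries_per_block)

# Outer index (assumed): one sparse entry per block of the inner index.
outer_entries = inner_blocks
outer_blocks = blocks_needed(outer_entries, index_entries_per_block)

print(data_blocks, inner_blocks, outer_blocks)
```

Under these assumptions the outer index fits in a single block, so a lookup reads one outer-index block, one inner-index block and one data block.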
16. Indexing and Hashing
Multi-Level Indices: An Example
17. Indexing and Hashing
Secondary Index
– Indices whose search key specifies an order different from the
sequential order of the file are called secondary indices, or non-
clustering indices.
– Secondary indices must be dense with an index entry for every search-
key value, and a pointer to every record in the file.
19. Indexing and Hashing
Primary VS Secondary Indices
A sequential scan in primary index order is efficient because records in
the file are stored physically in the same order as the index order.
Secondary indices improve the performance of queries that use keys
other than the search key of the primary index. However, they impose a
significant overhead on modification of the database. The designer of a
database decides which secondary indices are desirable on the basis of
an estimate of the relative frequency of queries and modifications.
The primary index is on the field which specifies the sequential order
of the data file.
There can be only one primary index while there can be many
secondary indices.
20. Indexing and Hashing
B+ Tree Index Files
The main disadvantage of the index-sequential file organization is that
performance degrades as the file grows, both for index lookups and for
sequential scans through the data. To overcome this deficiency, we use
a B+ tree index.
The B+ tree index structure is the most widely used of several index
structures that maintain their efficiency despite insertion and deletion of
data.
This is a balanced tree in which every path from the root of the tree to
a leaf of the tree is of the same length.
A B+ tree index is a multilevel index. A typical node of a B+ tree is
shown below.
21. Indexing and Hashing
B+ Tree Index Files
Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
Special cases:
- If the root is not a leaf, it has at least 2 children.
- If the root is a leaf (that is, there are no other nodes in the tree),
it can have between 0 and (n–1) values.
22. Indexing and Hashing
B+ Tree Index Files
A typical node contains up to n − 1 search-key values K1, K2, . . . , Kn−1,
and n pointers P1, P2, . . . , Pn.
The search-keys in a node are ordered: K1 < K2 < K3 < . . . < Kn–1
For leaf nodes, for i = 1, 2, . . . , n − 1, pointer Pi points to either a file
record with search-key value Ki or to a bucket of pointers, each of
which points to a file record with search-key value Ki.
23. Indexing and Hashing
B+ Tree Index Files
A non-leaf node may hold up to n pointers, and must hold at least ⌈n/2⌉
pointers.
The number of pointers in a node is called the fanout of the node.
The root node can hold fewer than ⌈n/2⌉ pointers. However, it must
hold at least two pointers, unless the tree consists of only one node.
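As a quick check of these occupancy rules, the sketch below tabulates the bounds for a given n, reading n/2 and (n–1)/2 as rounded up (which matters only when the division is not exact). The function name bplus_bounds and the dictionary keys are illustrative choices, not notation from the slides.

```python
import math


def bplus_bounds(n):
    """Return (minimum, maximum) occupancy for each kind of B+ tree node."""
    return {
        # Non-leaf, non-root nodes: between ceil(n/2) and n pointers.
        "nonleaf_pointers": (math.ceil(n / 2), n),
        # Leaf nodes that are not the root: ceil((n-1)/2) to n-1 values.
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        # A root that is not a leaf needs at least two pointers.
        "root_pointers_if_nonleaf": (2, n),
        # A root that is also a leaf may hold anything from 0 to n-1 values.
        "root_values_if_leaf": (0, n - 1),
    }


for order in (4, 6):
    print(order, bplus_bounds(order))
```

For n = 4 a leaf holds 2 or 3 values and a non-leaf node 2 to 4 pointers; for n = 6 a leaf holds 3 to 5 values and a non-leaf node 3 to 6 pointers.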
24. Indexing and Hashing
Construct a B+ tree for the following set of key values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31) for n=4 and n=6.
Solution: Construction of B+ tree for order n=4.
Search-key values per node = 3, pointers per node = 4.
Insert key value 2:
  Leaf: [2]
Insert key value 3:
  Leaf: [2 3]
25. Indexing and Hashing
Insert key value 5:
  Leaf: [2 3 5]
Insert key value 7: Split the node.
  Root:   [5]
  Leaves: [2 3] [5 7]
26. Indexing and Hashing
Insert key value 11:
  Root:   [5]
  Leaves: [2 3] [5 7 11]
Insert key value 17: Split the node.
  Root:   [5 11]
  Leaves: [2 3] [5 7] [11 17]
27. Indexing and Hashing
Insert key value 19:
  Root:   [5 11]
  Leaves: [2 3] [5 7] [11 17 19]
Insert key value 23: Split the node.
  Root:   [5 11 19]
  Leaves: [2 3] [5 7] [11 17] [19 23]
28. Indexing and Hashing
Insert key value 29:
  Root:   [5 11 19]
  Leaves: [2 3] [5 7] [11 17] [19 23 29]
29. Indexing and Hashing
Insert key value 31: The leaf splits, and then the root splits.
  Root:     [19]
  Non-leaf: [5 11] [29]
  Leaves:   [2 3] [5 7] [11 17] [19 23] [29 31]
30. Indexing and Hashing
For n=6, the final tree is:
  Root:   [7 19]
  Leaves: [2 3 5] [7 11 17] [19 23 29 31]
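The construction above can be replayed with a minimal insertion-only sketch in Python. It assumes the usual textbook split rule: an overflowing leaf keeps its first ⌈n/2⌉ keys, and an overflowing non-leaf node keeps its first ⌈(n+1)/2⌉ pointers and passes the next key up. The class and function names are illustrative, and the pointers that chain leaves together, as well as deletion, are left out to keep it short.

```python
import math
from bisect import bisect_right, insort


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # used only by non-leaf nodes


def _insert(node, key, n):
    """Insert key below node; return (separator, new_right_node) on a split."""
    if node.leaf:
        insort(node.keys, key)
        if len(node.keys) <= n - 1:
            return None
        mid = math.ceil(n / 2)  # keys that stay in the left leaf
        right = Node(leaf=True)
        right.keys = node.keys[mid:]
        node.keys = node.keys[:mid]
        # In a B+ tree the separator is copied up and also stays in the leaf.
        return right.keys[0], right
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key, n)
    if split is None:
        return None
    sep, child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, child)
    if len(node.children) <= n:
        return None
    mid = math.ceil((n + 1) / 2)  # pointers that stay in the left node
    right = Node(leaf=False)
    up = node.keys[mid - 1]  # this key moves up; it is not kept below
    right.keys = node.keys[mid:]
    right.children = node.children[mid:]
    node.keys = node.keys[:mid - 1]
    node.children = node.children[:mid]
    return up, right


def insert(root, key, n):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, n)
    if split is None:
        return root
    sep, right = split
    new_root = Node(leaf=False)
    new_root.keys = [sep]
    new_root.children = [root, right]
    return new_root


def build(keys, n):
    root = Node(leaf=True)
    for key in keys:
        root = insert(root, key, n)
    return root


def levels(root):
    """Return the keys of every node, level by level from the root down."""
    out, level = [], [root]
    while level:
        out.append([node.keys for node in level])
        level = [child for node in level for child in node.children]
    return out


KEYS = [2, 3, 5, 7, 11, 17, 19, 23, 29, 31]
for order in (4, 6):
    print(order, levels(build(KEYS, order)))
```

Under that split rule the sketch ends with root [19] over [5 11] and [29] for n = 4, and root [7 19] over three leaves for n = 6, which agrees with the trees in the worked solution.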
31. Indexing and Hashing
B-Tree Index Files
– B-tree indices are similar to B+ tree indices. The primary distinction
between the two approaches is that a B-tree eliminates the redundant
storage of search-key values.
– A B-tree allows search-key values to appear only once. Thus, it is
necessary to include an additional pointer field for each search key in a
nonleaf node. These additional pointers point to either file records or
buckets for the associated search key
32. Indexing and Hashing
B-Tree Index Files
– A generalized B-tree leaf node and a non-leaf node appear in Fig. (a)
and Fig. (b) respectively.
33. Indexing and Hashing
B-Tree Index Files
Leaf nodes are the same as in B+ trees. In nonleaf nodes, the pointers Pi
are the tree pointers that we used also for B+ trees, while the pointers
Bi are bucket or file-record pointers. In the generalized B-tree in the
figure, there are n – 1 keys in the leaf node, but there are m − 1 keys in
the nonleaf node. This discrepancy occurs because nonleaf nodes must
include pointers Bi, thus reducing the number of search keys that can be
held in these nodes.
Advantages of B-Tree indices
A B-tree may use fewer tree nodes than the corresponding B+ tree.
It is sometimes possible to find a search-key value before reaching a leaf node.
34. Indexing and Hashing
Disadvantages of B-Tree indices
Only a small fraction of all search-key values are found early.
Non-leaf nodes are larger, so fan-out is reduced. Thus, B-trees
typically have greater depth than the corresponding B+ tree.
Insertion and deletion are more complicated than in B+ trees.
Implementation is harder than for B+ trees.
35. Indexing and Hashing
Construct a B-tree for the following set of key values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31) for n=4 and n=6.
Solution: Construction of B-tree for order n=4.
Search-key values per node = 3, pointers per node = 4.
Insert key value 2:
  Leaf: [2]
Insert key value 3:
  Leaf: [2 3]
36. Indexing and Hashing
Insert key value 5:
  Leaf: [2 3 5]
Insert key value 7:
  Root:   [5]
  Leaves: [2 3] [7]
37. Indexing and Hashing
Insert key value 11:
  Root:   [5]
  Leaves: [2 3] [7 11]
Insert key value 17:
  Root:   [5]
  Leaves: [2 3] [7 11 17]
38. Indexing and Hashing
Insert key value 19:
  Root:   [5 17]
  Leaves: [2 3] [7 11] [19]
Insert key value 23:
  Root:   [5 17]
  Leaves: [2 3] [7 11] [19 23]
39. Indexing and Hashing
Insert key value 29:
  Root:   [5 17]
  Leaves: [2 3] [7 11] [19 23 29]
40. Indexing and Hashing
Insert key value 31:
  Root:     [17]
  Non-leaf: [5] [29]
  Leaves:   [2 3] [7 11] [19 23] [31]
41. Indexing and Hashing
Hashing
One disadvantage of sequential file organization is that we must use an
index structure to locate data. File organizations based on the technique
of hashing allow us to avoid accessing an index structure. Hashing also
provides a way of constructing indices.
File organizations based on hashing allow us to find the address of a
data item directly by computing a function on the search-key value of
the desired record.
42. Indexing and Hashing
Hash File Organization
In a hash file organization, we obtain the address of the disk block (also
called the bucket) containing a desired record directly by computing a
function on the search-key value of the record.
Let K denote the set of all search-key values, and let B denote the set of
all bucket addresses. A hash function h is a function from K to B.
To insert a record with search key Ki, we compute h(Ki), which gives
the address of the bucket for that record. Assume for now that there is
space in the bucket to store the record. Then, the record is stored in that
bucket.
43. Indexing and Hashing
Hash File Organization
To perform a lookup on a search-key value Ki, we simply compute
h(Ki), then search the bucket with that address. Suppose that two search
keys, K5 and K7, have the same hash value; that is, h(K5) = h(K7). If we
perform a lookup on K5, the bucket h(K5) contains records with search-
key values K5 and records with search key values K7. Thus, we have to
check the search-key value of every record in the bucket to verify that
the record is one that we want.
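Insertion and lookup in a hash file organization can be sketched as follows. The bucket count, the byte-sum hash function and the names h, insert and lookup are assumptions chosen for illustration; buckets are modelled as unbounded Python lists, so overflow is not handled here.

```python
NUM_BUCKETS = 8  # hypothetical number of buckets

# Each bucket stands in for a disk block holding (search_key, record) pairs.
buckets = [[] for _ in range(NUM_BUCKETS)]


def h(key):
    """Toy hash function: sum of the key's UTF-8 bytes modulo NUM_BUCKETS."""
    return sum(key.encode("utf-8")) % NUM_BUCKETS


def insert(key, record):
    """Store the record in the bucket whose address is h(key)."""
    buckets[h(key)].append((key, record))


def lookup(key):
    """Return all records whose search key equals key.

    Different keys can share a bucket, so every record in the bucket is
    checked against the search key before it is returned.
    """
    return [record for k, record in buckets[h(key)] if k == key]
```

With this toy hash, two keys made of the same letters in a different order (for instance "ab" and "ba") land in the same bucket, which is the K5 and K7 situation described above: the lookup has to filter the bucket by key.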
44. Indexing and Hashing
Hash File Organization: An Example
– Let us choose a hash function for the account file using the search key
branch_name.
– Suppose we have 26 buckets and we define a hash function that maps
names beginning with the ith letter of the alphabet to the ith bucket.
– This hash function has the virtue of simplicity, but it fails to provide a
uniform distribution, since we expect more branch names to begin with
such letters as B and R than Q and X.
45. Indexing and Hashing
Hash File Organization: An Example
– Instead, we consider 10 buckets and a hash function that computes the
sum of the binary representations of the characters of a key, then
returns the sum modulo the number of buckets.
– For branch name ‘Perryridge’: bucket no. = h(Perryridge) = 5
– For branch name ‘Round Hill’: bucket no. = h(Round Hill) = 3
– For branch name ‘Brighton’: bucket no. = h(Brighton) = 3
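The slide does not say which character encoding the hash function sums. One encoding that reproduces the three bucket numbers listed here is to represent the i-th letter of the alphabet by the integer i, ignoring case and spaces; the sketch below uses that assumption, and the function name h is illustrative.

```python
def h(branch_name, num_buckets=10):
    """Sum the alphabet positions of the letters, modulo the bucket count.

    Assumption: 'a' or 'A' counts as 1, ..., 'z' or 'Z' as 26; any other
    character (such as the space in 'Round Hill') is ignored.
    """
    total = sum(
        ord(c) - ord("a") + 1
        for c in branch_name.lower()
        if "a" <= c <= "z"
    )
    return total % num_buckets


for name in ("Perryridge", "Round Hill", "Brighton"):
    print(name, h(name))
```

For example, the letters of Perryridge sum to 125, and 125 modulo 10 gives bucket 5. Round Hill (113) and Brighton (93) both map to bucket 3, so these two branch names collide.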
46. Indexing and Hashing
Hash File Organization: An Example
47. Indexing and Hashing
Handling of Bucket Overflows
In case of insertion, if the bucket does not have enough space, a bucket
overflow is said to occur. Bucket overflow can occur mainly for two
reasons:
Insufficient buckets. The number of buckets nB must be chosen such
that nB > nr/fr, where nr denotes the total number of records that will be
stored and fr denotes the number of records that will fit in a bucket.
Skew. Some buckets are assigned more records than are others, so a
bucket may overflow even when other buckets still have space. This
situation is called bucket skew. Skew can occur for two reasons:
– Multiple records may have the same search key.
– The chosen hash function may result in non-uniform distribution of
search keys.
48. Indexing and Hashing
Handling of Bucket Overflows
Solution:
If a record must be inserted into a bucket b, and b is already full, the
system provides an overflow bucket for b, and inserts the record into
the overflow bucket. If the overflow bucket is also full, the system
provides another overflow bucket, and so on. All the overflow buckets
of a given bucket are chained together in a linked list.
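Overflow chaining can be sketched with a small Python class. The class name Bucket, the fixed capacity parameter and the method names are assumptions for illustration; a real system would allocate overflow buckets as disk blocks rather than Python objects.

```python
class Bucket:
    """A bucket with fixed capacity and a linked chain of overflow buckets."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []     # (search_key, record) pairs
        self.overflow = None  # next overflow bucket in the chain, if any

    def insert(self, key, record):
        bucket = self
        # Walk the chain until a bucket with free space is found,
        # adding a new overflow bucket at the end if every one is full.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(bucket.capacity)
            bucket = bucket.overflow
        bucket.records.append((key, record))

    def lookup(self, key):
        """Search the bucket and all of its overflow buckets."""
        found, bucket = [], self
        while bucket is not None:
            found.extend(r for k, r in bucket.records if k == key)
            bucket = bucket.overflow
        return found

    def chain_length(self):
        """Number of buckets in the chain, counting this one."""
        length, bucket = 0, self
        while bucket is not None:
            length += 1
            bucket = bucket.overflow
        return length
```

Inserting five records into a bucket of capacity 2 produces a chain of three buckets, and a lookup has to walk the whole chain, which is why long overflow chains slow down access.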
49. Indexing and Hashing
Handling of Bucket Overflows
50. Indexing and Hashing
Difference between open and closed hashing
Closed Hashing:
Closed hashing always places keys with the same hash function value in
the same bucket (including its overflow buckets).
If a bucket is full, the system inserts records in overflow buckets.
Different buckets can be of different sizes.
Overflow buckets are linked together.
51. Indexing and Hashing
Difference between open and closed hashing
Open Hashing:
Open hashing places keys with the same hash function value in a different
bucket if the bucket is full.
The set of buckets is fixed; there are no overflow chains.
Deletion is difficult in open hashing.
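The slide does not say how the alternative bucket is chosen. One common policy is linear probing, which tries the next bucket in cyclic order; the sketch below assumes that policy, integer keys and a small fixed bucket set, all chosen for illustration.

```python
NUM_BUCKETS = 4  # the set of buckets is fixed
CAPACITY = 2     # records per bucket (hypothetical)

buckets = [[] for _ in range(NUM_BUCKETS)]


def h(key):
    return key % NUM_BUCKETS


def insert(key, record):
    """Place the record in bucket h(key), or in the next bucket with space."""
    start = h(key)
    for step in range(NUM_BUCKETS):
        b = (start + step) % NUM_BUCKETS
        if len(buckets[b]) < CAPACITY:
            buckets[b].append((key, record))
            return b
    raise RuntimeError("all buckets are full")


def lookup(key):
    """Probe buckets in the same order as insert until the key is found."""
    start = h(key)
    for step in range(NUM_BUCKETS):
        b = (start + step) % NUM_BUCKETS
        for k, record in buckets[b]:
            if k == key:
                return record
        if len(buckets[b]) < CAPACITY:
            # A bucket with free space ends the probe sequence. This shortcut
            # is valid only if records are never deleted.
            return None
    return None
```

The early exit in lookup hints at why deletion is difficult: removing a record from a full bucket would make later lookups stop too soon and miss records that had been pushed into the following buckets.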
52. Indexing and Hashing
Hash Indices
Hashing can be used not only for file organization, but also for index-
structure creation.
We construct a hash index as follows. We apply a hash function on a
search key to identify a bucket, and store the key and its associated
pointers in the bucket.
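A hash index differs from a hash file organization in that the buckets hold search keys and pointers rather than the records themselves. The sketch below models pointers as positions in a list; the sample records, the byte-sum hash and the function names are hypothetical.

```python
NUM_BUCKETS = 4  # hypothetical number of buckets

# Hypothetical data file of (branch_name, balance) records, in no special order.
data_file = [
    ("Perryridge", 400),
    ("Brighton", 750),
    ("Perryridge", 900),
    ("Redwood", 700),
]


def h(key):
    """Toy hash function: sum of the key's UTF-8 bytes modulo NUM_BUCKETS."""
    return sum(key.encode("utf-8")) % NUM_BUCKETS


def build_hash_index(records):
    """Each bucket maps a search key to the positions of its records."""
    index = [dict() for _ in range(NUM_BUCKETS)]
    for position, (key, _) in enumerate(records):
        index[h(key)].setdefault(key, []).append(position)
    return index


def lookup(index, key):
    """Hash the key, then follow the stored pointers into the data file."""
    return [data_file[position] for position in index[h(key)].get(key, [])]


branch_index = build_hash_index(data_file)
print(lookup(branch_index, "Perryridge"))
```

Because the data file is left untouched, several such indices on different search keys can coexist over the same file.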