1. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V
UNIT V : SEARCHING, SORTING AND HASHING TECHNIQUES 9
Searching- Linear Search - Binary Search. Sorting - Bubble sort - Selection sort - Insertion sort -
Shell sort. Hashing- Hash Functions – Separate Chaining – Open Addressing – Rehashing –
Extendible Hashing.
Searching is used to find the location where an element is available. There are two types
of search techniques. They are:
● Linear or sequential search
● Binary search
Sorting arranges the elements of a given data structure in a systematic order so that they
can be used efficiently for some purpose. For example, the words in a dictionary and the
subscriber names in a telephone directory are listed in alphabetical order. There are many
sorting techniques, out of which we study the following:
• Bubble sort
• Insertion sort
• Selection sort and
• Shell sort
There are two types of sorting techniques:
1. Internal sorting
2. External sorting
If all the elements to be sorted are present in main memory, the sorting is called
internal sorting. If, on the other hand, some of the elements to be sorted are kept on
secondary storage, it is called external sorting. Here we study only internal sorting
techniques.
Linear Search:
This is the simplest of all searching techniques. In this technique, an ordered or unordered list
will be searched one by one from the beginning until the desired element is found. If the
desired element is found in the list then the search is successful otherwise unsuccessful.
Suppose there are 'n' elements organized sequentially in a list. The number of
comparisons required to retrieve an element depends purely on where the element is
stored in the list. If it is the first element, one comparison will do; if it is the second
element, two comparisons are necessary, and so on. On average you need (n+1)/2
comparisons for a successful search. If the search is not successful, you need 'n'
comparisons.
The time complexity of linear search is O(n).
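The comparison counts above can be checked with a small sketch. The function name linear_search_count and its out-parameter comparisons are illustrative assumptions, not part of the handout's programs; the function returns the index of the match (or -1) and reports how many comparisons were made.

```c
/* Hypothetical helper: sequential search that also counts comparisons.
   Returns the index of x in a[0..n-1], or -1 if x is absent. */
int linear_search_count(const int a[], int n, int x, int *comparisons)
{
    int i;
    *comparisons = 0;
    for (i = 0; i < n; i++)
    {
        (*comparisons)++;          /* one comparison per element visited */
        if (a[i] == x)
            return i;
    }
    return -1;                     /* all n elements were compared */
}
```

For a list of n = 9 distinct elements, searching for each element once takes 1 + 2 + ... + 9 = 45 comparisons in total, which is (9+1)/2 = 5 on average, and an unsuccessful search takes 9.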
Algorithm:
Let array a[n] store n elements. Determine whether element 'x' is present or not.
linsrch(a[n], x)
{
    index = 0;
    flag = 0;
    while (index < n) do
    {
        if (x == a[index])
        {
            flag = 1; break;
        }
        index++;
    }
    if (flag == 1)
        printf("Data found at position %d", index);
    else
        printf("Data not found");
}
Example :
Let us illustrate linear search on the following 9 elements:
Index 0 1 2 3 4 5 6 7 8
Elements -15 -6 0 7 9 23 54 82 101
Searching different elements is as follows:
Searching for x = 7: search successful, data found at index 3.
Searching for x = 82: search successful, data found at index 7.
Searching for x = 42: search unsuccessful, data not found.
A non-recursive program for Linear Search:
#include <stdio.h>

int main(void)
{
    int number[25], n, data, i, flag = 0;

    printf("\n Enter the number of elements: ");
    scanf("%d", &n);
    printf("\n Enter the elements: ");
    for (i = 0; i < n; i++)
        scanf("%d", &number[i]);
    printf("\n Enter the element to be searched: ");
    scanf("%d", &data);

    /* compare each element with data until a match is found */
    for (i = 0; i < n; i++)
    {
        if (number[i] == data)
        {
            flag = 1;
            break;
        }
    }
    if (flag == 1)
        printf("\n Data found at location: %d", i + 1);
    else
        printf("\n Data not found ");
    return 0;
}
A Recursive program for linear search:
#include <stdio.h>

/* search a[position..n-1] for data, examining one element per call */
void linear_search(int a[], int data, int position, int n)
{
    if (position < n)
    {
        if (a[position] == data)
            printf("\n Data found at %d ", position);
        else
            linear_search(a, data, position + 1, n);
    }
    else
        printf("\n Data not found");   /* ran past the last element */
}

int main(void)
{
    int a[25], i, n, data;

    printf("\n Enter the number of elements: ");
    scanf("%d", &n);
    printf("\n Enter the elements: ");
    for (i = 0; i < n; i++)
    {
        scanf("%d", &a[i]);
    }
    printf("\n Enter the element to be searched: ");
    scanf("%d", &data);
    linear_search(a, data, 0, n);
    return 0;
}
BINARY SEARCH
Suppose we have 'n' records that have been ordered by keys so that x1 < x2 < … < xn. When we
are given an element 'x', binary search is used to find the corresponding element in the list.
If 'x' is present, we have to determine a value 'j' such that a[j] = x (successful search).
If 'x' is not in the list, then j is set to zero (unsuccessful search).
In binary search we jump into the middle of the file, where we find key a[mid], and compare
'x' with a[mid]. If x = a[mid], then the desired record has been found. If x < a[mid], then 'x'
must be in the portion of the file that precedes a[mid]. Similarly, if x > a[mid], then further
search is only necessary in the part of the file that follows a[mid].
If we apply this procedure recursively, finding the middle key a[mid] of the unsearched portion
of the file each time, then every unsuccessful comparison of 'x' with a[mid] eliminates roughly
half of the unsearched portion from consideration.
Since the array size is roughly halved after each comparison between 'x' and a[mid], and
since an array of length 'n' can be halved only about log2 n times before reaching a trivial
length, the worst-case complexity of binary search is O(log2 n).
Algorithm:
Let a[1..n] be an array of elements in increasing order, n ≥ 0. Determine whether 'x' is present,
and if so, return j such that x = a[j]; otherwise return 0.
binsrch(a[], n, x)
{
    low = 1; high = n;
    while (low <= high) do
    {
        mid = (low + high)/2;
        if (x < a[mid])
            high = mid - 1;
        else if (x > a[mid])
            low = mid + 1;
        else
            return mid;
    }
    return 0;
}
low and high are integer variables such that each time through the loop either 'x' is found, or low
is increased by at least one, or high is decreased by at least one. Thus we have two sequences of
integers approaching each other, and eventually low becomes greater than high, causing
termination in a finite number of steps if 'x' is not present.
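The loop can be sketched in C with a counter for the number of elements inspected. The name binsrch_count and the out-parameter probes are assumptions made for this illustration; the array is used from index 1 (a[0] is ignored) so that the indices match the 1-based algorithm, and the loop runs while low <= high so that a one-element range is still examined.

```c
/* Hypothetical helper: binary search over a[1..n] (a[0] is unused).
   Returns j with a[j] == x, or 0 if x is absent; *probes receives the
   number of elements that were compared with x. */
int binsrch_count(const int a[], int n, int x, int *probes)
{
    int low = 1, high = n, mid;
    *probes = 0;
    while (low <= high)
    {
        mid = (low + high) / 2;
        (*probes)++;
        if (x < a[mid])
            high = mid - 1;        /* continue in the left part */
        else if (x > a[mid])
            low = mid + 1;         /* continue in the right part */
        else
            return mid;
    }
    return 0;                      /* low passed high: x is not present */
}
```

On the 12 sorted elements 4, 7, 8, 9, 16, 20, 24, 38, 39, 45, 54, 77, no search inspects more than 4 elements, which agrees with the log2 n bound.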
Example 1:
Let us illustrate binary search on the following 12 elements:
Index 1 2 3 4 5 6 7 8 9 10 11 12
Elements 4 7 8 9 16 20 24 38 39 45 54 77
If we are searching for x = 4: (This needs 3 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 1, high = 5, mid = 6/2 = 3, check 8
low = 1, high = 2, mid = 3/2 = 1, check 4, found
If we are searching for x = 7: (This needs 4 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 1, high = 5, mid = 6/2 = 3, check 8
low = 1, high = 2, mid = 3/2 = 1, check 4
low = 2, high = 2, mid = 4/2 = 2, check 7, found
If we are searching for x = 8: (This needs 2 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 1, high = 5, mid = 6/2 = 3, check 8, found
If we are searching for x = 9: (This needs 3 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 1, high = 5, mid = 6/2 = 3, check 8
low = 4, high = 5, mid = 9/2 = 4, check 9, found
If we are searching for x = 16: (This needs 4 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 1, high = 5, mid = 6/2 = 3, check 8
low = 4, high = 5, mid = 9/2 = 4, check 9
low = 5, high = 5, mid = 10/2 = 5, check 16, found
If we are searching for x = 20: (This needs 1 comparison)
low = 1, high = 12, mid = 13/2 = 6, check 20, found
If we are searching for x = 24: (This needs 3 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39
low = 7, high = 8, mid = 15/2 = 7, check 24, found
If we are searching for x = 38: (This needs 4 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39
low = 7, high = 8, mid = 15/2 = 7, check 24
low = 8, high = 8, mid = 16/2 = 8, check 38, found
If we are searching for x = 39: (This needs 2 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39, found
If we are searching for x = 45: (This needs 4 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39
low = 10, high = 12, mid = 22/2 = 11, check 54
low = 10, high = 10, mid = 20/2 = 10, check 45, found
If we are searching for x = 54: (This needs 3 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39
low = 10, high = 12, mid = 22/2 = 11, check 54, found
If we are searching for x = 77: (This needs 4 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39
low = 10, high = 12, mid = 22/2 = 11, check 54
low = 12, high = 12, mid = 24/2 = 12, check 77, found
The number of comparisons required for each search element:
20: requires 1 comparison;
8 and 39: require 2 comparisons;
4, 9, 24, 54: require 3 comparisons; and
7, 16, 38, 45, 77: require 4 comparisons.
Summing the comparisons needed to find all twelve items and dividing by 12 yields
37/12, or approximately 3.08 comparisons per successful search on average.
A non-recursive program for binary search:
#include <stdio.h>

/* Search the sorted range a[beg..end] for val using a loop.
   Returns the 1-based position of val, or -1 if it is absent. */
int binarySearch(int a[], int beg, int end, int val)
{
    int mid;
    while (beg <= end)
    {
        mid = (beg + end) / 2;
        /* the item to be searched is present at the middle */
        if (a[mid] == val)
            return mid + 1;
        /* the item is greater than the middle, so it can only be in the right subarray */
        else if (a[mid] < val)
            beg = mid + 1;
        /* the item is smaller than the middle, so it can only be in the left subarray */
        else
            end = mid - 1;
    }
    return -1;
}

int main(void)
{
    int a[] = {11, 14, 25, 30, 40, 41, 52, 57, 70};   /* given array */
    int val = 40;                                      /* value to be searched */
    int n = sizeof(a) / sizeof(a[0]);                  /* size of array */
    int res = binarySearch(a, 0, n - 1, val);          /* store result */
    int i;

    printf("The elements of the array are - ");
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\nElement to be searched is - %d", val);
    if (res == -1)
        printf("\nElement is not present in the array");
    else
        printf("\nElement is present at %d position of array", res);
    return 0;
}
A recursive program for binary search:
#include <stdio.h>

/* search the sorted range a[low..high] for data */
void bin_search(int a[], int data, int low, int high)
{
    int mid;
    if (low <= high)
    {
        mid = (low + high) / 2;
        if (a[mid] == data)
            printf("\n Element found at location: %d ", mid + 1);
        else
        {
            if (data < a[mid])
                bin_search(a, data, low, mid - 1);
            else
                bin_search(a, data, mid + 1, high);
        }
    }
    else
        printf("\n Element not found");
}

int main(void)
{
    int a[25], i, n, data;

    printf("\n Enter the number of elements: ");
    scanf("%d", &n);
    printf("\n Enter the elements in ascending order: ");
    for (i = 0; i < n; i++)
        scanf("%d", &a[i]);
    printf("\n Enter the element to be searched: ");
    scanf("%d", &data);
    bin_search(a, data, 0, n - 1);
    return 0;
}
Bubble Sort:
Bubble sort is easy to understand and program. The basic idea of bubble sort is to pass
through the file sequentially several times. In each pass, we compare each element in the file
with its successor, i.e., X[i] with X[i+1], and interchange the two elements when they are not
in the proper order. We illustrate this sorting technique with a specific example. Bubble
sort is also called exchange sort.
Example:
Consider the array x[n] which is stored in memory as shown below:
X[0] X[1] X[2] X[3] X[4] X[5]
33 44 22 11 66 55
Suppose we want our array to be sorted in ascending order. Then we pass through the array
5 times as described below:
Pass 1: (each element is compared with its successor).
We compare X[i] and X[i+1] for i = 0, 1, 2, 3, and 4, and interchange X[i] and X[i+1] if
X[i] > X[i+1]. The process is shown below:
X[0]  X[1]  X[2]  X[3]  X[4]  X[5]  Remarks
 33    44    22    11    66    55   initial array
 33    44    22    11    66    55   33 < 44, no interchange
 33    22    44    11    66    55   44 > 22, interchange
 33    22    11    44    66    55   44 > 11, interchange
 33    22    11    44    66    55   44 < 66, no interchange
 33    22    11    44    55    66   66 > 55, interchange
The biggest number, 66, is moved (bubbled up) to the rightmost position in the array.
Pass 2: (X[5] is left out).
We repeat the same process, but this time we don't include X[5] in our comparisons, i.e.,
we compare X[i] with X[i+1] for i = 0, 1, 2, and 3 and interchange X[i] and X[i+1] if X[i] >
X[i+1]. The process is shown below:
X[0]  X[1]  X[2]  X[3]  X[4]  Remarks
 33    22    11    44    55   array after pass 1
 22    33    11    44    55   33 > 22, interchange
 22    11    33    44    55   33 > 11, interchange
 22    11    33    44    55   33 < 44, no interchange
 22    11    33    44    55   44 < 55, no interchange
The second biggest number, 55, is now in X[4].
Pass 3: (X[4] and X[5] are left out).
We repeat the same process, but this time we leave out both X[4] and X[5]. By doing this, we
move the third biggest number, 44, to X[3].
X[0]  X[1]  X[2]  X[3]  Remarks
 22    11    33    44   array after pass 2
 11    22    33    44   22 > 11, interchange
 11    22    33    44   22 < 33, no interchange
 11    22    33    44   33 < 44, no interchange
Pass 4: (X[3], X[4], and X[5] are left out).
We repeat the process, leaving out X[3], X[4], and X[5]. By doing this, we move the fourth
biggest number, 33, to X[2].
X[0]  X[1]  X[2]  Remarks
 11    22    33   array after pass 3
 11    22    33   11 < 22, no interchange
 11    22    33   22 < 33, no interchange
Pass 5: (X[2], X[3], X[4], and X[5] are left out).
We repeat the process, leaving out X[2], X[3], X[4], and X[5]. By doing this, we move the fifth
biggest number, 22, to X[1]. At this time, we have the smallest number, 11, in X[0]. Thus,
we see that we can sort an array of size 6 in 5 passes.
For an array of size n, we require (n-1) passes.
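In the example, passes 4 and 5 make no interchange because the array is already in order after pass 3. A common refinement, which is not part of the handout's program, is to stop as soon as a pass makes no interchange. The sketch below assumes a hypothetical function bubble_sort_passes that returns the number of passes it actually performed.

```c
/* Bubble sort with an early exit: stops after the first pass that makes
   no interchange. Returns the number of passes performed (the return
   value exists only for illustration). */
int bubble_sort_passes(int x[], int n)
{
    int pass, j, temp, swapped, passes = 0;
    for (pass = 0; pass < n - 1; pass++)
    {
        swapped = 0;
        /* the last 'pass' elements are already in their final places */
        for (j = 0; j < n - pass - 1; j++)
        {
            if (x[j] > x[j + 1])
            {
                temp = x[j];
                x[j] = x[j + 1];
                x[j + 1] = temp;
                swapped = 1;
            }
        }
        passes++;
        if (!swapped)
            break;                 /* no interchange: the array is sorted */
    }
    return passes;
}
```

For the array 33 44 22 11 66 55 this performs 4 passes instead of 5: three passes that interchange elements and one pass that confirms the order.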
Program for Bubble Sort:
#include <stdio.h>

void bubblesort(int x[], int n)
{
    int i, j, temp;
    /* (n-1) passes are enough to sort n elements */
    for (i = 0; i < n - 1; i++)
    {
        /* after pass i, the last i elements are already in place */
        for (j = 0; j < n - i - 1; j++)
        {
            if (x[j] > x[j+1])
            {
                temp = x[j];
                x[j] = x[j+1];
                x[j+1] = temp;
            }
        }
    }
}

int main(void)
{
    int i, n, x[25];

    printf("\n Enter the number of elements: ");
    scanf("%d", &n);
    printf("\n Enter Data:");
    for (i = 0; i < n; i++)
        scanf("%d", &x[i]);
    bubblesort(x, n);
    printf("\n Array Elements after sorting: ");
    for (i = 0; i < n; i++)
        printf("%5d", x[i]);
    return 0;
}
Selection Sort:
Selection sort requires no more than n-1 interchanges. Suppose x is an array of size n
stored in memory. The selection sort algorithm first selects the smallest element in the array
x and places it at array position 0; then it selects the next smallest element in the array x and
places it at array position 1. It continues this procedure until it places the biggest
element in the last position of the array.
The array is passed through (n-1) times, and in each pass the smallest remaining element is
placed in its final position, as detailed below:
Pass 1: Find the location j of the smallest element in the array x[0], x[1], . . . , x[n-1],
and then interchange x[j] with x[0]. Then x[0] is sorted.
Pass 2: Leave the first element and find the location j of the smallest element in the sub-array
x[1], x[2], . . . , x[n-1], and then interchange x[1] with x[j]. Then x[0], x[1] are sorted.
Pass 3: Leave the first two elements and find the location j of the smallest element in the
sub-array x[2], x[3], . . . , x[n-1], and then interchange x[2] with x[j]. Then x[0], x[1],
x[2] are sorted.
Pass (n-1): Find the location j of the smaller of the elements x[n-2] and x[n-1], and then
interchange x[j] and x[n-2]. Then x[0], x[1], . . . , x[n-2] are sorted. After this pass,
x[n-1] holds the biggest element, so the entire array is sorted.
Example:
Let us consider the following example with 9 elements to analyze selection sort. In each
pass, i is the position being filled and j is the position of the smallest remaining element;
a[i] and a[j] are then swapped.
  1   2   3   4   5   6   7   8   9    i  j  Remarks
 65  70  75  80  50  60  55  85  45    1  9  find the first smallest element, swap a[i] and a[j]
 45  70  75  80  50  60  55  85  65    2  5  find the second smallest element, swap a[i] and a[j]
 45  50  75  80  70  60  55  85  65    3  7  find the third smallest element, swap a[i] and a[j]
 45  50  55  80  70  60  75  85  65    4  6  find the fourth smallest element, swap a[i] and a[j]
 45  50  55  60  70  80  75  85  65    5  9  find the fifth smallest element, swap a[i] and a[j]
 45  50  55  60  65  80  75  85  70    6  9  find the sixth smallest element, swap a[i] and a[j]
 45  50  55  60  65  70  75  85  80    7  7  find the seventh smallest element (i = j, nothing moves)
 45  50  55  60  65  70  75  85  80    8  9  find the eighth smallest element, swap a[i] and a[j]
 45  50  55  60  65  70  75  80  85          The outer loop ends.
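The claim that selection sort needs at most n-1 interchanges can be checked with a short sketch. The function name selection_sort_swaps and the choice to skip the interchange when the smallest element is already in place are assumptions made for this illustration.

```c
/* Selection sort on a[0..n-1]. Returns the number of interchanges that
   were actually made; an interchange is skipped when the smallest
   remaining element is already at position i. */
int selection_sort_swaps(int a[], int n)
{
    int i, j, minindex, temp, swaps = 0;
    for (i = 0; i < n - 1; i++)
    {
        minindex = i;
        for (j = i + 1; j < n; j++)
            if (a[j] < a[minindex])
                minindex = j;      /* remember the smallest so far */
        if (minindex != i)
        {
            temp = a[i];
            a[i] = a[minindex];
            a[minindex] = temp;
            swaps++;
        }
    }
    return swaps;
}
```

For the 9-element example this makes 7 interchanges in 8 passes, because in the seventh pass the smallest remaining element (75) is already in position.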
Non-recursive Program for selection sort:
#include <stdio.h>

void selectionSort( int low, int high );
int a[25];

int main(void)
{
    int num, i = 0;

    printf( "Enter the number of elements: " );
    scanf( "%d", &num );
    printf( "\nEnter the elements:\n" );
    for( i = 0; i < num; i++ )
        scanf( "%d", &a[i] );
    selectionSort( 0, num - 1 );
    printf( "\nThe elements after sorting are: " );
    for( i = 0; i < num; i++ )
        printf( "%d ", a[i] );
    return 0;
}

/* sort the global array a[low..high] in ascending order */
void selectionSort( int low, int high )
{
    int i = 0, j = 0, temp = 0, minindex;
    for( i = low; i <= high; i++ )
    {
        /* find the smallest element in a[i..high] */
        minindex = i;
        for( j = i + 1; j <= high; j++ )
        {
            if( a[j] < a[minindex] )
                minindex = j;
        }
        /* move it to position i */
        temp = a[i];
        a[i] = a[minindex];
        a[minindex] = temp;
    }
}
Insertion sort algorithm
Insertion sort picks the elements one by one and places each at the position where it
belongs in the sorted list of elements.
Let us see the steps of insertion sort with the help of an example.
Input elements: 89 17 8 12 0
In each step, the elements to the left of the bar form the sorted list and the elements to the
right form the unsorted list. Each element in turn is removed from the unsorted list and placed
at the right position in the sorted list.
Step 1: 89 | 17 8 12 0
Step 2: 17 89 | 8 12 0
Step 3: 8 17 89 | 12 0
Step 4: 8 12 17 89 | 0
Step 5: 0 8 12 17 89 |
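The steps above can be sketched in C as follows. This is a minimal illustration; the function name insertion_sort and the variable name key are assumptions of the sketch.

```c
/* Insertion sort on a[0..n-1]: a[0..i-1] is the sorted list, and a[i] is
   the next element taken from the unsorted list. */
void insertion_sort(int a[], int n)
{
    int i, j, key;
    for (i = 1; i < n; i++)
    {
        key = a[i];                /* element to be placed */
        j = i - 1;
        /* shift larger elements of the sorted list one place to the right */
        while (j >= 0 && a[j] > key)
        {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;            /* insert at the position where it belongs */
    }
}
```

Each iteration of the outer loop corresponds to one step of the example: after i = 1 the sorted list is 17 89, after i = 2 it is 8 17 89, and so on.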
Algorithm (selection sort, using a helper that finds the smallest element)
SELECTION SORT(ARR, N)
o Step 1: Repeat Steps 2 and 3 for K = 0 to N-2
o Step 2: CALL SMALLEST(ARR, K, N, POS)
o Step 3: SWAP ARR[K] with ARR[POS]
[END OF LOOP]
o Step 4: EXIT
SMALLEST (ARR, K, N, POS)
o Step 1: [INITIALIZE] SET SMALL = ARR[K]
o Step 2: [INITIALIZE] SET POS = K
o Step 3: Repeat for J = K+1 to N-1
    IF SMALL > ARR[J]
        SET SMALL = ARR[J]
        SET POS = J
    [END OF IF]
[END OF LOOP]
o Step 4: RETURN POS
Program (selection sort using smallest()):
#include <stdio.h>

int smallest(int[], int, int);

int main(void)
{
    int a[10] = {10, 9, 7, 101, 23, 44, 12, 78, 34, 23};
    int i, pos, temp;
    for (i = 0; i < 10; i++)
    {
        /* swap a[i] with the smallest element of a[i..9] */
        pos = smallest(a, 10, i);
        temp = a[i];
        a[i] = a[pos];
        a[pos] = temp;
    }
    printf("\nprinting sorted elements...\n");
    for (i = 0; i < 10; i++)
    {
        printf("%d\n", a[i]);
    }
    return 0;
}

/* return the position of the smallest element in a[i..n-1] */
int smallest(int a[], int n, int i)
{
    int small, pos, j;
    small = a[i];
    pos = i;
    for (j = i + 1; j < n; j++)
    {
        if (a[j] < small)
        {
            small = a[j];
            pos = j;
        }
    }
    return pos;
}
Shell Sort
Shell sort is a generalization of insertion sort that overcomes the drawbacks of insertion sort
by comparing elements separated by a gap of several positions. In general, Shell sort performs the
following steps.
Step 1: Arrange the elements in tabular form and sort the columns using insertion sort.
Step 2: Repeat Step 1, each time with a smaller number of longer columns, in such a way that at
the end there is only one column of data to be sorted.
Algorithm
Shell_Sort(Arr, N)
Step 1: SET FLAG = 1, GAP_SIZE = N
Step 2: Repeat Steps 3 to 6 while FLAG = 1 OR GAP_SIZE > 1
Step 3:     SET FLAG = 0
Step 4:     SET GAP_SIZE = (GAP_SIZE + 1) / 2
Step 5:     Repeat Step 6 for I = 0 to I < (N - GAP_SIZE)
Step 6:         IF Arr[I + GAP_SIZE] < Arr[I]
                    SWAP Arr[I + GAP_SIZE], Arr[I]
                    SET FLAG = 1
Step 7: END
Program
#include <stdio.h>

void shellsort(int arr[], int num)
{
    int i, j, k, tmp;
    /* i is the gap; it is halved until it reaches 1 */
    for (i = num / 2; i > 0; i = i / 2)
    {
        for (j = i; j < num; j++)
        {
            /* move arr[j] back through the elements that are i apart */
            for (k = j - i; k >= 0; k = k - i)
            {
                if (arr[k+i] >= arr[k])
                    break;
                else
                {
                    tmp = arr[k];
                    arr[k] = arr[k+i];
                    arr[k+i] = tmp;
                }
            }
        }
    }
}

int main(void)
{
    int arr[30];
    int k, num;

    printf("Enter total no. of elements : ");
    scanf("%d", &num);
    printf("\nEnter %d numbers: ", num);
    for (k = 0; k < num; k++)
    {
        scanf("%d", &arr[k]);
    }
    shellsort(arr, num);
    printf("\n Sorted array is: ");
    for (k = 0; k < num; k++)
        printf("%d ", arr[k]);
    return 0;
}
Hashing
Hashing is an effective technique in which insertion, deletion, and search can be done in
constant time on average.
Hash Function
Each key is mapped into some number in the range 0 to Tablesize - 1 and placed in the
appropriate cell. The mapping is called a hash function.
An ideal hash table
In this example, john hashes to 3, phil hashes to 4, dave hashes to 6, and mary hashes to 7.
Routine for Hash Function
If the input keys are integers, then hash function will be key mod Tablesize. Usually,
the keys are strings; in this case, the hash function needs to be chosen carefully.
One option is to add up the ASCII values of the characters in the string.
A simple hash function
typedef int INDEX;
INDEX hash( char *key, int tablesize )
{
    int hash_val = 0;
    /* add up the character codes until the terminating '\0' */
    while( *key != '\0' )
        hash_val += *key++;
    return( hash_val % tablesize );
}
Another hash function assumes that the key has at least two characters plus the NULL terminator.
The value 27 represents the number of letters in the English alphabet plus the blank, and 729 is $27^2$.
INDEX hash( char *key, int tablesize )
{
return ( ( key[0] + 27*key[1] + 729*key[2] ) % tablesize );
}
A good hash function
INDEX hash( char *key, int tablesize )
{
    unsigned int hash_val = 0;
    /* multiply the running value by 32 (shift left by 5) before adding each character */
    while( *key != '\0' )
        hash_val = ( hash_val << 5 ) + *key++;
    return( hash_val % tablesize );
}
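One way to see why the shifting version spreads keys better is to hash two keys made of the same characters in a different order. The sketch below restates both functions under the hypothetical names additive_hash and shift_hash so that they can be compared side by side; the table size 10007 used in the note is an arbitrary choice for the illustration.

```c
/* Sum of the character codes: every rearrangement of the same characters
   gives the same value. */
unsigned int additive_hash(const char *key, unsigned int tablesize)
{
    unsigned int h = 0;
    while (*key != '\0')
        h += (unsigned char)*key++;
    return h % tablesize;
}

/* Shift-and-add: the running value is multiplied by 32 before each
   character is added, so the position of a character matters. */
unsigned int shift_hash(const char *key, unsigned int tablesize)
{
    unsigned int h = 0;
    while (*key != '\0')
        h = (h << 5) + (unsigned char)*key++;
    return h % tablesize;
}
```

With a table size of 10007, the keys "ab" and "ba" both map to 195 under additive_hash, while shift_hash maps them to 3202 and 3233.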
The main problems deal with choosing a function, deciding what to do when two keys hash
to the same value, and deciding on the table size.
Collision
When two different keys hash to the same position in the hash table, inserting the second key
would overwrite the value stored for the first. This is known as a collision.
Collision resolution
If, when inserting an element, it hashes to the same value as an already inserted element,
then we have a collision and need to resolve it.
Methods to resolve collision
1. Separate Chaining / Open Hashing
2. Open Addressing / Closed Hashing
a. Linear Probing
b. Quadratic Probing
c. Double Hashing
3. Extendible Hashing
4. Rehashing
1. Separate Chaining / Open Hashing
The first strategy, commonly known as either open hashing or separate chaining, is to
keep a list of all elements that hash to the same value. For convenience, our lists have headers.
In the example used here, hash(x) = x mod 10 (the table size is 10).
To perform an insert, we traverse down the appropriate list to check whether the element
is already in place. If the element turns out to be new, it can be inserted either at the front of
the list or at the end of the list; in the routines below, new elements are inserted at the front.
The hash table structure contains the actual size and an array of linked lists, which
are dynamically allocated when the table is initialized.
Type declaration
typedef int elementtype;   /* assumed element type; the handout does not define it */
typedef struct listnode *node_ptr;
typedef node_ptr LIST;
typedef node_ptr position;
struct listnode
{
    elementtype element;
    position next;
};
struct hashtbl
{
    int tablesize;
    LIST *thelists;        /* array of lists, one header node per list */
};
typedef struct hashtbl *HASHTABLE;
Initialization routine for open hash table
HASHTABLE initializetable( int tablesize )
{
    HASHTABLE H;
    int i;
    if( tablesize < MIN_TABLE_SIZE )
    {
        error("Table size too small");
        return NULL;
    }
    /* allocate the table structure */
    H = (HASHTABLE) malloc( sizeof (struct hashtbl) );
    if( H == NULL )
        fatalerror("Out of space!!!");
    /* the table size is rounded up to a prime */
    H->tablesize = nextprime( tablesize );
    /* allocate the array of lists */
    H->thelists = malloc( sizeof (LIST) * H->tablesize );
    if( H->thelists == NULL )
        fatalerror("Out of space!!!");
    /* allocate one header node per list */
    for( i = 0; i < H->tablesize; i++ )
    {
        H->thelists[i] = malloc( sizeof (struct listnode) );
        if( H->thelists[i] == NULL )
            fatalerror("Out of space!!!");
        else
            H->thelists[i]->next = NULL;
    }
    return H;
}
Routine for Find operation
position find( elementtype key, HASHTABLE H )
{
    position p;
    LIST L;
    L = H->thelists[ hash( key, H->tablesize ) ];
    p = L->next;           /* skip the header node */
    while( (p != NULL) && (p->element != key) )
        p = p->next;
    return p;              /* NULL if key is not in the list */
}
Routine For Insert Operation
void insert( elementtype key, HASHTABLE H )
{
    position pos, newcell;
    LIST L;
    pos = find( key, H );
    if( pos == NULL )      /* key is not already in the table */
    {
        newcell = (position) malloc( sizeof(struct listnode) );
        if( newcell == NULL )
            fatalerror("Out of space!!!");
        else
        {
            /* link the new cell in right after the header, at the front of the list */
            L = H->thelists[ hash( key, H->tablesize ) ];
            newcell->next = L->next;
            newcell->element = key;
            L->next = newcell;
        }
    }
}
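The routines above rely on helpers such as nextprime and fatalerror that are not shown in this handout. As a self-contained illustration of the same idea, the sketch below uses a fixed table of 10 lists with hash(x) = x mod 10 and non-negative integer keys. The names chain_table, chain_find, and chain_insert are hypothetical, and unlike the routines above this sketch uses no header nodes.

```c
#include <stdlib.h>

#define CHAIN_TABLE_SIZE 10            /* matches hash(x) = x mod 10 */

struct chain_node { int key; struct chain_node *next; };

/* One list per cell; file-scope arrays start out as all NULL (empty lists). */
struct chain_node *chain_table[CHAIN_TABLE_SIZE];

/* Returns the node holding key, or NULL if key is not in the table.
   Assumes key >= 0. */
struct chain_node *chain_find(int key)
{
    struct chain_node *p = chain_table[key % CHAIN_TABLE_SIZE];
    while (p != NULL && p->key != key)
        p = p->next;
    return p;
}

/* Inserts key at the front of its list unless it is already present.
   Returns 1 on insertion, 0 for a duplicate or if memory runs out. */
int chain_insert(int key)
{
    struct chain_node *cell;
    int h = key % CHAIN_TABLE_SIZE;
    if (chain_find(key) != NULL)
        return 0;
    cell = malloc(sizeof *cell);
    if (cell == NULL)
        return 0;
    cell->key = key;
    cell->next = chain_table[h];       /* old front becomes the second node */
    chain_table[h] = cell;
    return 1;
}
```

Inserting 1 and then 81 leaves list 1 as 81 followed by 1, because both keys hash to 1 and new elements go to the front.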
Closed Hashing (Open Addressing)
Separate chaining has the disadvantage of requiring pointers. This tends to slow the
algorithm down a bit because of the time required to allocate new cells, and also essentially
requires the implementation of a second data structure.
Closed hashing, also known as open addressing, is an alternative to resolving collisions
with linked lists.
In a closed hashing system, if a collision occurs, alternate cells are tried until an empty
cell is found. More formally, cells h0(x), h1(x), h2(x), . . . are tried in succession, where
hi(x) = (hash(x) + F(i)) mod tablesize, with F(0) = 0. The function F is the collision resolution
strategy. Because all the data goes inside the table, a bigger table is needed for closed hashing
than for open hashing. Generally, the load factor λ (the ratio of the number of elements in the
table to the table size) should be below 0.5 for closed hashing.
Three common collision resolution strategies are
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Linear Probing
In linear probing, F is a linear function of i, typically F(i) = i. This amounts to trying cells
sequentially (with wraparound) in search of an empty cell.
The below Figure shows the result of inserting keys {89, 18, 49, 58, 69} into a closed table
using the same hash function as before and the collision resolution strategy, F (i) = i. The first
collision occurs when 49 is inserted; it is put in the next available spot, namely 0, which is open.
58 collides with 18, 89, and then 49 before an empty cell is found three away. The collision for
69 is handled in a similar manner. As long as the table is big enough, a free cell can always be
found, but the time to do so can get quite large. Worse, even if the table is relatively empty,
blocks of occupied cells start forming. This effect, known as primary clustering, means that any
key that hashes into the cluster will require several attempts to resolve the collision, and then
it will add to the cluster.
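The insertion sequence described above can be reproduced with a short sketch. It assumes a table of 10 integer cells, hash(x) = x mod 10, non-negative keys, and -1 as the marker for an empty cell; the name linear_probe_insert is hypothetical.

```c
#define LP_TABLE_SIZE 10
#define LP_EMPTY (-1)       /* marker for an unused cell; keys must be >= 0 */

/* Inserts key using linear probing, F(i) = i. Returns the cell that was
   used, or -1 if every cell is occupied. */
int linear_probe_insert(int table[], int key)
{
    int i, pos;
    for (i = 0; i < LP_TABLE_SIZE; i++)
    {
        pos = (key % LP_TABLE_SIZE + i) % LP_TABLE_SIZE;   /* wraps around */
        if (table[pos] == LP_EMPTY)
        {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Starting from an empty table, the keys 89, 18, 49, 58, 69 land in cells 9, 8, 0, 1, and 2, so 58 ends up three cells away from its home position 8.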
Although we will not perform the calculations here, it can be shown that the expected number
of probes using linear probing is roughly $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$ for insertions and unsuccessful
searches and $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$ for successful searches, where $\lambda$ is the load factor of the table.
These assumptions are satisfied by a random collision resolution strategy and are reasonable
unless $\lambda$ is very close to 1.
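As a quick numerical check of the linear probing estimates (a worked substitution, not a derivation), write λ for the load factor, the fraction of the table that is occupied, and take λ = 0.5 and λ = 0.9:

```latex
\begin{align*}
\lambda = 0.5:\quad
  & \tfrac{1}{2}\left(1 + \tfrac{1}{(1-0.5)^2}\right) = \tfrac{1}{2}(1 + 4) = 2.5
  && \text{(insertion or unsuccessful search)} \\
  & \tfrac{1}{2}\left(1 + \tfrac{1}{1-0.5}\right) = \tfrac{1}{2}(1 + 2) = 1.5
  && \text{(successful search)} \\
\lambda = 0.9:\quad
  & \tfrac{1}{2}\left(1 + \tfrac{1}{(1-0.9)^2}\right) = \tfrac{1}{2}(1 + 100) = 50.5
  && \text{(insertion or unsuccessful search)} \\
  & \tfrac{1}{2}\left(1 + \tfrac{1}{1-0.9}\right) = \tfrac{1}{2}(1 + 10) = 5.5
  && \text{(successful search)}
\end{align*}
```

A half-full table needs only a few probes per operation, while a table that is 90 percent full needs about 50 probes per insertion, which is why the load factor is kept low.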
Quadratic Probing
Quadratic probing is a collision resolution method that eliminates the primary clustering
problem of linear probing. Quadratic probing is what you would expect: the collision function is
quadratic. The popular choice is $F(i) = i^2$. The figure below shows the resulting closed table
with this collision function on the same input used in the linear probing example.
When 49 collides with 89, the next position attempted is one cell away. This cell is empty,
so 49 is placed there. Next, 58 collides at position 8. Then the cell one away is tried, but another
collision occurs. A vacant cell is found at the next cell tried, which is $2^2 = 4$ away. 58 is thus
placed in cell 2. The same thing happens for 69.
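The same walk-through can be reproduced with a short sketch. It assumes a table of 10 integer cells, hash(x) = x mod 10, non-negative keys, and -1 as the marker for an empty cell; the name quadratic_probe_insert is hypothetical.

```c
#define QP_TABLE_SIZE 10
#define QP_EMPTY (-1)       /* marker for an unused cell; keys must be >= 0 */

/* Inserts key using quadratic probing: the i-th attempt looks i*i cells
   past the home position. Returns the cell that was used, or -1 if no
   empty cell is found within QP_TABLE_SIZE attempts. */
int quadratic_probe_insert(int table[], int key)
{
    int i, pos;
    for (i = 0; i < QP_TABLE_SIZE; i++)
    {
        pos = (key % QP_TABLE_SIZE + i * i) % QP_TABLE_SIZE;
        if (table[pos] == QP_EMPTY)
        {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Starting from an empty table, the keys 89, 18, 49, 58, 69 land in cells 9, 8, 0, 2, and 3. A return value of -1 does not mean that the table is full, only that the probe sequence found no empty cell.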
For linear probing it is a bad idea to let the hash table get nearly full, because performance
degrades. For quadratic probing, the situation is even more drastic: There is no guarantee of
finding an empty cell once the table gets more than half full, or even before the table gets half
full if the table size is not prime. This is because at most half of the table can be used as alternate
locations to resolve collisions.
Type declaration for open addressing hash tables
enum kind_of_entry { legitimate, empty, deleted };
struct hash_entry
{
    element_type element;
    enum kind_of_entry info;
};
typedef unsigned int position;
typedef struct hash_entry cell;
/* the_cells is an array of hash_entry cells, allocated later */
struct hash_tbl
{
    unsigned int table_size;
    cell *the_cells;
};
typedef struct hash_tbl *hashtable;
Routine to initialize closed hash table
hashtable initialize_table( unsigned int table_size )
{
    hashtable H;
    unsigned int i;
    if( table_size < MIN_TABLE_SIZE )
    {
        error("Table size too small");
        return NULL;
    }
    /* allocate the table structure */
    H = (hashtable) malloc( sizeof ( struct hash_tbl ) );
    if( H == NULL )
        fatal_error("Out of space!!!");
    H->table_size = next_prime( table_size );
    /* allocate the cells and mark every cell as empty */
    H->the_cells = (cell *) malloc( sizeof ( cell ) * H->table_size );
    if( H->the_cells == NULL )
        fatal_error("Out of space!!!");
    for( i = 0; i < H->table_size; i++ )
        H->the_cells[i].info = empty;
    return H;
}
Find routine for closed hashing with quadratic probing
position find( element_type key, hashtable H )
{
    position i, current_pos;
    i = 0;
    current_pos = hash( key, H->table_size );
    /* stop at an empty cell or at the cell that holds key */
    while( (H->the_cells[current_pos].info != empty) &&
           (H->the_cells[current_pos].element != key) )
    {
        /* consecutive squares differ by 2i - 1, so this steps to hash + i*i */
        current_pos += 2*(++i) - 1;
        if( current_pos >= H->table_size )
            current_pos -= H->table_size;
    }
    return current_pos;
}
Insert routine for closed hash tables with quadratic probing
void insert( element_type key, hashtable H )
{
    position pos;
    pos = find( key, H );
    if( H->the_cells[pos].info != legitimate )
    {   /* ok to insert here */
        H->the_cells[pos].info = legitimate;
        H->the_cells[pos].element = key;
    }
}
Double Hashing
The last collision resolution method we will examine is double hashing. For double hashing, one
popular choice is F(i) = i · h2(x). This formula says that we apply a second hash function to x and
probe at a distance h2(x), 2 h2(x), . . ., and so on. A function such as h2(x) = R - (x mod R), with
R a prime smaller than the table size, will work well. If we choose R = 7, then the figure below
shows the results of inserting the same keys as before.
Closed hash table with double hashing, after each insertion.
The first collision occurs when 49 is inserted. h2 (49) = 7 - 0 = 7, so 49 is inserted in
position 6. h2 (58) = 7 - 2 = 5, so 58 is inserted at location 3. Finally, 69 collides and is inserted
at a distance h2 (69) = 7 - 6 = 1 away. If we tried to insert 60 in position 0, we would have a
collision. Since h2 (60) = 7 - 4 = 3, we would then try positions 3, 6, 9, and then 2 until an empty
spot is found.
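The positions quoted above can be reproduced with a short sketch. It assumes a table of 10 integer cells, hash(x) = x mod 10, R = 7, non-negative keys, and -1 as the marker for an empty cell; the names hash2 and double_hash_insert are hypothetical.

```c
#define DH_TABLE_SIZE 10
#define DH_R 7              /* prime smaller than the table size */
#define DH_EMPTY (-1)       /* marker for an unused cell; keys must be >= 0 */

/* Second hash function h2(x) = R - (x mod R); it is never zero. */
int hash2(int key)
{
    return DH_R - (key % DH_R);
}

/* Inserts key using double hashing, F(i) = i * h2(key). Returns the cell
   that was used, or -1 if no empty cell is found within DH_TABLE_SIZE
   attempts. */
int double_hash_insert(int table[], int key)
{
    int i, pos, step = hash2(key);
    for (i = 0; i < DH_TABLE_SIZE; i++)
    {
        pos = (key % DH_TABLE_SIZE + i * step) % DH_TABLE_SIZE;
        if (table[pos] == DH_EMPTY)
        {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Starting from an empty table, 89, 18, 49, 58, 69 land in cells 9, 8, 6, 3, and 0. Inserting 60 afterwards probes cells 0, 3, 6, and 9, all occupied, and finally uses cell 2.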
Rehashing
If the table gets too full, the operations will start taking too long, and inserts might
fail for closed hashing with quadratic resolution. This can happen if there are too
many deletions intermixed with insertions.
A solution, then, is to build another table that is about twice as big and scan down the
entire original hash table, computing the new hash value for each (non-deleted) element and
inserting it in the new table.
As an example, suppose the elements 13, 15, 24, and 6 are inserted into a closed hash
table of size 7. The hash function is h(x) = x mod 7. Suppose linear probing is used to resolve
collisions.
The resulting hash table appears in the figure below.
If 23 is inserted into the table, the resulting table will be over 70 percent
full. Because the table is so full, a new table is created.
The size of this table is 17, because this is the first prime which is twice as large as the
old table size. The new hash function is then h(x) = x mod 17.
The old table is scanned, and elements 6, 15, 23, 24, and 13 are inserted into the new
table. The resulting table appears as below.
This entire operation is called rehashing. This is obviously a very expensive operation –
the running time is O(n).
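The example can be traced with a short sketch. It assumes plain integer tables, non-negative keys, -1 as the marker for an empty cell, and linear probing in both tables; the names probe_insert and rehash_table are hypothetical, and the new size is passed in by the caller rather than computed as the next prime.

```c
#include <stdlib.h>

#define RH_EMPTY (-1)       /* marker for an unused cell; keys must be >= 0 */

/* Inserts key into a table of 'size' cells with linear probing.
   Returns the cell that was used, or -1 if the table is full. */
int probe_insert(int table[], int size, int key)
{
    int i, pos;
    for (i = 0; i < size; i++)
    {
        pos = (key % size + i) % size;
        if (table[pos] == RH_EMPTY)
        {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Builds a new table of new_size cells and reinserts every key found in
   the old table, scanning it from cell 0 upward. Returns the new table
   (the caller frees it), or NULL if allocation fails. */
int *rehash_table(const int old_table[], int old_size, int new_size)
{
    int i;
    int *new_table = malloc(sizeof *new_table * (size_t)new_size);
    if (new_table == NULL)
        return NULL;
    for (i = 0; i < new_size; i++)
        new_table[i] = RH_EMPTY;
    for (i = 0; i < old_size; i++)
        if (old_table[i] != RH_EMPTY)
            probe_insert(new_table, new_size, old_table[i]);
    return new_table;
}
```

With the old table of size 7 holding 13, 15, 24, 6, and 23, the scan visits the keys in the order 6, 15, 23, 24, 13, and in the new table of size 17 they occupy cells 6, 15, 7, 8, and 13. The loop touches every old cell once, which is where the O(n) cost comes from.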
Rehashing routines
hashtable rehash( hashtable H )
{
    unsigned int i, old_size;
    cell *old_cells;
    hashtable old_table;
    old_table = H;
    old_cells = H->the_cells;
    old_size = H->table_size;
    /* Get a new, empty table */
    H = initialize_table( 2*old_size );
    /* Scan through old table, reinserting into new */
    for( i = 0; i < old_size; i++ )
        if( old_cells[i].info == legitimate )
            insert( old_cells[i].element, H );
    /* release the old cells and the old table structure */
    free( old_cells );
    free( old_table );
    return H;
}
Extendible Hashing
If the amount of data is too large to fit in main memory, then the main consideration is the
number of disk accesses required to retrieve data. As before, we assume that at any point we
have n records to store; the value of n changes over time. Furthermore, at most m records fit in
one disk block. We will use m = 4 in this section.
If either open hashing or closed hashing is used, the major problem is that collisions
could cause several blocks to be examined during a find, even for a well-distributed hash table.
Furthermore, when the table gets too full, an extremely expensive rehashing step must
be performed, which requires O(n) disk accesses.
A clever alternative, known as extendible hashing, allows a find to be performed in two
disk accesses. Insertions also require few disk accesses.
The idea is to keep a root (directory) that points directly to the disk block holding the data.
If the time needed to determine which block holds a given key can be kept small, then we have
a practical scheme. This is exactly the strategy used by extendible hashing.
Let us suppose, for the moment, that our data consists of several six-bit integers. The root
of the "tree" contains four pointers determined by the leading two bits of the data. Each leaf has
up to m = 4 elements.
It happens that in each leaf the first two bits are identical; this is indicated by the number
in parentheses.
To be more formal, D will represent the number of bits used by the root, which is
sometimes known as the directory. The number of entries in the directory is thus $2^D$.
$d_L$ is the number of leading bits that all the elements of some leaf have in common. $d_L$ will
depend on the particular leaf, and $d_L \le D$.
Extendible hashing: original data
Suppose that we want to insert the key 100100. This would go into the third leaf, but
as the third leaf is already full, there is no room. We thus split this leaf into two
leaves, which are now determined by the first three bits. This requires increasing the
number of directory bits to D = 3, so the directory now has $2^3 = 8$ entries.
If the key 000000 is now inserted, then the first leaf is split, generating two leaves
with $d_L = 3$. Since D = 3, the only change required in the directory is the updating of
the 000 and 001 pointers.
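The directory lookup can be sketched in a few lines. This assumes six-bit keys stored in an unsigned integer, as in the example; the names KEY_BITS and directory_index are hypothetical, and directory entries are numbered from 0.

```c
#define KEY_BITS 6          /* keys are six-bit integers, as in the example */

/* Returns the directory entry selected by the leading D bits of a
   KEY_BITS-bit key. Assumes 0 <= D <= KEY_BITS. */
unsigned int directory_index(unsigned int key, unsigned int D)
{
    return key >> (KEY_BITS - D);      /* drop the trailing bits */
}
```

The key 100100 (decimal 36) selects entry 2 when D = 2, which is the third leaf (entries 00, 01, 10, 11), and entry 4 (binary 100) once the directory has grown to D = 3. Finding the entry costs one shift, so the search time is dominated by reading the leaf from disk.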