191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V
UNIT V : SEARCHING, SORTING AND HASHING TECHNIQUES 9
Searching- Linear Search - Binary Search. Sorting - Bubble sort - Selection sort - Insertion sort -
Shell sort. Hashing- Hash Functions – Separate Chaining – Open Addressing – Rehashing –
Extendible Hashing.
Searching is used to find the location where an element is available. There are two types
of search techniques. They are:
● Linear or sequential search
● Binary search
Sorting arranges the elements of a data structure in a systematic order so that they can be
processed efficiently. For example, the words in a dictionary and the subscriber names in a
telephone directory are listed in alphabetical order. There are many sorting techniques, out
of which we study the following:
• Bubble sort
• Insertion sort
• Selection sort and
• Shell sort
There are two types of sorting techniques:
1. Internal sorting
2. External sorting
If all the elements to be sorted are present in main memory, the sorting is called
internal sorting. If some of the elements to be sorted are kept on secondary storage,
it is called external sorting. Here we study only internal sorting techniques.
Linear Search:
This is the simplest of all searching techniques. In this technique, an ordered or unordered list
will be searched one by one from the beginning until the desired element is found. If the
desired element is found in the list then the search is successful otherwise unsuccessful.
Suppose there are 'n' elements organized sequentially in a list. The number of
comparisons required to retrieve an element depends purely on where the element is
stored in the list. If it is the first element, one comparison will do; if it is the second
element, two comparisons are necessary, and so on. On average, a successful search needs
(n+1)/2 comparisons, assuming every position is equally likely. If the search is not
successful, 'n' comparisons are needed.
The time complexity of linear search is O(n).
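The comparison counts above can be checked with a small sketch. The function names linear_search_count and total_comparisons are illustrative and not part of the original notes; the sketch assumes the elements are distinct.

```c
/* Returns the index of x in a[0..n-1], or -1 if x is absent.
   *comparisons receives the number of element comparisons made. */
int linear_search_count(const int a[], int n, int x, int *comparisons)
{
    *comparisons = 0;
    for (int i = 0; i < n; i++) {
        (*comparisons)++;
        if (a[i] == x)
            return i;
    }
    return -1;
}

/* Total comparisons needed to find every element of a[] once.
   Dividing the result by n gives the average for a successful search. */
int total_comparisons(const int a[], int n)
{
    int total = 0, c;
    for (int i = 0; i < n; i++) {
        linear_search_count(a, n, a[i], &c);
        total += c;
    }
    return total;
}
```

For a list of 9 distinct elements the total is 1 + 2 + ... + 9 = 45, so the average is 45/9 = 5 = (9+1)/2, while an unsuccessful search makes all 9 comparisons.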
Algorithm:
Let array a[] store n elements. Determine whether the element 'x' is present or not.
linsrch(a[], n, x)
{
    index = 0;
    flag = 0;
    while (index < n) do
    {
        if (x == a[index])
        {
            flag = 1;
            break;
        }
        index++;
    }
    if (flag == 1)
        printf("Data found at index %d", index);
    else
        printf("Data not found");
}
Example :
Let us illustrate linear search on the following 9 elements:
Index 0 1 2 3 4 5 6 7 8
Elements -15 -6 0 7 9 23 54 82 101
Searching different elements is as follows:
Searching for x = 7: search successful, data found at index 3.
Searching for x = 82: search successful, data found at index 7.
Searching for x = 42: search unsuccessful, data not found.
A non-recursive program for Linear Search:
#include <stdio.h>

int main(void)
{
    int number[25], n, data, i, flag = 0;

    printf("\n Enter the number of elements: ");
    scanf("%d", &n);
    printf("\n Enter the elements: ");
    for (i = 0; i < n; i++)
        scanf("%d", &number[i]);
    printf("\n Enter the element to be searched: ");
    scanf("%d", &data);
    for (i = 0; i < n; i++)
    {
        if (number[i] == data)
        {
            flag = 1;   /* found: stop scanning */
            break;
        }
    }
    if (flag == 1)
        printf("\n Data found at location: %d", i + 1);
    else
        printf("\n Data not found ");
    return 0;
}
A Recursive program for linear search:
#include <stdio.h>

void linear_search(int a[], int data, int position, int n)
{
    if (position < n)
    {
        if (a[position] == data)
            printf("\n Data found at %d ", position);
        else
            linear_search(a, data, position + 1, n);
    }
    else
        printf("\n Data not found");   /* ran past the last element */
}

int main(void)
{
    int a[25], i, n, data;

    printf("\n Enter the number of elements: ");
    scanf("%d", &n);
    printf("\n Enter the elements: ");
    for (i = 0; i < n; i++)
    {
        scanf("%d", &a[i]);
    }
    printf("\n Enter the element to be searched: ");
    scanf("%d", &data);
    linear_search(a, data, 0, n);
    return 0;
}
BINARY SEARCH
Suppose we have 'n' records which have been ordered by keys so that x1 < x2 < … < xn. When we
are given an element 'x', binary search is used to find the corresponding element in the list.
In case 'x' is present, we have to determine a value 'j' such that a[j] = x (successful search).
If 'x' is not in the list, then j is set to zero (unsuccessful search).
In binary search we jump into the middle of the file, where we find key a[mid], and compare
'x' with a[mid]. If x = a[mid], then the desired record has been found. If x < a[mid], then 'x'
must be in the portion of the file that precedes a[mid]. Similarly, if x > a[mid], then further
search is only necessary in the part of the file which follows a[mid].
If we repeatedly find the middle key a[mid] of the unsearched portion of the file, then every
unsuccessful comparison of 'x' with a[mid] eliminates roughly half of the unsearched portion
from consideration.
Since the array size is roughly halved after each comparison between 'x' and a[mid], and
since an array of length 'n' can be halved only about log2 n times before reaching a trivial
length, the worst-case complexity of binary search is O(log2 n).
Algorithm:
Let a[1..n] be an array of elements in increasing order, n ≥ 0. Determine whether 'x' is present,
and if so, return j such that x = a[j]; otherwise return 0.
binsrch(a[], n, x)
{
    low = 1; high = n;
    while (low <= high) do
    {
        mid = (low + high)/2;
        if (x < a[mid])
            high = mid - 1;
        else if (x > a[mid])
            low = mid + 1;
        else
            return mid;
    }
    return 0;
}
low and high are integer variables such that each time through the loop either ‗x‘ is found or low
is increased by at least one or high is decreased by at least one. Thus we have two sequences of
integers approaching each other and eventually low will become greater than high causing
termination in a finite number of steps if ‗x‘ is not present.
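A minimal C sketch of the same loop is given here, assuming a 1-based array (a[0] is unused) so that the return value 0 can mean "not found". The name binsrch_count and the probes counter are illustrative additions, not part of the original notes.

```c
/* Binary search over a[1..n] (index 0 is unused).
   Returns the index of x, or 0 if x is absent; *probes counts how many
   middle elements a[mid] were examined. */
int binsrch_count(const int a[], int n, int x, int *probes)
{
    int low = 1, high = n;
    *probes = 0;
    while (low <= high) {
        /* same value as (low + high) / 2, but cannot overflow */
        int mid = low + (high - low) / 2;
        (*probes)++;
        if (x < a[mid])
            high = mid - 1;
        else if (x > a[mid])
            low = mid + 1;
        else
            return mid;
    }
    return 0;
}
```

Each iteration either returns or shrinks the range [low, high] by at least one, which is the termination argument made above. Running it on the 12-element list 4, 7, 8, 9, 16, 20, 24, 38, 39, 45, 54, 77 and summing the probes over all twelve keys gives 37.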
Example 1:
Let us illustrate binary search on the following 12 elements:
Index 1 2 3 4 5 6 7 8 9 10 11 12
Elements 4 7 8 9 16 20 24 38 39 45 54 77
If we are searching for x = 4: (This needs 3 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 1, high = 5, mid = 6/2 = 3, check 8
low = 1, high = 2, mid = 3/2 = 1, check 4, found
If we are searching for x = 7: (This needs 4 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 1, high = 5, mid = 6/2 = 3, check 8
low = 1, high = 2, mid = 3/2 = 1, check 4
low = 2, high = 2, mid = 4/2 = 2, check 7, found
If we are searching for x = 8: (This needs 2 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 1, high = 5, mid = 6/2 = 3, check 8, found
If we are searching for x = 9: (This needs 3 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 1, high = 5, mid = 6/2 = 3, check 8
low = 4, high = 5, mid = 9/2 = 4, check 9, found
If we are searching for x = 16: (This needs 4 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 1, high = 5, mid = 6/2 = 3, check 8
low = 4, high = 5, mid = 9/2 = 4, check 9
low = 5, high = 5, mid = 10/2 = 5, check 16, found
If we are searching for x = 20: (This needs 1 comparison)
low = 1, high = 12, mid = 13/2 = 6, check 20, found
If we are searching for x = 24: (This needs 3 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39
low = 7, high = 8, mid = 15/2 = 7, check 24, found
If we are searching for x = 38: (This needs 4 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39
low = 7, high = 8, mid = 15/2 = 7, check 24
low = 8, high = 8, mid = 16/2 = 8, check 38, found
If we are searching for x = 39: (This needs 2 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39, found
If we are searching for x = 45: (This needs 4 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39
low = 10, high = 12, mid = 22/2 = 11, check 54
low = 10, high = 10, mid = 20/2 = 10, check 45, found
If we are searching for x = 54: (This needs 3 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39
low = 10, high = 12, mid = 22/2 = 11, check 54, found
If we are searching for x = 77: (This needs 4 comparisons)
low = 1, high = 12, mid = 13/2 = 6, check 20
low = 7, high = 12, mid = 19/2 = 9, check 39
low = 10, high = 12, mid = 22/2 = 11, check 54
low = 12, high = 12, mid = 24/2 = 12, check 77, found
The number of comparisons necessary for each search element:
20 - requires 1 comparison;
8 and 39 - require 2 comparisons;
4, 9, 24, 54 - require 3 comparisons; and
7, 16, 38, 45, 77 - require 4 comparisons.
Summing the comparisons needed to find all twelve items and dividing by 12 yields
37/12, or approximately 3.08 comparisons per successful search on average.
A non-recursive program for binary search:
Example:
#include <stdio.h>

/* Iterative binary search on a[beg..end]; returns the 1-based position
   of val, or -1 if val is not present. */
int binarySearch(int a[], int beg, int end, int val)
{
    int mid;
    while (end >= beg)
    {
        mid = (beg + end) / 2;
        /* the item to be searched is present at the middle */
        if (a[mid] == val)
        {
            return mid + 1;
        }
        /* the item is greater than the middle element, so it can only be
           in the right subarray */
        else if (a[mid] < val)
        {
            beg = mid + 1;
        }
        /* the item is smaller than the middle element, so it can only be
           in the left subarray */
        else
        {
            end = mid - 1;
        }
    }
    return -1;
}

int main(void)
{
    int a[] = {11, 14, 25, 30, 40, 41, 52, 57, 70}; /* given array */
    int val = 40;                                   /* value to be searched */
    int n = sizeof(a) / sizeof(a[0]);               /* size of array */
    int res = binarySearch(a, 0, n - 1, val);       /* store result */
    printf("The elements of the array are - ");
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\nElement to be searched is - %d", val);
    if (res == -1)
        printf("\nElement is not present in the array");
    else
        printf("\nElement is present at %d position of array", res);
    return 0;
}
A recursive program for binary search:
#include <stdio.h>

void bin_search(int a[], int data, int low, int high)
{
    int mid;
    if (low <= high)
    {
        mid = (low + high) / 2;
        if (a[mid] == data)
            printf("\n Element found at location: %d ", mid + 1);
        else
        {
            if (data < a[mid])
                bin_search(a, data, low, mid - 1);
            else
                bin_search(a, data, mid + 1, high);
        }
    }
    else
        printf("\n Element not found");   /* the range is empty */
}

int main(void)
{
    int a[25], i, n, data;

    printf("\n Enter the number of elements: ");
    scanf("%d", &n);
    printf("\n Enter the elements in ascending order: ");
    for (i = 0; i < n; i++)
        scanf("%d", &a[i]);
    printf("\n Enter the element to be searched: ");
    scanf("%d", &data);
    bin_search(a, data, 0, n - 1);
    return 0;
}
Bubble Sort:
The bubble sort is easy to understand and program. The basic idea of bubble sort is to pass
through the file sequentially several times. In each pass, we compare each element in the file
with its successor, i.e., X[i] with X[i+1], and interchange the two elements when they are not in
proper order. We will illustrate this sorting technique with a specific example. Bubble
sort is also called exchange sort.
Example:
Consider the array x[n] which is stored in memory as shown below:
X[0] X[1] X[2] X[3] X[4] X[5]
33 44 22 11 66 55
Suppose we want our array to be stored in ascending order. Then we pass through the array
5 times as described below:
Pass 1:
We compare X[i] and X[i+1] for i = 0, 1, 2, 3, and 4, and interchange X[i] and X[i+1] if
X[i] > X[i+1]. The process is shown below:
X[0]  X[1]  X[2]  X[3]  X[4]  X[5]  Remarks
33    44    22    11    66    55    initial array
33    44    22    11    66    55    33 < 44, no interchange
33    22    44    11    66    55    44 and 22 interchanged
33    22    11    44    66    55    44 and 11 interchanged
33    22    11    44    66    55    44 < 66, no interchange
33    22    11    44    55    66    66 and 55 interchanged
The biggest number 66 is moved to (bubbled up) the right most position in the array.
Pass 2:
We repeat the same process, but this time we don't include X[5] in our comparisons, i.e.,
we compare X[i] with X[i+1] for i = 0, 1, 2, and 3 and interchange X[i] and X[i+1] if X[i] >
X[i+1]. The process is shown below:
X[0]  X[1]  X[2]  X[3]  X[4]  Remarks
33    22    11    44    55    array after Pass 1
22    33    11    44    55    33 and 22 interchanged
22    11    33    44    55    33 and 11 interchanged
22    11    33    44    55    33 < 44, no interchange
22    11    33    44    55    44 < 55, no interchange
The second biggest number 55 is now in place at X[4].
Pass 3:
We repeat the same process, but this time we leave out both X[4] and X[5]. By doing this, we
move the third biggest number 44 to X[3].
X[0]  X[1]  X[2]  X[3]  Remarks
22    11    33    44    array after Pass 2
11    22    33    44    22 and 11 interchanged
11    22    33    44    22 < 33, no interchange
11    22    33    44    33 < 44, no interchange
Pass 4:
We repeat the process leaving out X[3], X[4], and X[5]. By doing this, we move the fourth
biggest number 33 to X[2].
X[0]  X[1]  X[2]  Remarks
11    22    33    array after Pass 3
11    22    33    11 < 22, no interchange
11    22    33    22 < 33, no interchange
Pass 5:
We repeat the process leaving out X[2], X[3], X[4], and X[5]. By doing this, we move the fifth
biggest number 22 to X[1]. At this time, we will have the smallest number 11 in X[0]. Thus,
we see that we can sort the array of size 6 in 5 passes.
For an array of size n, we require (n-1) passes.
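The figure of (n-1) passes is an upper bound. As the example shows, the array can already be sorted before the last pass. The sketch below adds a common refinement that is not part of the original notes: it stops as soon as a pass makes no interchange. The name bubblesort_passes is illustrative.

```c
#include <stdbool.h>

/* Sorts x[0..n-1] in ascending order and returns the number of passes
   actually made. Stops early when a pass makes no interchange. */
int bubblesort_passes(int x[], int n)
{
    int passes = 0;
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;
        passes++;
        for (int j = 0; j < n - i - 1; j++) {
            if (x[j] > x[j + 1]) {
                int temp = x[j];
                x[j] = x[j + 1];
                x[j + 1] = temp;
                swapped = true;
            }
        }
        if (!swapped)
            break;  /* already sorted: remaining passes are unnecessary */
    }
    return passes;
}
```

On the array 33, 44, 22, 11, 66, 55 this makes 4 passes: three passes that interchange elements and a fourth that confirms the array is sorted.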
Program for Bubble Sort:
#include <stdio.h>

void bubblesort(int x[], int n)
{
    int i, j, temp;
    for (i = 0; i < n - 1; i++)            /* (n-1) passes */
    {
        for (j = 0; j < n - i - 1; j++)    /* skip the sorted tail */
        {
            if (x[j] > x[j+1])
            {
                temp = x[j];
                x[j] = x[j+1];
                x[j+1] = temp;
            }
        }
    }
}

int main(void)
{
    int i, n, x[25];

    printf("\n Enter the number of elements: ");
    scanf("%d", &n);
    printf("\n Enter Data:");
    for (i = 0; i < n; i++)
        scanf("%d", &x[i]);
    bubblesort(x, n);
    printf("\n Array Elements after sorting: ");
    for (i = 0; i < n; i++)
        printf("%5d", x[i]);
    return 0;
}
Selection Sort:
Selection sort requires no more than n-1 interchanges. Suppose x is an array of size n
stored in memory. The selection sort algorithm first selects the smallest element in the array
x and places it at array position 0; then it selects the next smallest element in the array x and
places it at array position 1. It simply continues this procedure until it places the biggest
element in the last position of the array.
The array is passed through (n-1) times and the smallest remaining element is placed in its
respective position in the array as detailed below:
Pass 1: Find the location j of the smallest element in the array x[0], x[1], . . . , x[n-1],
and then interchange x[j] with x[0]. Then x[0] is sorted.
Pass 2: Leave the first element and find the location j of the smallest element in the sub-array
x[1], x[2], . . . , x[n-1], and then interchange x[1] with x[j]. Then x[0], x[1] are sorted.
Pass 3: Leave the first two elements and find the location j of the smallest element in the
sub-array x[2], x[3], . . . , x[n-1], and then interchange x[2] with x[j]. Then x[0], x[1],
x[2] are sorted.
Pass (n-1): Find the location j of the smaller of the elements x[n-2] and x[n-1], and then
interchange x[j] and x[n-2]. Then x[0], x[1], . . . , x[n-2] are sorted. Of course, after
this pass x[n-1] will be the biggest element and so the entire array is sorted.
Example:
Let us consider the following example with 9 elements to analyze selection Sort:
1   2   3   4   5   6   7   8   9   Remarks
65  70  75  80  50  60  55  85  45  find the first smallest element (45): i = 1, j = 9, swap a[i] and a[j]
45  70  75  80  50  60  55  85  65  find the second smallest element (50): i = 2, j = 5, swap a[i] and a[j]
45  50  75  80  70  60  55  85  65  find the third smallest element (55): i = 3, j = 7, swap a[i] and a[j]
45  50  55  80  70  60  75  85  65  find the fourth smallest element (60): i = 4, j = 6, swap a[i] and a[j]
45  50  55  60  70  80  75  85  65  find the fifth smallest element (65): i = 5, j = 9, swap a[i] and a[j]
45  50  55  60  65  80  75  85  70  find the sixth smallest element (70): i = 6, j = 9, swap a[i] and a[j]
45  50  55  60  65  70  75  85  80  find the seventh smallest element (75): i = 7, j = 7, already in place
45  50  55  60  65  70  75  85  80  find the eighth smallest element (80): i = 8, j = 9, swap a[i] and a[j]
45  50  55  60  65  70  75  80  85  The outer loop ends.
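The bound of n-1 interchanges can be checked on this example with a short sketch. The function name selection_sort_swaps is illustrative; unlike a plain selection sort, this version skips the interchange when the smallest element is already in place, so that only real interchanges are counted.

```c
/* Sorts a[0..n-1] in ascending order by selection sort and returns the
   number of interchanges performed (at most n-1). */
int selection_sort_swaps(int a[], int n)
{
    int swaps = 0;
    for (int i = 0; i < n - 1; i++) {
        int minindex = i;
        for (int j = i + 1; j < n; j++) {
            if (a[j] < a[minindex])
                minindex = j;
        }
        if (minindex != i) {            /* interchange only when needed */
            int temp = a[i];
            a[i] = a[minindex];
            a[minindex] = temp;
            swaps++;
        }
    }
    return swaps;
}
```

For the 9 elements above the function performs 7 interchanges, one fewer than the bound of 8, because the seventh smallest element is already in its position.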
Non-recursive Program for selection sort:
#include <stdio.h>

void selectionSort( int low, int high );
int a[25];

int main(void)
{
    int num, i = 0;

    printf( "Enter the number of elements: " );
    scanf( "%d", &num );
    printf( "\nEnter the elements:\n" );
    for( i = 0; i < num; i++ )
        scanf( "%d", &a[i] );
    selectionSort( 0, num - 1 );
    printf( "\nThe elements after sorting are: " );
    for( i = 0; i < num; i++ )
        printf( "%d ", a[i] );
    return 0;
}

void selectionSort( int low, int high )
{
    int i = 0, j = 0, temp = 0, minindex;
    for( i = low; i < high; i++ )        /* the last element needs no pass */
    {
        minindex = i;
        for( j = i + 1; j <= high; j++ )
        {
            if( a[j] < a[minindex] )
                minindex = j;
        }
        temp = a[i];
        a[i] = a[minindex];
        a[minindex] = temp;
    }
}
Insertion sort algorithm
Insertion sort picks the elements one by one and places each at the position where it
belongs in the sorted list of elements. Let's see the steps of insertion sort with the help of an
example.
Input elements: 89 17 8 12 0
Step 1: [89] 17 8 12 0 (the bracketed elements are the sorted list and the rest the unsorted list)
Step 2: [17 89] 8 12 0 (each element is removed from the unsorted list and placed at the right
position in the sorted list)
Step 3: [8 17 89] 12 0
Step 4: [8 12 17 89] 0
Step 5: [0 8 12 17 89]
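The steps above can be sketched in C as follows. This is a minimal version of the standard insertion sort; the function name insertion_sort is illustrative and not taken from the original notes.

```c
/* Sorts a[0..n-1] in ascending order. At the start of iteration i,
   a[0..i-1] is the sorted list and a[i..n-1] is the unsorted list. */
void insertion_sort(int a[], int n)
{
    for (int i = 1; i < n; i++) {
        int key = a[i];          /* next element of the unsorted list */
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];     /* shift larger elements one place right */
            j--;
        }
        a[j + 1] = key;          /* place key at its position */
    }
}
```

Calling insertion_sort on 89, 17, 8, 12, 0 passes through the same intermediate arrangements as Steps 2 to 5 and ends with 0, 8, 12, 17, 89.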
Algorithm for selection sort using a helper that finds the smallest element
SELECTION SORT(ARR, N)
o Step 1: Repeat Steps 2 and 3 for K = 0 to N-2
o Step 2: CALL SMALLEST(ARR, K, N, POS)
o Step 3: SWAP ARR[K] with ARR[POS]
[END OF LOOP]
o Step 4: EXIT
SMALLEST (ARR, K, N, POS)
o Step 1: [INITIALIZE] SET SMALL = ARR[K]
o Step 2: [INITIALIZE] SET POS = K
o Step 3: Repeat for J = K+1 to N-1
IF SMALL > ARR[J]
SET SMALL = ARR[J]
SET POS = J
[END OF IF]
[END OF LOOP]
o Step 4: RETURN POS
Program:
#include <stdio.h>

int smallest(int[], int, int);

int main(void)
{
    int a[10] = {10, 9, 7, 101, 23, 44, 12, 78, 34, 23};
    int i, pos, temp;
    for (i = 0; i < 10; i++)
    {
        pos = smallest(a, 10, i);   /* smallest element of a[i..9] */
        temp = a[i];
        a[i] = a[pos];
        a[pos] = temp;
    }
    printf("\nprinting sorted elements...\n");
    for (i = 0; i < 10; i++)
    {
        printf("%d\n", a[i]);
    }
    return 0;
}

/* Returns the position of the smallest element in a[i..n-1]. */
int smallest(int a[], int n, int i)
{
    int small, pos, j;
    small = a[i];
    pos = i;
    for (j = i + 1; j < n; j++)
    {
        if (a[j] < small)
        {
            small = a[j];
            pos = j;
        }
    }
    return pos;
}
Shell Sort
Shell sort is the generalization of insertion sort which overcomes the drawbacks of insertion sort
by comparing elements separated by a gap of several positions. In general, Shell sort performs the
following steps.
Step 1: Arrange the elements in the tabular form and sort the columns by using insertion sort.
Step 2: Repeat Step 1; each time with smaller number of longer columns in such a way that at the
end, there is only one column of data to be sorted.
Algorithm
Shell_Sort(Arr, n)
Step 1: SET FLAG = 1, GAP_SIZE = N
Step 2: Repeat Steps 3 to 6 while FLAG = 1 OR GAP_SIZE > 1
Step 3: SET FLAG = 0
Step 4: SET GAP_SIZE = (GAP_SIZE + 1) / 2
Step 5: Repeat Step 6 for I = 0 to I < (N - GAP_SIZE)
Step 6: IF Arr[I + GAP_SIZE] < Arr[I]
SWAP Arr[I + GAP_SIZE], Arr[I]
SET FLAG = 1
Step 7: END
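A direct C rendering of this flag-based scheme might look as follows. It is a sketch that assumes ascending order is wanted, so an element is interchanged with the one GAP_SIZE positions before it when it is smaller, and FLAG records that an interchange happened. The name shell_sort_flag is illustrative.

```c
/* Sorts arr[0..n-1] in ascending order. The gap is halved (rounding up)
   on every round; once it reaches 1 the rounds repeat until a round
   makes no interchange. */
void shell_sort_flag(int arr[], int n)
{
    int flag = 1;
    int gap = n;
    while (flag == 1 || gap > 1) {
        flag = 0;
        gap = (gap + 1) / 2;
        for (int i = 0; i < n - gap; i++) {
            if (arr[i + gap] < arr[i]) {
                int tmp = arr[i + gap];
                arr[i + gap] = arr[i];
                arr[i] = tmp;
                flag = 1;        /* an interchange happened in this round */
            }
        }
    }
}
```

The loop ends only after a round with gap 1 completes without any interchange, which is exactly the condition for the array to be sorted.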
Program
#include <stdio.h>

void shellsort(int arr[], int num)
{
    int i, j, k, tmp;
    for (i = num / 2; i > 0; i = i / 2)          /* i is the gap */
    {
        for (j = i; j < num; j++)
        {
            for (k = j - i; k >= 0; k = k - i)
            {
                if (arr[k+i] >= arr[k])
                    break;
                else
                {
                    tmp = arr[k];
                    arr[k] = arr[k+i];
                    arr[k+i] = tmp;
                }
            }
        }
    }
}

int main(void)
{
    int arr[30];
    int k, num;
    printf("Enter total no. of elements : ");
    scanf("%d", &num);
    printf("\nEnter %d numbers: ", num);
    for (k = 0; k < num; k++)
    {
        scanf("%d", &arr[k]);
    }
    shellsort(arr, num);
    printf("\n Sorted array is: ");
    for (k = 0; k < num; k++)
        printf("%d ", arr[k]);
    return 0;
}
Hashing
Hashing is a technique in which insertion, deletion and search are performed in constant
average time.
Hash Function
Each key is mapped into some number in the range 0 to Tablesize - 1 and placed in the
appropriate cell. The mapping is called a hash function.
An ideal hash table
In this example, john hashes to 3, phil hashes to 4, dave hashes to 6, and mary hashes to 7.
Routine for Hash Function
If the input keys are integers, then hash function will be key mod Tablesize. Usually,
the keys are strings; in this case, the hash function needs to be chosen carefully.
One option is to add up the ASCII values of the characters in the string.
typedef unsigned int INDEX;
A simple hash function
INDEX hash( char *key, int tablesize )
{
    unsigned int hash_val = 0;
    while( *key != '\0' )
        hash_val += (unsigned char) *key++;
    return( hash_val % tablesize );
}
Another hash function assumes that the key has at least two characters plus the NULL terminator.
The value 27 represents the number of letters in the English alphabet plus the blank, and 729 is 27^2.
INDEX hash( char *key, int tablesize )
{
    return ( ( key[0] + 27*key[1] + 729*key[2] ) % tablesize );
}
A good hash function
INDEX hash( char *key, int tablesize )
{
    unsigned int hash_val = 0;
    while( *key != '\0' )
        hash_val = ( hash_val << 5 ) + (unsigned char) *key++;
    return( hash_val % tablesize );
}
The main problems are choosing a function, deciding what to do when two keys hash to the
same value, and deciding on the table size.
Collision
When two different keys hash to the same position in the hash table, the second key would
overwrite the value already stored there. This is known as a collision.
Collision resolution
If, when inserting an element, it hashes to the same value as an already inserted element,
then we have a collision and need to resolve it.
Methods to resolve collision
1. Separate Chaining / Open Hashing
2. Open Addressing / Closed Hashing
a. Linear Probing
b. Quadratic Probing
c. Double Hashing
3. Extendible Hashing
4. Rehashing
1. Separate Chaining / Open Hashing
The first strategy, commonly known as either open hashing, or separate chaining, is to
keep a list of all elements that hash to the same value. For convenience, our lists have headers.
hash(x) = x mod 10. (The table size is 10)
To perform an insert, we traverse down the appropriate list to check whether the element
is already in place. If the element turns out to be new, it can be inserted either at the front of
the list or at the end of the list. In the routines below, new elements are inserted at the front
of the list.
The hash table structure contains the actual size and an array of linked lists, which
are dynamically allocated when the table is initialized.
Type declaration
typedef struct listnode *node_ptr;
typedef node_ptr LIST;
typedef node_ptr position;
struct listnode
{
    elementtype element;
    position next;
};
struct hashtbl
{
    int tablesize;
    LIST *thelists;    /* array of list headers */
};
typedef struct hashtbl *HASHTABLE;
Initialization routine for open hash table
HASHTABLE initializetable( int tablesize )
{
    HASHTABLE H;
    int i;
    if( tablesize < MIN_TABLE_SIZE )
    {
        error("Table size too small");
        return NULL;
    }
    /* allocate the table */
    H = (HASHTABLE) malloc( sizeof (struct hashtbl) );
    if( H == NULL )
        fatalerror("Out of space!!!");
    H->tablesize = nextprime( tablesize );
    /* allocate the array of lists */
    H->thelists = malloc( sizeof (LIST) * H->tablesize );
    if( H->thelists == NULL )
        fatalerror("Out of space!!!");
    /* allocate one header node per list */
    for( i = 0; i < H->tablesize; i++ )
    {
        H->thelists[i] = malloc( sizeof (struct listnode) );
        if( H->thelists[i] == NULL )
            fatalerror("Out of space!!!");
        else
            H->thelists[i]->next = NULL;
    }
    return H;
}
Routine for Find operation
position find( elementtype key, HASHTABLE H )
{
    position p;
    LIST L;
    L = H->thelists[ hash( key, H->tablesize ) ];
    p = L->next;                     /* skip the header node */
    while( (p != NULL) && (p->element != key) )
        p = p->next;
    return p;
}
Routine For Insert Operation
void insert( elementtype key, HASHTABLE H )
{
    position pos, newcell;
    LIST L;
    pos = find( key, H );
    if( pos == NULL )                /* key is not in the table yet */
    {
        newcell = (position) malloc( sizeof (struct listnode) );
        if( newcell == NULL )
            fatalerror("Out of space!!!");
        else
        {
            L = H->thelists[ hash( key, H->tablesize ) ];
            newcell->next = L->next; /* insert at the front of the list */
            newcell->element = key;
            L->next = newcell;
        }
    }
}
Closed Hashing (Open Addressing)
Separate chaining has the disadvantage of requiring pointers. This tends to slow the
algorithm down a bit because of the time required to allocate new cells, and also essentially
requires the implementation of a second data structure.
Closed hashing, also known as open addressing, is an alternative to resolving collisions
with linked lists.
In a closed hashing system, if a collision occurs, alternate cells are tried until an empty
cell is found. More formally, cells h0(x), h1(x), h2(x), . . . are tried in succession, where hi(x) =
(hash(x) + F(i)) mod tablesize, with F(0) = 0. The function F is the collision resolution strategy.
Because all the data goes inside the table, a bigger table is needed for closed hashing than for
open hashing. Generally, the load factor λ (the ratio of the number of elements in the table to
the table size) should be below 0.5 for closed hashing.
Three common collision resolution strategies are
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Linear Probing
In linear probing, F is a linear function of i, typically F(i) = i. This amounts to trying cells
sequentially (with wraparound) in search of an empty cell.
The below Figure shows the result of inserting keys {89, 18, 49, 58, 69} into a closed table
using the same hash function as before and the collision resolution strategy, F (i) = i. The first
collision occurs when 49 is inserted; it is put in the next available spot, namely 0, which is open.
58 collides with 18, 89, and then 49 before an empty cell is found three away. The collision for
69 is handled in a similar manner. As long as the table is big enough, a free cell can always be
found, but the time to do so can get quite large. Worse, even if the table is relatively empty,
blocks of occupied cells start forming. This effect, known as primary clustering, means that any
key that hashes into the cluster will require several attempts to resolve the collision, and then
it will add to the cluster.
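The insertions described above can be reproduced with a minimal sketch. It assumes non-negative integer keys, a table of size 10 with hash(x) = x mod 10, and a sentinel value EMPTY for free cells; the names TABLE_SIZE, EMPTY and insert_linear are illustrative and not part of the original notes.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)   /* marks a free cell; assumes keys are non-negative */

/* Inserts key using h_i(x) = (x mod TABLE_SIZE + i) mod TABLE_SIZE,
   i.e. F(i) = i. Returns the cell used, or -1 if the table is full. */
int insert_linear(int table[], int key)
{
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i) % TABLE_SIZE;   /* wraps around */
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Starting from a table filled with EMPTY, inserting 89, 18, 49, 58, 69 in that order places them in cells 9, 8, 0, 1 and 2. The keys 49, 58 and 69 end up in one contiguous block with 18 and 89, which illustrates primary clustering.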
Although we will not perform the calculations here, it can be shown that the expected number
of probes using linear probing is roughly 1/2 (1 + 1/(1 - λ)^2) for insertions and unsuccessful
searches and 1/2 (1 + 1/(1 - λ)) for successful searches, where λ is the load factor. For
comparison, an analysis that ignores clustering assumes a very large table in which each probe
is independent of the previous probes. These assumptions are satisfied by a
random collision resolution strategy and are reasonable unless λ is very close to 1.
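To get a feel for these estimates, the two formulas can be evaluated for a few load factors. The sketch below only evaluates the expressions quoted above; the function names are illustrative, and lambda is assumed to lie in the range 0 <= lambda < 1.

```c
/* Approximate expected probes for an insertion or unsuccessful search
   with linear probing at load factor lambda (0 <= lambda < 1). */
double probes_unsuccessful(double lambda)
{
    double d = 1.0 - lambda;
    return 0.5 * (1.0 + 1.0 / (d * d));
}

/* Approximate expected probes for a successful search with linear
   probing at load factor lambda (0 <= lambda < 1). */
double probes_successful(double lambda)
{
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}
```

At λ = 0.5 the estimates are 2.5 probes for an insertion and 1.5 for a successful search. At λ = 0.9 they rise to about 50.5 and 5.5, which shows why the table should not be allowed to become nearly full.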
Quadratic Probing
Quadratic probing is a collision resolution method that eliminates the primary clustering
problem of linear probing. Quadratic probing is what you would expect: the collision function is
quadratic. The popular choice is F(i) = i^2. The figure below shows the resulting closed table with
this collision function on the same input used in the linear probing example.
When 49 collides with 89, the next position attempted is one cell away. This cell is empty,
so 49 is placed there. Next, 58 collides at position 8. Then the cell one away is tried, but another
collision occurs. A vacant cell is found at the next cell tried, which is 2^2 = 4 away. 58 is thus
placed in cell 2. The same thing happens for 69.
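The same insertions can be traced with a minimal sketch of quadratic probing. It assumes non-negative integer keys, a table of size 10 with hash(x) = x mod 10, and a sentinel EMPTY for free cells; the names TABLE_SIZE, EMPTY and insert_quadratic are illustrative.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)   /* marks a free cell; assumes keys are non-negative */

/* Inserts key using h_i(x) = (x mod TABLE_SIZE + i*i) mod TABLE_SIZE,
   i.e. F(i) = i^2. Returns the cell used, or -1 if no free cell was
   found within TABLE_SIZE probes. */
int insert_quadratic(int table[], int key)
{
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i * i) % TABLE_SIZE;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Inserting 89, 18, 49, 58, 69 into an empty table places them in cells 9, 8, 0, 2 and 3. Note that a return value of -1 does not mean the table is full: quadratic probing may fail to reach a free cell even when one exists.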
For linear probing it is a bad idea to let the hash table get nearly full, because performance
degrades. For quadratic probing, the situation is even more drastic: There is no guarantee of
finding an empty cell once the table gets more than half full, or even before the table gets half
full if the table size is not prime. This is because at most half of the table can be used as alternate
locations to resolve collisions.
Type declaration for open addressing hash tables
enum kind_of_entry { legitimate, empty, deleted };
struct hash_entry
{
    element_type element;
    enum kind_of_entry info;
};
typedef INDEX position;
typedef struct hash_entry cell;
/* the_cells is an array of hash_entry cells, allocated later */
struct hash_tbl
{
    unsigned int table_size;
    cell *the_cells;
};
typedef struct hash_tbl *hashtable;
Routine to initialize closed hash table
hashtable initialize_table( unsigned int table_size )
{
    hashtable H;
    unsigned int i;
    if( table_size < MIN_TABLE_SIZE )
    {
        error("Table size too small");
        return NULL;
    }
    /* allocate the table */
    H = (hashtable) malloc( sizeof ( struct hash_tbl ) );
    if( H == NULL )
        fatal_error("Out of space!!!");
    H->table_size = next_prime( table_size );
    /* allocate the cells */
    H->the_cells = (cell *) malloc( sizeof ( cell ) * H->table_size );
    if( H->the_cells == NULL )
        fatal_error("Out of space!!!");
    for( i = 0; i < H->table_size; i++ )
        H->the_cells[i].info = empty;
    return H;
}
Find routine for closed hashing with quadratic probing
position find( element_type key, hashtable H )
{
    position i, current_pos;
    i = 0;
    current_pos = hash( key, H->table_size );
    /* stop at an empty cell or at the cell that holds key */
    while( (H->the_cells[current_pos].info != empty ) &&
           (H->the_cells[current_pos].element != key ) )
    {
        current_pos += 2*(++i) - 1;      /* step to the next quadratic probe */
        if( current_pos >= H->table_size )
            current_pos -= H->table_size;
    }
    return current_pos;
}
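The update current_pos += 2*(++i) - 1 avoids computing i^2 directly. A short derivation, written here in LaTeX, shows why adding 2i - 1 to the previous probe position gives the i-th quadratic probe:

```latex
\[
h_i(x) - h_{i-1}(x) \equiv i^2 - (i-1)^2 = 2i - 1 \pmod{\mathit{table\_size}},
\qquad
\sum_{k=1}^{i} (2k - 1) = i^2 .
\]
```

A single subtraction of table_size is enough for the wraparound only while 2i - 1 stays below table_size. This presumably relies on the table being kept at most half full, so that the search ends before i grows that large.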
Insert routine for closed hash tables with quadratic probing
void insert( element_type key, hashtable H )
{
    position pos;
    pos = find( key, H );
    if( H->the_cells[pos].info != legitimate )
    {   /* ok to insert here */
        H->the_cells[pos].info = legitimate;
        H->the_cells[pos].element = key;
    }
}
Double Hashing
The last collision resolution method we will examine is double hashing. For double hashing, one
popular choice is F(i) = i · h2(x). This formula says that we apply a second hash function to x and
probe at a distance h2(x), 2 h2(x), . . ., and so on. A function such as h2(x) = R - (x mod R), with
R a prime smaller than the table size, will work well. If we choose R = 7, then the figure below
shows the results of inserting the same keys as before.
Closed hash table with double hashing, after each insertion.
The first collision occurs when 49 is inserted. h2(49) = 7 - 0 = 7, so 49 is inserted in
position 6. h2(58) = 7 - 2 = 5, so 58 is inserted at location 3. Finally, 69 collides and is inserted
at a distance h2(69) = 7 - 6 = 1 away. If we tried to insert 60 in position 0, we would have a
collision. Since h2(60) = 7 - 4 = 3, we would then try positions 3, 6, 9, and then 2 until an empty
spot is found.
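The example can be traced with a minimal sketch of double hashing. It assumes non-negative integer keys, a table of size 10 with hash(x) = x mod 10, R = 7, and a sentinel EMPTY for free cells; the names TABLE_SIZE, R, EMPTY, hash2 and insert_double are illustrative.

```c
#define TABLE_SIZE 10
#define R 7          /* prime smaller than TABLE_SIZE */
#define EMPTY (-1)   /* marks a free cell; assumes keys are non-negative */

/* Second hash function h2(x) = R - (x mod R); never returns 0. */
int hash2(int key)
{
    return R - (key % R);
}

/* Inserts key using h_i(x) = (x mod TABLE_SIZE + i * h2(x)) mod TABLE_SIZE.
   Returns the cell used, or -1 if no free cell was found within
   TABLE_SIZE probes. */
int insert_double(int table[], int key)
{
    int home = key % TABLE_SIZE;
    int step = hash2(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i * step) % TABLE_SIZE;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Inserting 89, 18, 49, 58, 69 into an empty table places them in cells 9, 8, 6, 3 and 0. A further insertion of 60 probes cells 0, 3, 6 and 9 before settling in cell 2.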
Rehashing
If the table gets too full, the operations will start taking too long, and
inserts might fail for closed hashing with quadratic resolution. This can happen if there are too
many deletions intermixed with insertions.
A solution, then, is to build another table that is about twice as big and scan down the
entire original hash table, computing the new hash value for each (non-deleted) element and
inserting it in the new table.
As an example, suppose the elements 13, 15, 24, and 6 are inserted into a closed hash
table of size 7. The hash function is h(x) = x mod 7. Suppose linear probing is used to resolve
collisions.
The resulting hash table appears in the figure below.
If 23 is inserted into the table, the resulting table will be over 70 percent
full. Because the table is so full, a new table is created.
The size of this table is 17, because this is the first prime that is at least twice as large
as the old table size. The new hash function is then h(x) = x mod 17.
The old table is scanned, and elements 6, 15, 23, 24, and 13 are inserted into the new
table. The resulting table appears in the figure below.
This entire operation is called rehashing. This is obviously a very expensive operation –
the running time is O(n).
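The example can be followed step by step with a small self-contained sketch. It uses plain integer arrays rather than the hash table structure of these notes, assumes non-negative keys, and takes the new table size as a parameter; the names EMPTY, lp_insert and rehash_demo are illustrative.

```c
#include <stdlib.h>

#define EMPTY (-1)   /* marks a free cell; assumes keys are non-negative */

/* Linear-probing insert with h(x) = x mod size.
   Returns the cell used, or -1 if the table is full. */
int lp_insert(int table[], int size, int key)
{
    int home = key % size;
    for (int i = 0; i < size; i++) {
        int pos = (home + i) % size;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Builds a table of new_size cells and reinserts every key of the old
   table in scan order. Returns the new table (the caller frees it),
   or NULL if memory is exhausted. */
int *rehash_demo(const int old_table[], int old_size, int new_size)
{
    int *new_table = malloc(sizeof *new_table * new_size);
    if (new_table == NULL)
        return NULL;
    for (int i = 0; i < new_size; i++)
        new_table[i] = EMPTY;
    for (int i = 0; i < old_size; i++)
        if (old_table[i] != EMPTY)
            lp_insert(new_table, new_size, old_table[i]);
    return new_table;
}
```

With a table of size 7, the keys 13, 15, 24, 6 and 23 land in cells 6, 1, 3, 0 and 2. Rehashing into 17 cells scans the old table in the order 6, 15, 23, 24, 13 and places these keys in cells 6, 15, 7, 8 and 13.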
Rehashing routines
hashtable rehash( hashtable H )
{
    unsigned int i, old_size;
    cell *old_cells;
    old_cells = H->the_cells;
    old_size = H->table_size;
    /* Get a new, empty table */
    H = initialize_table( 2*old_size );
    /* Scan through old table, reinserting into new */
    for( i = 0; i < old_size; i++ )
        if( old_cells[i].info == legitimate )
            insert( old_cells[i].element, H );
    free( old_cells );
    return H;
}
Extendible Hashing
If the amount of data is too large to fit in main memory, then the main consideration is the number of disk accesses
required to retrieve data. As before, we assume that at any point we have n records to store; the
value of n changes over time. Furthermore, at most m records fit in one disk block. We will use
m = 4 in this section.
If either open hashing or closed hashing is used, the major problem is that collisions
could cause several blocks to be examined during a find, even for a well-distributed hash table.
Furthermore, when the table gets too full, an extremely expensive rehashing step must
be performed, which requires O(n) disk accesses.
A clever alternative, known as extendible hashing, allows a find to be performed in two
disk accesses. Insertions also require few disk accesses.
If the time needed to determine which disk block holds a given item could be kept small, then we
would have a practical scheme. This is exactly the strategy used by extendible hashing.
Let us suppose, for the moment, that our data consists of several six-bit integers. The root
of the "tree" contains four pointers determined by the leading two bits of the data. Each leaf has
up to m = 4 elements.
It happens that in each leaf the first two bits are identical; this is indicated by the number
in parentheses.
To be more formal, D will represent the number of bits used by the root, which is
sometimes known as the directory. The number of entries in the directory is thus 2^D. dL is the
number of leading bits that all the elements of some leaf have in common. dL will depend on the
particular leaf, and dL <= D.
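As a minimal sketch of how a find locates its leaf, the directory entry for a key is simply the value of its leading D bits. The code below assumes six-bit keys held in an unsigned integer; KEY_BITS, dir_index, and dir_entries are illustrative names, not routines from these notes.

```c
#include <assert.h>

#define KEY_BITS 6   /* keys are six-bit integers, as in the text */

/* Directory slot for a key: the value of its leading D bits. */
unsigned dir_index(unsigned key, unsigned D)
{
    assert(D <= KEY_BITS && key < (1u << KEY_BITS));
    return key >> (KEY_BITS - D);
}

/* Number of directory entries for a given D, that is, 2^D. */
unsigned dir_entries(unsigned D)
{
    return 1u << D;
}
```

With D = 2, a key such as 100100 (0x24) gives dir_index(0x24, 2) = 2, the third of the four directory entries. With D = 3 the same key maps to entry 4 (binary 100). Reading the directory entry and then the leaf it points to accounts for the two disk accesses of a find, or one if the directory is kept in main memory.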
Extendible hashing: original data
Suppose that we want to insert the key 100100. This would go into the third leaf, but
as the third leaf is already full, there is no room. We thus split this leaf into two
leaves, which are now determined by the first three bits. This requires increasing
D to 3, so the directory grows to 2^3 = 8 entries.
If the key 000000 is now inserted, then the first leaf is split, generating two leaves
with dL = 3. Since D = 3, the only change required in the directory is the updating of
the 000 and 001 pointers.
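The two steps described above, splitting a full leaf and doubling the directory, can be sketched in C as follows. This is a simplified in-memory illustration, not a disk-based implementation. The type Leaf and the functions split_leaf and double_directory are hypothetical names, and the leaf contents in the note after the code are made up for illustration.

```c
#include <assert.h>

#define KEY_BITS 6   /* six-bit keys, as in the text */
#define M        4   /* at most m = 4 records per leaf */

typedef struct {
    unsigned dl;        /* number of leading bits shared by all keys in the leaf */
    unsigned count;     /* number of keys currently stored */
    unsigned keys[M];
} Leaf;

/* Split a leaf on the bit just after its dl shared bits:
   keys with a 0 in that bit stay in `leaf`, keys with a 1 move to `sibling`.
   Both leaves end up with dl increased by one. */
void split_leaf(Leaf *leaf, Leaf *sibling)
{
    unsigned i, kept = 0, shift;
    assert(leaf->dl < KEY_BITS);
    shift = KEY_BITS - leaf->dl - 1;   /* position of the new distinguishing bit */
    sibling->count = 0;
    for (i = 0; i < leaf->count; i++) {
        unsigned k = leaf->keys[i];
        if ((k >> shift) & 1u)
            sibling->keys[sibling->count++] = k;
        else
            leaf->keys[kept++] = k;
    }
    leaf->count = kept;
    leaf->dl += 1;
    sibling->dl = leaf->dl;
}

/* Grow a directory from 2^D to 2^(D+1) entries:
   old entry i is copied to new entries 2i and 2i+1. */
void double_directory(Leaf *old_dir[], Leaf *new_dir[], unsigned D)
{
    unsigned i;
    for (i = 0; i < (1u << D); i++) {
        new_dir[2 * i]     = old_dir[i];
        new_dir[2 * i + 1] = old_dir[i];
    }
}
```

For instance, if the full third leaf held 100000, 101000, 101100, and 101110 (assumed contents), split_leaf would keep 100000 in the 100 leaf and move the other three keys to the 101 leaf, leaving room for 100100. After double_directory, the 100 and 101 entries both still point at the old leaf, so the 101 entry must then be redirected to the sibling. When dL < D already holds for the leaf being split, as with the later insertion of 000000, no doubling is needed and only the affected pointers change.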
***************************************ALL THE BEST*************************************


UNIT V.docx

  • 1. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V UNIT V : SEARCHING, SORTING AND HASHING TECHNIQUES 9 Searching- Linear Search - Binary Search. Sorting - Bubble sort - Selection sort - Insertion sort - Shell sort. Hashing- Hash Functions – Separate Chaining – Open Addressing – Rehashing – Extendible Hashing. Searching is used to find the location where an element is available. There are two types of search techniques. They are: ● Linear or sequential search ● Binary search Sorting allows an efficient arrangement of elements within a given data structure. It is a way in which the elements are organized systematically for some purpose. For example, a dictionary in which words is arranged in alphabetical order and telephone director in which the subscriber names are listed in alphabetical order. There are many sorting techniques out of which we study the following. • Bubble sort • Insertion sort • Selection sort and • Shell sort There are two types of sorting techniques: 1.2. Internal sorting 1.3. External sorting If all the elements to be sorted are present in the main memory then such sorting is called internal sorting on the other hand, if some of the elements to be sorted are kept on the secondary storage, it is called external sorting. Here we study only internal sorting techniques. Linear Search: This is the simplest of all searching techniques. In this technique, an ordered or unordered list will be searched one by one from the beginning until the desired element is found. If the desired element is found in the list then the search is successful otherwise unsuccessful. Suppose there are ‗n’ elements organized sequentially on a List. The number of comparisons required to retrieve an element from the list, purely depends on where the element is stored in the list. If it is the first element, one comparison will do; if it is second element two comparisons are necessary and so on. On an average you need [(n+1)/2] comparison‘s to search an element. 
If search is not successful, you would need ‘n’ comparisons. The time complexity of linear search is O(n).
  • 2. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V Algorithm: Let array a[n] stores n elements. Determine whether element ‗x‘ is present or not. linsrch(a[n], x) { index = 0; flag = 0; while (index < n) do { if (x == a[index]) { flag = 1; break; } index ++; } if(flag == 1) printf(―Data found at %d position―, index); else printf(―data not found‖); } Example : Let us illustrate linear search on the following 9 elements: Index 0 1 2 3 4 5 6 7 8 Elements -15 -6 0 7 9 23 54 82 101 Searching different elements is as follows: Searching for x = 7 Search successful, data found at 3 rd position. Searching for x = 82 Search successful, data found at 7 th position. Searching for x = 42 Search un-successful, data not found. A non-recursive program for Linear Search: { include <stdio.h> { include <conio.h> main() { int number[25], n, data, i, flag = 0; clrscr(); printf("n Enter the number of elements: "); scanf("%d", &n); printf("n Enter the elements: "); for(i = 0; i < n; i++) scanf("%d", &number[i]); printf("n Enter the element to be Searched: "); scanf("%d", &data); for( i = 0; i < n; i++) {
  • 3. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V if(number[i] == data) { flag = 1; break; } } if(flag == 1) printf("n Data found at location: %d", i+1); else } printf("n Data not found "); } A Recursive program for linear search: include <stdio.h> include <conio.h> void linear_search(int a[], int data, int position, int n) { if(position < n) { if(a[position] == data) printf("n Data Found at %d ", position); else linear_search(a, data, position + 1, n); } void main() { printf("n Data not found"); int a[25], i, n, data; clrscr(); printf("n Enter the number of elements: "); scanf("%d", &n); printf("n Enter the elements: "); for(i = 0; i < n; i++) { scanf("%d", &a[i]); } printf("n Enter the element to be seached: "); scanf("%d", &data); linear_search(a, data, 0, n); getch(); } BINARY SEARCH If we have ‗n‘ records which have been ordered by keys so that x1 < x2 < … < xn . When we are given a element ‗x‘, binary search is used to find the corresponding element from the list. In case ‗x‘ is present, we have to determine a value ‗j‘ such that a[j] = x (successful search). If ‗x‘ is not in the list then j is to set to zero (un successful search).
  • 4. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V In Binary search we jump into the middle of the file, where we find key a[mid], and compare ‗x‘ with a[mid]. If x = a[mid] then the desired record has been found. If x < a[mid] then ‗x‘ must be in that portion of the file that precedes a[mid]. Similarly, if a[mid] > x, then further search is only necessary in that part of the file which follows a[mid]. If we use recursive procedure of finding the middle key a[mid] of the un-searched portion of a file, then every un-successful comparison of ‗x‘ with a[mid] will eliminate roughly half the un-searched portion from consideration. Since the array size is roughly halved after each comparison between ‗x‘ and a[mid], and since an array of length ‗n‘ can be halved only about log2n times before reaching a trivial length, the worst case complexity of Binary search is about log2n. Algorithm: Let array a[n] of elements in increasing order, n ≥ 0, determine whether ‗x‘ is present, and if so, set j such that x = a[j] else return 0. binsrch(a[], n, x) { low = 1; high = n; while (low < high) do { mid = (low + high)/2 if (x < a[mid]) high = mid – 1; else if (x > a[mid]) low = mid + 1; else return mid; } return 0; } low and high are integer variables such that each time through the loop either ‗x‘ is found or low is increased by at least one or high is decreased by at least one. Thus we have two sequences of integers approaching each other and eventually low will become greater than high causing termination in a finite number of steps if ‗x‘ is not present. 
Example 1: Let us illustrate binary search on the following 12 elements: Index 1 2 3 4 5 6 7 8 9 10 11 12 Elements 4 7 8 9 16 20 24 38 39 45 54 77 If we are searching for x = 4: (This needs 3 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 1, high = 5, mid = 6/2 = 3, check 8 low = 1, high = 2, mid = 3/2 = 1, check 4, found If we are searching for x = 7: (This needs 4 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 1, high = 5, mid = 6/2 = 3, check 8 low = 1, high = 2, mid = 3/2 = 1, check 4 low = 2, high = 2, mid = 4/2 = 2, check 7, found
  • 5. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V If we are searching for x = 8: (This needs 2 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 1, high = 5, mid = 6/2 = 3, check 8, found If we are searching for x = 9: (This needs 3 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 1, high = 5, mid = 6/2 = 3, check 8 low = 4, high = 5, mid = 9/2 = 4, check 9, found If we are searching for x = 16: (This needs 4 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 1, high = 5, mid = 6/2 = 3, check 8 low = 4, high = 5, mid = 9/2 = 4, check 9 low = 5, high = 5, mid = 10/2 = 5, check 16, found If we are searching for x = 20: (This needs 1 comparison) low = 1, high = 12, mid = 13/2 = 6, check 20, found If we are searching for x = 24: (This needs 3 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 7, high = 12, mid = 19/2 = 9, check 39 low = 7, high = 8, mid = 15/2 = 7, check 24, found If we are searching for x = 38: (This needs 4 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 7, high = 12, mid = 19/2 = 9, check 39 low = 7, high = 8, mid = 15/2 = 7, check 24 low = 8, high = 8, mid = 16/2 = 8, check 38, found If we are searching for x = 39: (This needs 2 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 7, high = 12, mid = 19/2 = 9, check 39, found If we are searching for x = 45: (This needs 4 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 7, high = 12, mid = 19/2 = 9, check 39 low = 10, high = 12, mid = 22/2 = 11, check 54 low = 10, high = 10, mid = 20/2 = 10, check 45, found If we are searching for x = 54: (This needs 3 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 7, high = 12, mid = 19/2 = 9, check 39 low = 10, high = 12, mid = 22/2 = 11, check 54, found If we are searching for x = 77: (This needs 4 comparisons) low = 1, high = 12, mid = 13/2 = 6, check 20 low = 7, high = 12, mid = 19/2 = 9, check 39 low = 10, high = 12, mid = 22/2 = 
11, check 54 low = 12, high = 12, mid = 24/2 = 12, check 77, found The number of comparisons necessary by search element: 20 – requires 1 comparison; 8 and 39 – requires 2 comparisons; 4, 9, 24, 54 – requires 3 comparisons and 7, 16, 38, 45, 77 – requires 4 comparisons
  • 6. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V Summing the comparisons, needed to find all twelve items and dividing by 12, yielding 37/12 or approximately 3.08 comparisons per successful search on the average.A non- recursive program for binary search: Example: #include <stdio.h> int binarySearch(int a[], int beg, int end, int val) { int mid; if(end >= beg) { mid = (beg + end)/2; /* if the item to be searched is present at middle */ if(a[mid] == val) { return mid+1; } /* if the item to be searched is smaller than middle, then it can only be in left subarray */ else if(a[mid] < val) { return binarySearch(a, mid+1, end, val); } /* if the item to be searched is greater than middle, then it can only be in right subarray */ else { return binarySearch(a, beg, mid-1, val); } } return -1; } int main() { int a[] = {11, 14, 25, 30, 40, 41, 52, 57, 70}; // given array int val = 40; // value to be searched int n = sizeof(a) / sizeof(a[0]); // size of array int res = binarySearch(a, 0, n-1, val); // Store result printf("The elements of the array are - "); for (int i = 0; i < n; i++) printf("%d ", a[i]); printf("nElement to be searched is - %d", val); if (res == -1) printf("nElement is not present in the array"); else printf("nElement is present at %d position of array", res); return 0; } A recursive program for binary search: # include <stdio.h> # include <conio.h>
  • 7. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V void bin_search(int a[], int data, int low, int high) { int mid ; if( low <= high) { mid = (low + high)/2; if(a[mid] == data) printf("n Element found at location: %d ", mid + 1); else { if(data < a[mid]) bin_search(a, data, low, mid-1);else bin_search(a, data, mid+1, high); } else printf("n Element not found"); } void main() { int a[25], i, n, data; clrscr(); printf("n Enter the number of elements: "); scanf("%d", &n); printf("n Enter the elements in ascending order: "); for(i = 0; i < n; i++) scanf("%d", &a[i]); printf("n Enter the element to be searched: "); scanf("%d", &data); bin_search(a, data, 0, n-1); getch(); } Bubble Sort: The bubble sort is easy to understand and program. The basic idea of bubble sort is to pass through the file sequentially several times. In each pass, we compare each element in the file with its successor i.e., X[i] with X[i+1] and interchange two element when they are not in proper order. We will illustrate this sorting technique by taking a specific example. Bubble sort is also called as exchange sort. Example: Consider the array x[n] which is stored in memory as shown below: X[0] X[1] X[2] X[3] X[4] X[5] 33 44 22 11 66 55 Suppose we want our array to be stored in ascending order. Then we pass through the array 5 times as described below: Pass 1: (first element is compared with all other elements).
  • 8. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V We compare X[i] and X[i+1] for i = 0, 1, 2, 3, and 4, and interchange X[i] and X[i+1] if X[i] > X[i+1]. The process is shown below: X[0] X[1] X[2] X[3] X[4] X[5] Remarks 33 44 22 11 66 55 22 44 11 44 44 66 55 66 33 22 11 44 55 66 The biggest number 66 is moved to (bubbled up) the right most position in the array. Pass 2: (second element is compared). We repeat the same process, but this time we don‘t include X[5] into our comparisons. i.e., we compare X[i] with X[i+1] for i=0, 1, 2, and 3 and interchange X[i] and X[i+1] if X[i] > X[i+1]. The process is shown below: X[0] X[1] X[2] X[3] X[4] Remarks 33 22 22 22 33 11 11 11 33 33 33 44 44 44 44 55 55 55 The second biggest number 55 is moved now to X[4]. Pass 3: (third element is compared). We repeat the same process, but this time we leave both X[4] and X[5]. By doing this, we move the third biggest number 44 to X[3]. X[0] X[1] X[2] X[3] Remarks 22 11 33 44 11 22 22 33 33 44 11 22 33 44 Pass 4: (fourth element is compared).
  • 9. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V X[0]X[1] X[2] Remarks We repeat the process leaving X[3], X[4], and X[5]. By doing this, we move the fourth biggest number 33 to X[2]. 11 22 33 11 22 22 33 Pass 5: (fifth element is compared). We repeat the process leaving X[2], X[3], X[4], and X[5]. By doing this, we move the fifth biggest number 22 to X[1]. At this time, we will have the smallest number 11 in X[0]. Thus, we see that we can sort the array of size 6 in 5 passes. For an array of size n, we required (n-1) passes. Program for Bubble Sort: #include <stdio.h> #include <conio.h> void bubblesort(int x[], int n) { int i, j, temp; for (i = 0; i < n; i++) { for (j = 0; j < n–i-1 ; j++) { if (x[j] > x[j+1]) { temp = x[j]; x[j] = x[j+1]; x[j+1] = temp; } } } } main() { int i, n, x[25]; clrscr(); printf("n Enter the number of elements: "); scanf("%d", &n); printf("n Enter Data:"); for(i = 0; i < n ; i++) scanf("%d", &x[i]); bubblesort(x, n); printf ("n Array Elements after sorting: "); for (i = 0; i < n; i++) printf ("%5d", x[i]); } Selection Sort:
  • 10. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V Selection sort will not require no more than n-1 interchanges. Suppose x is an array of size n stored in memory. The selection sort algorithm first selects the smallest element in the array x and place it at array position 0; then it selects the next smallest element in the array x and place it at array position 1. It simply continues this procedure until it places the biggest element in the last position of the array. The array is passed through (n-1) times and the smallest element is placed in its respective position in the array as detailed below: Pass 1: Find the location j of the smallest element in the array x [0], x[1], x[n-1], and then interchange x[j] with x[0]. Then x[0] is sorted. Pass 2: Leave the first element and find the location j of the smallest element in the sub-array x[1], x[2], . . . . x[n-1], and then interchange x[1] with x[j]. Then x[0], x[1] are sorted. Pass 3: Leave the first two elements and find the location j of the smallest element in the sub- array x[2], x[3], . . . . x[n-1], and then interchange x[2] with x[j]. Then x[0], x[1], x[2] are sorted. Pass (n-1): Find the location j of the smaller of the elements x[n-2] and x[n-1], and then interchange x[j] and x[n-2]. Then x[0], x[1], . . . . x[n-2] are sorted. Of course, during this pass x[n-1] will be the biggest element and so the entire array is sorted. 
Example: Let us consider the following example with 9 elements to analyze selection Sort: 1 2 3 4 5 6 7 8 9 Remarks 65 70 75 80 50 60 55 85 45 find the first smallest element i j swap a[i] & a[j] 45 70 75 80 50 60 55 85 65 find the second smallest element i j swap a[i] and a[j] 45 50 75 80 70 60 55 85 65 Find the third smallest element i j swap a[i] and a[j] 45 50 55 80 70 60 75 85 65 Find the fourth smallest element i j swap a[i] and a[j] 45 50 55 60 70 80 75 85 65 Find the fifth smallest element i j swap a[i] and a[j] 45 50 55 60 65 80 75 85 70 Find the sixth smallest element i j swap a[i] and a[j] 45 50 55 60 65 70 75 85 80 Find the seventh smallest element i j swap a[i] and a[j] 45 50 55 60 65 70 75 85 80 Find the eighth smallest element i J swap a[i] and a[j] 45 50 55 60 65 70 75 80 85 The outer loop ends.
  • 11. 191GES205T -PROGRAMMING AND DATA STRUCTURES USING C UNIT-V Non-recursive Program for selection sort: # include<stdio.h> # include<conio.h> void selectionSort( int low, int high ); int a[25]; int main() { int num, i= 0; clrscr(); printf( "Enter the number of elements: " ); scanf("%d", &num); printf( "nEnter the elements:n" ); for(i=0; i < num; i++) scanf( "%d", &a[i] ); selectionSort( 0, num - 1 ); printf( "nThe elements after sorting are: " ); for( i=0; i< num; i++ ) printf( "%d ", a[i] ); return 0; } void selectionSort( int low, int high ) { int i=0, j=0, temp=0, minindex; for( i=low; i <= high; i++ ) { minindex = i; for( j=i+1; j <= high; j++ ) { if( a[j] < a[minindex] ) minindex = j; } temp = a[i]; a[i] = a[minindex]; a[minindex] = temp; } } Insertion sort algorithm Insertion sort algorithm picks elements one by one and places it to the right position where it belongs in the sorted list of elements. In the following C program we have implemented the same logic. Before going through the program, lets see the steps of insertion sort with the help of an example. Input elements: 89 17 8 12 0
Step 1: 89 | 17 8 12 0 (elements to the left of the bar form the sorted list; elements to the right are unsorted)
Step 2: 17 89 | 8 12 0 (each element is removed from the unsorted list and placed at its correct position in the sorted list)
Step 3: 8 17 89 | 12 0
Step 4: 8 12 17 89 | 0
Step 5: 0 8 12 17 89

The algorithm and program below restate selection sort using a helper routine, SMALLEST, that returns the position of the smallest remaining element.

Algorithm SELECTION SORT(ARR, N)
o Step 1: Repeat Steps 2 and 3 for K = 0 to N-2
o Step 2: CALL SMALLEST(ARR, K, N, POS)
o Step 3: SWAP ARR[K] with ARR[POS]
  [END OF LOOP]
o Step 4: EXIT

SMALLEST(ARR, K, N, POS)
o Step 1: [INITIALIZE] SET SMALL = ARR[K]
o Step 2: [INITIALIZE] SET POS = K
o Step 3: Repeat for J = K+1 to N-1
      IF SMALL > ARR[J]
          SET SMALL = ARR[J]
          SET POS = J
      [END OF IF]
  [END OF LOOP]
o Step 4: RETURN POS

Program:

#include <stdio.h>

int smallest(int a[], int n, int i);

int main(void)
{
    int a[10] = {10, 9, 7, 101, 23, 44, 12, 78, 34, 23};
    int i, pos, temp;

    for (i = 0; i < 10; i++) {
        pos = smallest(a, 10, i);
        temp = a[i];
        a[i] = a[pos];
        a[pos] = temp;
    }
    printf("\nprinting sorted elements...\n");
    for (i = 0; i < 10; i++)
        printf("%d\n", a[i]);
    return 0;
}

/* Return the position of the smallest element in a[i..n-1]. */
int smallest(int a[], int n, int i)
{
    int small, pos, j;

    small = a[i];
    pos = i;
    for (j = i + 1; j < n; j++) {
        if (a[j] < small) {
            small = a[j];
            pos = j;
        }
    }
    return pos;
}

Shell Sort
Shell sort is a generalization of insertion sort that overcomes the drawbacks of insertion sort by comparing elements separated by a gap of several positions. In general, Shell sort performs the following steps.

Step 1: Arrange the elements in tabular form and sort the columns by using insertion sort.
Step 2: Repeat Step 1, each time with a smaller number of longer columns, so that at the end there is only one column of data to be sorted.

Algorithm Shell_Sort(Arr, n)
Step 1: SET FLAG = 1, GAP_SIZE = N
Step 2: Repeat Steps 3 to 6 while FLAG = 1 OR GAP_SIZE > 1
Step 3: SET FLAG = 0
Step 4: SET GAP_SIZE = (GAP_SIZE + 1) / 2
Step 5: Repeat Step 6 for I = 0 to I < (N - GAP_SIZE)
Step 6: IF Arr[I + GAP_SIZE] < Arr[I]
            SWAP Arr[I + GAP_SIZE], Arr[I]
            SET FLAG = 1
Step 7: END

Program

#include <stdio.h>

/* Sort arr[0..num-1] in ascending order, halving the gap on each round. */
void shellsort(int arr[], int num)
{
    int i, j, k, tmp;

    for (i = num / 2; i > 0; i = i / 2) {          /* i is the current gap */
        for (j = i; j < num; j++) {
            for (k = j - i; k >= 0; k = k - i) {
                if (arr[k + i] >= arr[k])
                    break;                         /* pair already in order */
                else {
                    tmp = arr[k];
                    arr[k] = arr[k + i];
                    arr[k + i] = tmp;
                }
            }
        }
    }
}

int main(void)
{
    int arr[30];
    int k, num;

    printf("Enter total no. of elements : ");
    if (scanf("%d", &num) != 1 || num < 1 || num > 30)
        return 1;                                  /* arr[] holds at most 30 elements */
    printf("\nEnter %d numbers: ", num);
    for (k = 0; k < num; k++)
        scanf("%d", &arr[k]);
    shellsort(arr, num);
    printf("\n Sorted array is: ");
    for (k = 0; k < num; k++)
        printf("%d ", arr[k]);
    return 0;
}
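The final pass of Shell sort, with a gap of 1, is an ordinary insertion sort, which the notes trace on the example 89 17 8 12 0. A minimal sketch of plain insertion sort follows, assuming an int array sorted in ascending order; the name insertion_sort is illustrative and does not come from the notes.

```c
#include <assert.h>
#include <stddef.h>

/* Sorts a[0..n-1] in ascending order.
 * Invariant: before each outer iteration, a[0..i-1] is sorted. */
void insertion_sort(int a[], size_t n)
{
    assert(a != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        int key = a[i];                 /* next element of the unsorted part */
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];            /* shift larger elements one place right */
            j--;
        }
        a[j] = key;                     /* drop key into the gap that opened up */
    }
}
```

Unlike the swap-based inner loop of the Shell sort program, this version shifts elements and writes the key once, which is the usual way to write the gap-1 case.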
Hashing
Hashing is a technique in which insertion, deletion and search can be performed in constant average time.

Hash Function
Each key is mapped into some number in the range 0 to Tablesize - 1 and placed in the appropriate cell. The mapping is called a hash function.

Figure: An ideal hash table. In this example, john hashes to 3, phil hashes to 4, dave hashes to 6, and mary hashes to 7.

Routine for Hash Function
If the input keys are integers, then the hash function is typically key mod Tablesize. Usually, the keys are strings; in this case, the hash function needs to be chosen carefully. One option is to add up the ASCII values of the characters in the string.
typedef unsigned int INDEX;

A simple hash function

INDEX hash(const char *key, unsigned int tablesize)
{
    unsigned int hash_val = 0;

    while (*key != '\0')                       /* add up the character codes */
        hash_val += (unsigned char)*key++;
    return hash_val % tablesize;
}

Another hash function assumes that key has at least two characters plus the NULL terminator. 27 represents the number of letters in the English alphabet, plus the blank, and 729 is 27^2.

INDEX hash(const char *key, unsigned int tablesize)
{
    return (key[0] + 27 * key[1] + 729 * key[2]) % tablesize;
}

A good hash function

INDEX hash(const char *key, unsigned int tablesize)
{
    unsigned int hash_val = 0;

    while (*key != '\0')                       /* multiply by 32, then add the next character */
        hash_val = (hash_val << 5) + (unsigned char)*key++;
    return hash_val % tablesize;
}

The main problems deal with choosing a function, deciding what to do when two keys hash to the same value (a collision), and deciding on the table size.

Collision
When two different keys hash to the same position in the hash table, inserting the second key would overwrite the first. This is known as a collision.

Collision resolution
If, when inserting an element, it hashes to the same value as an already inserted element, then we have a collision and need to resolve it.

Methods to resolve collision
1. Separate Chaining / Open Hashing
2. Open Addressing / Closed Hashing
   a. Linear Probing
   b. Quadratic Probing
   c. Double Hashing
3. Extendible Hashing
4. Rehashing

1. Separate Chaining / Open Hashing
The first strategy, commonly known as either open hashing or separate chaining, is to keep a list of all elements that hash to the same value. For convenience, our lists have headers. In the example, hash(x) = x mod 10 (the table size is 10).

To perform an insert, we traverse down the appropriate list to check whether the element is already in place. If the element turns out to be new, it can be inserted either at the front of the list or at the end of the list. In these notes, new elements are inserted at the front of the list.
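The insert-at-front policy described above can be tried out with a small self-contained sketch. It is a simplification under stated assumptions rather than the notes' routines: keys are non-negative ints, the table size is fixed at 10 to match hash(x) = x mod 10, the lists have no header nodes, and all names (chain_table, chain_insert and so on) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 10            /* assumption: matches hash(x) = x mod 10 */

struct node {
    int key;
    struct node *next;
};

/* One list head per slot; NULL means the slot's list is empty. */
struct chain_table {
    struct node *slots[TABLE_SIZE];
};

static unsigned slot_of(int key)
{
    assert(key >= 0);            /* this sketch handles non-negative keys only */
    return (unsigned)key % TABLE_SIZE;
}

void chain_init(struct chain_table *t)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        t->slots[i] = NULL;
}

/* Returns the node holding key, or NULL if the key is absent. */
struct node *chain_find(const struct chain_table *t, int key)
{
    struct node *p = t->slots[slot_of(key)];
    while (p != NULL && p->key != key)
        p = p->next;
    return p;
}

/* Inserts key at the front of its list. Returns 1 if inserted,
 * 0 if the key was already present, -1 if memory ran out. */
int chain_insert(struct chain_table *t, int key)
{
    if (chain_find(t, key) != NULL)
        return 0;
    struct node *cell = malloc(sizeof *cell);
    if (cell == NULL)
        return -1;
    unsigned s = slot_of(key);
    cell->key = key;
    cell->next = t->slots[s];    /* new cell points at the old front */
    t->slots[s] = cell;
    return 1;
}

/* Releases every node and leaves all lists empty. */
void chain_free(struct chain_table *t)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        struct node *p = t->slots[i];
        while (p != NULL) {
            struct node *next = p->next;
            free(p);
            p = next;
        }
        t->slots[i] = NULL;
    }
}
```

Inserting 1 and then 81 leaves both keys in slot 1 with 81 ahead of 1, because each new key goes to the front of its list.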
The hash table structure contains the actual size and an array of linked lists, which are dynamically allocated when the table is initialized. The routines below assume that elementtype, MIN_TABLE_SIZE, hash, nextprime, error and fatalerror are defined elsewhere.

Type declaration

typedef struct listnode *node_ptr;
typedef node_ptr LIST;
typedef node_ptr position;

struct listnode {
    elementtype element;
    position next;
};

struct hashtbl {
    int tablesize;
    LIST *thelists;          /* array of lists, allocated in initializetable */
};
typedef struct hashtbl *HASHTABLE;

Initialization routine for open hash table

HASHTABLE initializetable(int tablesize)
{
    HASHTABLE H;
    int i;

    if (tablesize < MIN_TABLE_SIZE) {
        error("Table size too small");
        return NULL;
    }

    /* Allocate the table */
    H = (HASHTABLE) malloc(sizeof(struct hashtbl));
    if (H == NULL)
        fatalerror("Out of space!!!");
    H->tablesize = nextprime(tablesize);

    /* Allocate the array of lists */
    H->thelists = malloc(sizeof(LIST) * H->tablesize);
    if (H->thelists == NULL)
        fatalerror("Out of space!!!");

    /* Allocate a header node for each list */
    for (i = 0; i < H->tablesize; i++) {
        H->thelists[i] = malloc(sizeof(struct listnode));
        if (H->thelists[i] == NULL)
            fatalerror("Out of space!!!");
        else
            H->thelists[i]->next = NULL;
    }
    return H;
}

Routine for Find operation

position find(elementtype key, HASHTABLE H)
{
    position p;
    LIST L;

    L = H->thelists[hash(key, H->tablesize)];
    p = L->next;                               /* skip the header node */
    while ((p != NULL) && (p->element != key))
        p = p->next;
    return p;
}

Routine for Insert operation

void insert(elementtype key, HASHTABLE H)
{
    position pos, newcell;
    LIST L;

    pos = find(key, H);
    if (pos == NULL) {                         /* key is not in the table yet */
        newcell = (position) malloc(sizeof(struct listnode));
        if (newcell == NULL)
            fatalerror("Out of space!!!");
        else {
            L = H->thelists[hash(key, H->tablesize)];
            newcell->next = L->next;           /* insert at the front of the list */
            newcell->element = key;
            L->next = newcell;
        }
    }
}

Closed Hashing (Open Addressing)
Separate chaining has the disadvantage of requiring pointers. This tends to slow the algorithm down a bit because of the time required to allocate new cells, and it also essentially requires the implementation of a second data structure. Closed hashing, also known as open addressing, is an alternative to resolving collisions with linked lists. In a closed hashing system, if a collision occurs, alternate cells are tried until an empty cell is found. More formally, cells $h_0(x), h_1(x), h_2(x), \ldots$ are tried in succession, where $h_i(x) = (\mathrm{hash}(x) + F(i)) \bmod \mathit{tablesize}$, with $F(0) = 0$. The function F is the collision resolution strategy. Because all the data goes inside the table, a bigger table is needed for closed hashing than for open hashing. Generally, the load factor $\lambda$ (the ratio of the number of elements in the table to the table size) should be below $\lambda = 0.5$ for closed hashing.
Three common collision resolution strategies are
1. Linear Probing
2. Quadratic Probing
3. Double Hashing

Linear Probing
In linear probing, F is a linear function of i, typically $F(i) = i$. This amounts to trying cells sequentially (with wraparound) in search of an empty cell. The figure below shows the result of inserting keys {89, 18, 49, 58, 69} into a closed table using the same hash function as before and the collision resolution strategy $F(i) = i$.

The first collision occurs when 49 is inserted; it is put in the next available spot, namely spot 0, which is open. 58 collides with 18, 89, and then 49 before an empty cell is found three away. The collision for 69 is handled in a similar manner. As long as the table is big enough, a free cell can always be found, but the time to do so can get quite large. Worse, even if the table is relatively empty, blocks of occupied cells start forming. This effect, known as primary clustering, means that any key that hashes into the cluster will require several attempts to resolve the collision, and then it will add to the cluster.

Although we will not perform the calculations here, it can be shown that the expected number of probes using linear probing is roughly $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$ for insertions and unsuccessful searches and $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$ for successful searches, where $\lambda$ is the load factor. If clustering were not a problem, one could instead assume a very large table in which each probe is independent of the previous probes; these assumptions are satisfied by a random collision resolution strategy and are reasonable unless $\lambda$ is very close to 1.
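The linear probing walk-through can be reproduced with a short sketch. It assumes a fixed table of 10 int cells, hash(x) = x mod 10, non-negative keys, and -1 as the marker for a free cell; the names table_clear and linear_insert are illustrative only.

```c
#include <assert.h>

#define TABLE_SIZE 10
#define EMPTY (-1)   /* assumption: keys are non-negative, so -1 can mark a free cell */

/* Marks every cell of the table as free. */
void table_clear(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts key using linear probing, F(i) = i.
 * Returns the cell used, or -1 if every cell is occupied. */
int linear_insert(int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    int home = key % TABLE_SIZE;               /* hash(x) = x mod 10 */

    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i) % TABLE_SIZE;     /* wraps around the end of the table */
        if (table[pos] == EMPTY || table[pos] == key) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Inserting 89, 18, 49, 58, 69 in that order places them in cells 9, 8, 0, 1 and 2, so the occupied cells form one contiguous block (with wraparound), which is the primary clustering described above.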
Quadratic Probing
Quadratic probing is a collision resolution method that eliminates the primary clustering problem of linear probing. Quadratic probing is what you would expect: the collision function is quadratic. The popular choice is $F(i) = i^2$. The figure below shows the resulting closed table with this collision function on the same input used in the linear probing example.

When 49 collides with 89, the next position attempted is one cell away. This cell is empty, so 49 is placed there. Next, 58 collides at position 8. Then the cell one away is tried, but another collision occurs. A vacant cell is found at the next cell tried, which is $2^2 = 4$ away. 58 is thus placed in cell 2. The same thing happens for 69.
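The same keys can be run through a quadratic probing sketch. As before, the fixed table of 10 cells, the -1 free-cell marker, non-negative keys, and the names table_clear and quadratic_insert are assumptions for illustration.

```c
#include <assert.h>

#define TABLE_SIZE 10
#define EMPTY (-1)   /* assumption: keys are non-negative, so -1 can mark a free cell */

/* Marks every cell of the table as free. */
void table_clear(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts key using quadratic probing, F(i) = i * i.
 * Returns the cell used, or -1 if no free cell was reached
 * within TABLE_SIZE probes (which can happen even when free cells exist). */
int quadratic_insert(int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    int home = key % TABLE_SIZE;                   /* hash(x) = x mod 10 */

    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i * i) % TABLE_SIZE;     /* probe offsets 0, 1, 4, 9, ... */
        if (table[pos] == EMPTY || table[pos] == key) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

With the keys 89, 18, 49, 58, 69 the cells used are 9, 8, 0, 2 and 3: 58 skips cell 9 and lands four cells from its home position, and 69 does the same from cell 9.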
For linear probing it is a bad idea to let the hash table get nearly full, because performance degrades. For quadratic probing, the situation is even more drastic: there is no guarantee of finding an empty cell once the table gets more than half full, or even before the table gets half full if the table size is not prime. This is because at most half of the table can be used as alternate locations to resolve collisions.

The routines below assume that element_type, MIN_TABLE_SIZE, hash, next_prime, error and fatal_error are defined elsewhere.

Type declaration for open addressing hash tables

enum kind_of_entry { legitimate, empty, deleted };

struct hash_entry {
    element_type element;
    enum kind_of_entry info;
};

typedef unsigned int position;
typedef struct hash_entry cell;

/* the_cells is an array of hash_entry cells, allocated later */
struct hash_tbl {
    unsigned int table_size;
    cell *the_cells;
};
typedef struct hash_tbl *hashtable;

Routine to initialize closed hash table

hashtable initialize_table(unsigned int table_size)
{
    hashtable H;
    unsigned int i;

    if (table_size < MIN_TABLE_SIZE) {
        error("Table size too small");
        return NULL;
    }

    /* Allocate the table */
    H = (hashtable) malloc(sizeof(struct hash_tbl));
    if (H == NULL)
        fatal_error("Out of space!!!");
    H->table_size = next_prime(table_size);

    /* Allocate the cells */
    H->the_cells = (cell *) malloc(sizeof(cell) * H->table_size);
    if (H->the_cells == NULL)
        fatal_error("Out of space!!!");
    for (i = 0; i < H->table_size; i++)
        H->the_cells[i].info = empty;
    return H;
}

Find routine for closed hashing with quadratic probing

position find(element_type key, hashtable H)
{
    position i, current_pos;

    i = 0;
    current_pos = hash(key, H->table_size);
    /* Stop at an empty cell or at the cell holding key */
    while (H->the_cells[current_pos].info != empty &&
           H->the_cells[current_pos].element != key) {
        current_pos += 2 * (++i) - 1;      /* i^2 - (i-1)^2 = 2i - 1 */
        if (current_pos >= H->table_size)
            current_pos -= H->table_size;
    }
    return current_pos;
}

Insert routine for closed hash tables with quadratic probing

void insert(element_type key, hashtable H)
{
    position pos;

    pos = find(key, H);
    if (H->the_cells[pos].info != legitimate) {
        /* ok to insert here */
        H->the_cells[pos].info = legitimate;
        H->the_cells[pos].element = key;
    }
}

Double Hashing
The last collision resolution method we will examine is double hashing. For double hashing, one popular choice is $F(i) = i \cdot h_2(x)$. This formula says that we apply a second hash function to x and probe at a distance $h_2(x), 2h_2(x), \ldots$, and so on. A function such as $h_2(x) = R - (x \bmod R)$, with R a prime smaller than the table size (H_SIZE), will work well. If we choose R = 7, then the figure below shows the results of inserting the same keys as before.
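The probe rule just described, with R = 7 and a table of size 10, can be sketched as follows. The fixed table size, the -1 free-cell marker, non-negative keys, and the names h2, table_clear and double_hash_insert are assumptions for illustration.

```c
#include <assert.h>

#define TABLE_SIZE 10
#define R 7          /* a prime smaller than the table size */
#define EMPTY (-1)   /* assumption: keys are non-negative, so -1 can mark a free cell */

/* Second hash function h2(x) = R - (x mod R); the result is in 1..R, never 0. */
static int h2(int key)
{
    return R - (key % R);
}

/* Marks every cell of the table as free. */
void table_clear(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts key using double hashing, F(i) = i * h2(key).
 * Returns the cell used, or -1 if no free cell was reached
 * within TABLE_SIZE probes. */
int double_hash_insert(int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    int home = key % TABLE_SIZE;                   /* hash(x) = x mod 10 */
    int step = h2(key);

    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i * step) % TABLE_SIZE;  /* home, home+step, home+2*step, ... */
        if (table[pos] == EMPTY || table[pos] == key) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

For the keys 89, 18, 49, 58, 69 this sketch uses cells 9, 8, 6, 3 and 0; a later insert of 60 probes cells 0, 3, 6 and 9 before settling in cell 2.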
Figure: Closed hash table with double hashing, after each insertion.

The first collision occurs when 49 is inserted. $h_2(49) = 7 - 0 = 7$, so 49 is inserted in position 6. $h_2(58) = 7 - 2 = 5$, so 58 is inserted at location 3. Finally, 69 collides and is inserted at a distance $h_2(69) = 7 - 6 = 1$ away. If we tried to insert 60 in position 0, we would have a collision. Since $h_2(60) = 7 - 4 = 3$, we would then try positions 3, 6, 9, and then 2 until an empty spot is found.

Rehashing
If the table gets too full, the operations start taking too long, and inserts might fail for closed hashing with quadratic resolution. This can happen if there are too many deletions intermixed with insertions. A solution, then, is to build another table that is about twice as big and scan down the entire original hash table, computing the new hash value for each element and inserting it in the new table.

As an example, suppose the elements 13, 15, 24, and 6 are inserted into a closed hash table of size 7. The hash function is h(x) = x mod 7. Suppose linear probing is used to resolve collisions.
The resulting hash table appears in the figure below.

If 23 is inserted into the table, the resulting table will be over 70 percent full. Because the table is so full, a new table is created. The size of this table is 17, because this is the first prime that is at least twice as large as the old table size. The new hash function is then h(x) = x mod 17. The old table is scanned, and elements 6, 15, 23, 24, and 13 are inserted into the new table. The resulting table appears below.
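The worked example can be replayed with a small self-contained sketch. It assumes int keys that are non-negative, -1 as the free-cell marker, linear probing, and a new size equal to the first prime at or above twice the old size; the struct and function names (table, table_init, table_insert, table_rehash) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define EMPTY (-1)   /* assumption: keys are non-negative */

struct table {
    int size;
    int *cells;
};

static int is_prime(int n)
{
    if (n < 2)
        return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Smallest prime greater than or equal to n. */
static int next_prime(int n)
{
    while (!is_prime(n))
        n++;
    return n;
}

/* Allocates size cells, all free. Returns 0 on success, -1 on failure. */
int table_init(struct table *t, int size)
{
    assert(size > 0);
    t->cells = malloc(sizeof *t->cells * (size_t)size);
    if (t->cells == NULL)
        return -1;
    t->size = size;
    for (int i = 0; i < size; i++)
        t->cells[i] = EMPTY;
    return 0;
}

/* Linear-probing insert with h(x) = x mod size.
 * Returns the cell used, or -1 if the table is full. */
int table_insert(struct table *t, int key)
{
    assert(key >= 0);
    for (int i = 0; i < t->size; i++) {
        int pos = (key % t->size + i) % t->size;
        if (t->cells[pos] == EMPTY || t->cells[pos] == key) {
            t->cells[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Replaces t's storage with a larger table and reinserts every key.
 * Returns 0 on success; on failure returns -1 and leaves t unchanged. */
int table_rehash(struct table *t)
{
    struct table bigger;
    if (table_init(&bigger, next_prime(2 * t->size)) != 0)
        return -1;
    for (int i = 0; i < t->size; i++)          /* scan the old table in cell order */
        if (t->cells[i] != EMPTY)
            table_insert(&bigger, t->cells[i]);
    free(t->cells);
    *t = bigger;
    return 0;
}
```

Starting from a table of size 7 holding 13, 15, 24, 6 and 23, one call to table_rehash yields a table of size 17 with 6, 23 and 24 in cells 6, 7 and 8, 13 in cell 13, and 15 in cell 15.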
This entire operation is called rehashing. It is obviously a very expensive operation; the running time is O(n).

Rehashing routine

hashtable rehash(hashtable H)
{
    unsigned int i, old_size;
    cell *old_cells;
    hashtable old_table = H;

    old_cells = H->the_cells;
    old_size = H->table_size;

    /* Get a new, empty table */
    H = initialize_table(2 * old_size);

    /* Scan through old table, reinserting into new */
    for (i = 0; i < old_size; i++)
        if (old_cells[i].info == legitimate)
            insert(old_cells[i].element, H);

    free(old_cells);
    free(old_table);         /* the old table header is no longer needed */
    return H;
}

Extendible Hashing
If the amount of data is too large to fit in main memory, then the main consideration is the number of disk accesses required to retrieve data. As before, we assume that at any point we have n records to store; the value of n changes over time. Furthermore, at most m records fit in one disk block. We will use m = 4 in this section.
If either open hashing or closed hashing is used, the major problem is that collisions could cause several blocks to be examined during a find, even for a well-distributed hash table. Furthermore, when the table gets too full, an extremely expensive rehashing step must be performed, which requires O(n) disk accesses.

A clever alternative, known as extendible hashing, allows a find to be performed in two disk accesses. Insertions also require few disk accesses. The idea resembles a tree whose root is kept in main memory and has a very high branching factor, so that every record is one step below the root; the difficulty is the processing needed to determine which leaf holds the data. If the time to perform this step could be reduced, then we would have a practical scheme. This is exactly the strategy used by extendible hashing.

Let us suppose, for the moment, that our data consists of several six-bit integers. The root of the "tree" contains four pointers determined by the leading two bits of the data. Each leaf has up to m = 4 elements. It happens that in each leaf the first two bits are identical; this is indicated by the number in parentheses. To be more formal, D will represent the number of bits used by the root, which is sometimes known as the directory. The number of entries in the directory is thus $2^D$. $d_L$ is the number of leading bits that all the elements of some leaf L have in common. $d_L$ will depend on the particular leaf, and $d_L \le D$.

Figure: Extendible hashing: original data
Suppose that we want to insert the key 100100. This would go into the third leaf, but as the third leaf is already full, there is no room. We thus split this leaf into two leaves, which are now determined by the first three bits. This requires increasing D to 3, so the directory grows to $2^3 = 8$ entries.

If the key 000000 is now inserted, then the first leaf is split, generating two leaves with $d_L = 3$. Since D = 3, the only change required in the directory is the updating of the 000 and 001 pointers.
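The directory lookup at the heart of extendible hashing is just a matter of taking the leading D bits of the key. A minimal sketch, assuming six-bit keys as in the example; the names directory_index and directory_entries are illustrative, and leaf splitting is not modelled here.

```c
#include <assert.h>

#define KEY_BITS 6   /* the example above uses six-bit keys */

/* Returns the directory slot for key: its leading D bits. */
unsigned directory_index(unsigned key, unsigned D)
{
    assert(D <= KEY_BITS);
    assert(key < (1u << KEY_BITS));
    return key >> (KEY_BITS - D);      /* drop the trailing KEY_BITS - D bits */
}

/* Number of entries in a directory that uses D bits: 2^D. */
unsigned directory_entries(unsigned D)
{
    assert(D <= KEY_BITS);
    return 1u << D;
}
```

For the key 100100 (0x24), directory_index returns 2 when D = 2, which is the third leaf counting from zero, and 4 (binary 100) once D has grown to 3. The key 000000 maps to slot 0 for either value of D.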