Shell sort improves on insertion sort by first bringing the list into partial order. It does this by using a sequence of increment values to sort sublists within the main list. The time complexity of shell sort is O(n^(3/2)).
Quicksort uses a divide and conquer approach. It chooses a pivot element and partitions the list into two sublists based on element values relative to the pivot, then recursively sorts the sublists. The average time complexity of quicksort is O(n log n), but it can be O(n^2) in the worst case.
Mergesort follows the same divide and conquer strategy as quicksort. It recursively divides the list into halves until single elements remain, then merges the sorted halves back together.
Chapter 7: Advanced Sorting, Searching and Hashing
Shell sort
Shell sort is an improvement on insertion sort, developed by Donald Shell in 1959.
Insertion sort works best when the array elements are already in a reasonably sorted order. Thus,
shell sort first creates this reasonable order.
Algorithm:
1. Choose a gap gk between elements to be partly ordered.
2. Generate a sequence (called the increment sequence) gk, gk-1, ..., g2, g1 such that, after
sorting with gap gi, A[j] <= A[j+gi] for 0 <= j <= n-1-gi and k >= i >= 1.
It is advisable to choose gk = n/2 and gi-1 = gi/2 for k >= i >= 2. After the pass with gap gi is
done, the list is said to be gi-sorted. Shell sorting is done when the list is 1-sorted
(the final pass is a plain insertion sort) and A[j] <= A[j+1] for 0 <= j <= n-2. Time complexity
is O(n^(3/2)).
Also known as diminishing increment sort
It is easier to sort an array if parts of it are already sorted
General outline:
Divide data into h subarrays
for (i = 1; i <= h; i++)
Sort subarray i
Sort array data
How to divide into subarrays?
Could use contiguous elements, e.g.
[abcdef] => [abc] and [def]
Shell sort uses regularly spaced elements, e.g.
[abcdef] => [ace] and [bdf]
Shell sort uses several iterations of this technique:
First, large number of small subarrays
Next, smaller number of larger subarrays
Etc, until entire array is sorted
Example application:
(n-sort means divide the array into n subarrays consisting of elements n positions apart)
data before 5-sort 10 8 6 20 4 3 22 1 0 15 16
5 subarrays
before 5-sort
10 3 16
8 22
6 1
20 0
4 15
5 subarrays after
5-sort
3 10 16
8 22
1 6
0 20
4 15
data after 5-sort
and before 3-sort
3 8 1 0 4 10 22 6 20 15 16
3 subarrays
before 3-sort
3 0 22 15
8 4 6 16
1 10 20
3 subarrays after
3-sort
0 3 15 22
4 6 8 16
1 10 20
data after 3-sort
and before 1-sort
0 4 1 3 6 10 15 8 20 22 16
data after 1-sort 0 1 3 4 6 8 10 15 16 20 22
Notice that initially all sorts are of small arrays, and the larger
arrays are almost sorted already
In this example, there are 3 iterations:
5-sort, 3-sort and 1-sort
Known as the diminishing increment
How do we decide on the values for diminishing increment?
No definitive answer
But the following definition works well:
o h1 = 1
o hi+1 = 3hi + 1
Leads to sequence 3280, 1093, 364, 121, 40, 13, 4, 1
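This recurrence is easy to check in code. The following snippet (an illustrative helper, not code from the notes; the function name is my own) generates the diminishing increment sequence for a given array size:

```python
def increments(n):
    """Generate the diminishing increments h1 = 1, h(i+1) = 3*hi + 1,
    largest gap first, keeping every gap smaller than n."""
    h = 1
    seq = [1]
    while 3 * h + 1 < n:
        h = 3 * h + 1
        seq.append(h)
    return seq[::-1]  # reverse so the largest increment comes first
```

For an array of a few thousand elements this reproduces the sequence listed above: `increments(5000)` yields `[3280, 1093, 364, 121, 40, 13, 4, 1]`.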
What sorting algorithm to use at each pass?
No definitive answer
E.g. insertion sort for every pass, or insertion sort for all passes with bubble
sort for the last, etc.
Complexity:
Analytical analysis is difficult
Experimental analysis has shown complexity is approximately O(n^1.25)
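Putting the pieces together, Shell sort might be sketched in Python as follows (an illustrative implementation using the 3h+1 increment sequence described above; it is not code from the notes):

```python
def shell_sort(data):
    """Sort data in place using Shell sort with the 3h+1 gap sequence."""
    n = len(data)
    # Build the largest increment h with h1 = 1, h(i+1) = 3*hi + 1, h < n
    gap = 1
    while 3 * gap + 1 < n:
        gap = 3 * gap + 1
    # Diminishing increments: gap-sort with each gap, ending with gap = 1
    while gap >= 1:
        # Gapped insertion sort: each element is compared with the
        # element gap positions behind it
        for i in range(gap, n):
            value = data[i]
            j = i
            while j >= gap and data[j - gap] > value:
                data[j] = data[j - gap]
                j -= gap
            data[j] = value
        gap //= 3  # 3280, 1093, ..., 13, 4, 1
    return data
```

Running it on the 11-element example above, `shell_sort([10, 8, 6, 20, 4, 3, 22, 1, 0, 15, 16])` produces the fully sorted list.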
Heap sort
Heap sort operates by first converting the list into a heap tree. A heap tree is a binary tree in
which each node has a value greater than or equal to both its children (if any). It uses a process
called "adjust" to accomplish its task (building a heap tree) whenever a value is larger than its
parent. The time complexity of heap sort is O(n log n).
Algorithm:
1. Construct a binary tree
The root node corresponds to Data[0].
If we consider the index associated with a particular node to be i, then the left child of this
node corresponds to the element with index 2*i+1 and the right child corresponds to the
element with index 2*i+2. If any or both of these elements do not exist in the array, then
the corresponding child node does not exist either.
2. Construct the heap tree from initial binary tree using "adjust" process.
3. Sort by swapping the root value with the lowest, rightmost value, deleting the
lowest, rightmost value, and inserting the deleted value into the array in its proper position.
Example: Sort the following list using heap sort algorithm.
5 8 2 4 1 3 9 7 6 0
Swap the root node with the lowest, rightmost node and delete the lowest, rightmost value;
insert the deleted value into the array in its proper position; adjust the heap tree; and repeat
this process until the tree is empty.
[Figure: the initial binary tree constructed from the list, and the heap tree produced by the adjust process (RootNodePtr marks the root in each diagram).]
Recall that a heap is a binary tree with 2 properties
The value of each node is greater than or equal to the values stored in each of its
children.
The tree is perfectly balanced, and the leaves in the last level are all in the leftmost
positions
Therefore the largest element is always at the root
Heap sort works by building the elements into a heap, then successively removing the largest
element
Pseudocode: (uses array implementation of heap)
HeapSort(data):
Transform data into a heap
for (i = n-1; i > 0; i--)
Swap the root with element in position i in array
Restore the heap property for the tree data[0] … data[i-1]
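The pseudocode above can be fleshed out in Python (an illustrative sketch assuming the array implementation of the heap described earlier, with children at indices 2*i+1 and 2*i+2; not code from the notes):

```python
def heap_sort(data):
    """Sort data in place: build a max-heap, then repeatedly move the root
    to the end of the unsorted region."""
    n = len(data)

    def sift_down(root, end):
        # Restore the heap property for the subtree rooted at index root,
        # considering only data[0..end-1] (the "adjust" process).
        while 2 * root + 1 < end:
            child = 2 * root + 1              # left child at 2*i + 1
            if child + 1 < end and data[child + 1] > data[child]:
                child += 1                    # right child at 2*i + 2 is larger
            if data[root] >= data[child]:
                break                         # heap property already holds
            data[root], data[child] = data[child], data[root]
            root = child

    # Phase 1: transform the array into a heap, adjusting from the last parent up
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Phase 2: swap the root (largest) into position i, then restore the heap
    for i in range(n - 1, 0, -1):
        data[0], data[i] = data[i], data[0]
        sift_down(0, i)
    return data
```

On the example list, `heap_sort([5, 8, 2, 4, 1, 3, 9, 7, 6, 0])` returns the sorted list.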
Example application:
[Figure: successive array and heap states as the root is repeatedly swapped with the last element of the unsorted region until the array is sorted.]
Complexity:
Data movements, worst case:
o Constructing the heap is O(n)
(can be shown)
o Restoring the heap condition after the largest element is swapped:
At worst, log i swaps, where i is the number of nodes in the heap
Therefore the total number of swaps over the n - 1 iterations is
sum(i = 1 to n-1) of log2(i) = O(n log n)
(can be shown)
Data movements, best case:
o Heap consists of identical elements
o Constructing the heap does not involve any moves
(heap conditions are already met)
o Second phase has n – 1 swaps
o Therefore O(n) in best base
Comparisons are similar
o Worst case: 1 comparison for each swap, so O(n log n)
o Best case: 1 comparison for each iteration, so O(n)
o Average cases can be shown to be
O(n log n)
Quicksort
Quicksort is among the fastest known sorting algorithms in practice. It uses a divide and conquer
strategy; in the worst case its complexity is O(n^2), but its expected complexity is O(n log n).
Algorithm:
1. Choose a pivot value (mostly the first element is taken as the pivot value)
2. Position the pivot element and partition the list so that:
the left part has items less than or equal to the pivot value
the right part has items greater than or equal to the pivot value
3. Recursively sort the left part
4. Recursively sort the right part
The following algorithm can be used to position a pivot value and create partition.
Most famous and widely used sorting algorithm
Basic principle is similar to shell sort:
Quicker to sort a number of smaller arrays than to sort one big one
Algorithm:
First divide array into 2 subarrays: one with values <= a bound, one with values >= a
bound
Bound is one of the array elements
Next, each of these subarrays is divided into 2 subarrays, based on new bounds
Etc, until subarrays consist of 1 element
Then subarrays are put back together until complete sorted array results
Left = 0;
Right = n - 1; // n is the total number of elements in the list
PivotPos = Left;
while (Left < Right)
{
    if (PivotPos == Left)
    {
        if (Data[Left] > Data[Right])
        {
            swap(Data[Left], Data[Right]);
            PivotPos = Right;
            Left++;
        }
        else
            Right--;
    }
    else
    {
        if (Data[Left] > Data[Right])
        {
            swap(Data[Left], Data[Right]);
            PivotPos = Left;
            Right--;
        }
        else
            Left++;
    }
}
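A runnable Python translation of this partitioning code, with the recursive sorting of the two parts added (an illustrative sketch; the function names are my own):

```python
def partition(data, left, right):
    """Position the pivot (initially data[left]) and return its final index.
    Direct translation of the Left/Right/PivotPos pseudocode above."""
    pivot_pos = left
    while left < right:
        if pivot_pos == left:
            # Pivot is at the left end; compare it with the right element
            if data[left] > data[right]:
                data[left], data[right] = data[right], data[left]
                pivot_pos = right
                left += 1
            else:
                right -= 1
        else:
            # Pivot is at the right end; compare the left element with it
            if data[left] > data[right]:
                data[left], data[right] = data[right], data[left]
                pivot_pos = left
                right -= 1
            else:
                left += 1
    return pivot_pos

def quick_sort(data, left=0, right=None):
    """Recursively sort data[left..right] in place."""
    if right is None:
        right = len(data) - 1
    if left < right:
        p = partition(data, left, right)
        quick_sort(data, left, p - 1)   # left part: items <= pivot
        quick_sort(data, p + 1, right)  # right part: items >= pivot
    return data
```

On the example list, the first call to `partition` rearranges `[5, 8, 2, 4, 1, 3, 9, 7, 6, 0]` into `[0, 3, 2, 4, 1, 5, 9, 7, 6, 8]` with the pivot 5 at index 5, matching the trace below.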
Example: Sort the following list using the quick sort algorithm.
5 8 2 4 1 3 9 7 6 0
[Figure: step-by-step partition trace showing the Pivot, Left and Right markers. The first pass rearranges the list to 0 3 2 4 1 5 9 7 6 8, placing the pivot 5 in its final position; the sublists 0 3 2 4 1 and 9 7 6 8 are then partitioned recursively until the sorted list 0 1 2 3 4 5 6 7 8 9 results.]
Quicksort(data):
If length of data > 1
Choose a bound
While there are elements left in data
Include element either in subarray1 (if element ≤ bound)
or in subarray2 (if element ≥ bound)
Quicksort(subarray1)
Quicksort(subarray2)
How to find right bound?
Need a bound that divides array into 2 approximately equal halves
Choose first element?
If array already sorted will not lead to good bound
Choose middle element?
A more common approach
Use a random element?
Random number generators can be slow
Use the median of first, middle and last?
E.g. for [1 5 4 7 8 6 7 3 8 12 10], 6 is chosen as the median of the set [1 6 10]
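Median-of-three selection can be sketched in a few lines (a hypothetical helper for illustration, not code from the notes):

```python
def median_of_three(data):
    """Choose the median of the first, middle and last elements as the bound."""
    first, mid, last = data[0], data[len(data) // 2], data[-1]
    return sorted([first, mid, last])[1]  # middle value of the three candidates
```

For the example above, `median_of_three([1, 5, 4, 7, 8, 6, 7, 3, 8, 12, 10])` picks 6, the median of {1, 6, 10}.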
E.g. example application: (uses middle element for bound)
Complexity:
Worst case:
o Largest (or smallest) element always chosen as
bound
o Algorithm operates on arrays of size n, n-1, n-2, ...
, 2
o Partitioning requires n-1, n-2, n-3, ... 1
comparisons
o Therefore number of comparisons is n(n - 1)/2,
which is O(n^2)
Best case:
o Array is split into 2 halves at each iteration
o Sizes of arrays: n, n/2, n/4, etc
o Therefore number of comparisons for partitioning
is
n + 2(n/2) + 4(n/4) + 8(n/8) + ... = n(log2 n + 1)
(Each term is equal to n, and there are log2 n + 1 terms)
o So best case is O(n log n)
Average case
o Calculations show that this is also O(n log n)
o Data movements similar (1 movement for each
comparison to move element into new array)
Mergesort
Like quick sort, merge sort uses a divide and conquer strategy, and its time complexity is
O(n log n).
Algorithm:
1. Divide the array in to two halves.
2. Recursively sort the first n/2 items.
3. Recursively sort the last n/2 items.
4. Merge sorted items (using an auxiliary array).
Example: Sort the following list using merge sort algorithm.
5 8 2 4 1 3 9 7 6 0
Quicksort has poor performance in the worst case - O(n^2)
This is caused by uneven partitioning
Mergesort attempts to overcome this by ensuring even partitioning
Algorithm:
Partition array into two halves, irrespective of values
Partition each subarray into two halves
Etc. until arrays consist of 1 element each
Then start to merge the subarrays back together
Etc. until complete sorted array results
Like quicksort, inherently recursive
In Quicksort, the work is done in dividing, in Mergesort the work is done in merging
Pseudocode:
[Figure: division phase - the list 5 8 2 4 1 3 9 7 6 0 is repeatedly halved down to single elements; sorting and merging phase - the pieces are merged back up through the sorted halves 1 2 4 5 8 and 0 3 6 7 9 into the sorted list 0 1 2 3 4 5 6 7 8 9.]
Mergesort(data):
If data has at least two elements
Mergesort(left half of data)
Mergesort(right half of data)
Merge the left and right halves back together again
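The pseudocode can be sketched in Python (an illustrative version that, as step 4 of the algorithm suggests, merges into an auxiliary list; not code from the notes):

```python
def merge_sort(data):
    """Return a sorted copy of data using merge sort."""
    if len(data) <= 1:
        return data                      # a single element is already sorted
    mid = len(data) // 2
    left = merge_sort(data[:mid])        # recursively sort the first half
    right = merge_sort(data[mid:])       # recursively sort the second half
    # Merge the two sorted halves into an auxiliary list
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])              # copy any leftover elements
    merged.extend(right[j:])
    return merged
```

On the example list, `merge_sort([5, 8, 2, 4, 1, 3, 9, 7, 6, 0])` returns the sorted list.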
E.g. example application
Complexity:
In implementation, the merging of two subarrays works by copying two subarrays
element by element into a temporary array, then copying from temporary array
back to main array
o 2n assignments
Therefore the total number of assignments M(n) for mergesort satisfies
M(1) = 0
M(n) = 2M(n/2) + 2n
Therefore can calculate total assignments as
M(n) = 2M(n/2) + 2n
= 4M(n/4) + 4n
= 8M(n/8) + 6n
= ...
= 2^i M(n/2^i) + 2in
If we subdivide an array of n elements, it will be log2 n iterations until we get
arrays of size 1
Therefore choose i = log2 n, so that n = 2^i:
M(n) = n M(1) + 2n log2 n = 2n log2 n = O(n log n)
Best, average and worst cases the same
Comparisons are similar (same big-O)