2. Binary Search Trees / Slide 2
Trees
Linear access time of linked lists is prohibitive
Does there exist any simple data structure for
which the running time of most operations (search,
insert, delete) is O(log N)?
3. Binary Search Trees / Slide 3
Trees
A tree is a collection of nodes
The collection can be empty
(recursive definition) If not empty, a tree consists of
a distinguished node r (the root), and zero or more
nonempty subtrees T1, T2, ..., Tk, each of whose
roots is connected by a directed edge from r
4. Binary Search Trees / Slide 4
Some Terminology
Child and parent
Every node except the root has one parent
A node can have an arbitrary number of children
Leaves
Nodes with no children
Siblings
nodes with the same parent
5. Binary Search Trees / Slide 5
Some Terminology
Path
Length
number of edges on the path
Depth of a node
length of the unique path from the root to that node
The depth of a tree is equal to the depth of the deepest leaf
Height of a node
length of the longest path from that node to a leaf
all leaves are at height 0
The height of a tree is equal to the height of the root
Ancestor and descendant
Proper ancestor and proper descendant
7. Binary Search Trees / Slide 7
Binary Trees
A tree in which no node can have more than two children
The depth of an “average” binary tree is considerably smaller
than N, even though in the worst case the depth can be as large
as N – 1.
8. Binary Search Trees / Slide 8
Example: Expression Trees
Leaves are operands (constants or variables)
The other nodes (internal nodes) contain operators
Will not be a binary tree if some operators are not binary
9. Binary Search Trees / Slide 9
Tree traversal
Used to print out the data in a tree in a certain
order
Pre-order traversal
Print the data at the root
Recursively print out all data in the left subtree
Recursively print out all data in the right subtree
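A minimal C sketch of this recursion (the node structure and the function name here are illustrative, not taken from the slides):
#include <stdio.h>
struct TreeNode {
    int data;
    struct TreeNode *left;
    struct TreeNode *right;
};
/* Pre-order: root first, then left subtree, then right subtree */
void preOrder(struct TreeNode *root)
{
    if (root == NULL)
        return;
    printf("%d ", root->data);   /* print the data at the root */
    preOrder(root->left);        /* recursively print the left subtree */
    preOrder(root->right);       /* recursively print the right subtree */
}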
10. Binary Search Trees / Slide 10
Preorder, Postorder and Inorder
Preorder traversal
node, left, right
prefix expression
++a*bc*+*defg
15. Binary Search Trees / Slide 15
Binary Trees
Possible operations on the Binary Tree ADT
parent
left_child, right_child
sibling
root, etc
Implementation
Because a binary tree has at most two children, we can keep
direct pointers to them
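A sketch of such a declaration in C (field names are illustrative):
struct BinaryTreeNode {
    int data;                       /* the element stored in this node */
    struct BinaryTreeNode *left;    /* direct pointer to the left child */
    struct BinaryTreeNode *right;   /* direct pointer to the right child */
};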
17. Binary Search Trees / Slide 17
Binary Search Trees
Stores keys in the nodes in a way so that searching,
insertion and deletion can be done efficiently.
Binary search tree property
For every node X, all the keys in its left subtree are smaller
than the key value in X, and all the keys in its right subtree
are larger than the key value in X
18. Binary Search Trees / Slide 18
Binary Search Trees
(Figure: on the left, a binary search tree; on the right, a tree that is not a binary search tree.)
19. Binary Search Trees / Slide 19
Binary search trees
Average depth of a node is O(log N);
maximum depth of a node is O(N)
Two binary search trees representing
the same set:
21. Binary Search Trees / Slide 21
Searching BST
If we are searching for 15 (the key at the root in the example), then we are done.
If we are searching for a key < 15, then we
should search in the left subtree.
If we are searching for a key > 15, then we
should search in the right subtree.
23. Binary Search Trees / Slide 23
Searching (Find)
Find X: return a pointer to the node that has key X, or
NULL if there is no such node
Time complexity
O(height of the tree)
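A possible C rendering of Find, assuming the TreeNode structure sketched earlier:
struct TreeNode *find(struct TreeNode *root, int X)
{
    if (root == NULL)
        return NULL;                  /* no such node */
    if (X < root->data)
        return find(root->left, X);   /* all smaller keys are on the left */
    else if (X > root->data)
        return find(root->right, X);  /* all larger keys are on the right */
    else
        return root;                  /* key X found */
}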
24. Binary Search Trees / Slide 24
Inorder traversal of BST
Print out all the keys in sorted order
Inorder: 2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20
25. Binary Search Trees / Slide 25
findMin/ findMax
Return the node containing the smallest element in
the tree
Start at the root and go left as long as there is a left
child. The stopping point is the smallest element
Similarly for findMax
Time complexity = O(height of the tree)
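findMin as a C sketch, using the TreeNode structure from earlier (findMax is symmetric, going right instead of left):
struct TreeNode *findMin(struct TreeNode *root)
{
    if (root == NULL)
        return NULL;
    while (root->left != NULL)   /* go left as long as there is a left child */
        root = root->left;
    return root;                 /* the stopping point is the smallest element */
}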
26. Binary Search Trees / Slide 26
insert
Proceed down the tree as you would with a find
If X is found, do nothing (or update something)
Otherwise, insert X at the last spot on the path traversed
Time complexity = O(height of the tree)
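A C sketch of insert along these lines (assuming <stdlib.h> for malloc and the TreeNode structure from earlier):
struct TreeNode *insert(struct TreeNode *root, int X)
{
    if (root == NULL) {                        /* the last spot on the path traversed */
        root = malloc(sizeof(struct TreeNode));
        root->data = X;
        root->left = root->right = NULL;
    }
    else if (X < root->data)
        root->left = insert(root->left, X);
    else if (X > root->data)
        root->right = insert(root->right, X);
    /* if X is already present, do nothing */
    return root;
}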
27. Binary Search Trees / Slide 27
delete
When we delete a node, we need to consider
how we take care of the children of the
deleted node.
This has to be done such that the property of the
search tree is maintained.
28. Binary Search Trees / Slide 28
delete
Three cases:
(1) the node is a leaf
Delete it immediately
(2) the node has one child
Adjust a pointer from the parent to bypass that node
29. Binary Search Trees / Slide 29
delete
(3) the node has 2 children
replace the key of that node with the minimum element at the
right subtree
delete the minimum element
Has either no child or only right child because if it has a left
child, that left child would be smaller and would have been
chosen. So invoke case 1 or 2.
Time complexity = O(height of the tree)
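The three cases combined into one C sketch (reusing the findMin sketch above; free comes from <stdlib.h>):
struct TreeNode *deleteNode(struct TreeNode *root, int X)
{
    struct TreeNode *tmp;
    if (root == NULL)
        return NULL;
    if (X < root->data)
        root->left = deleteNode(root->left, X);
    else if (X > root->data)
        root->right = deleteNode(root->right, X);
    else if (root->left != NULL && root->right != NULL) {
        /* case 3: two children; replace with the minimum of the right subtree */
        tmp = findMin(root->right);
        root->data = tmp->data;
        root->right = deleteNode(root->right, tmp->data);
    }
    else {
        /* cases 1 and 2: leaf or one child; bypass the node */
        tmp = root;
        root = (root->left != NULL) ? root->left : root->right;
        free(tmp);
    }
    return root;
}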
30. Binary Search Trees / Slide 30
Extended Binary Tree
A binary tree in which special nodes are added wherever a null
subtree was present in the original tree so that each node in the
original tree (except the root node) has degree three
(Knuth 1997, p. 399).
31. Binary Search Trees / Slide 31
Binary Search Trees
A binary tree:
No node has more than two child nodes (called
child subtrees).
Child subtrees must be differentiated, into:
Left-child subtree
Right-child subtree
A search tree:
For every node, p:
All nodes in the left subtree are < p
All nodes in the right subtree are > p
32. Binary Search Trees / Slide 32
Binary Search Tree - Example
(Figure: a binary search tree containing the names Abigail, Abner, Adam, Adela, Agnes, Alex, Alice, Allen, Angela, Arthur and Audrey.)
33. Binary Search Trees / Slide 33
Binary Search Trees (cont)
Searching for a value in a tree of N nodes is:
O(log N) if the tree is “balanced”
O(N) if the tree is “unbalanced”
34. Binary Search Trees / Slide 34
“Unbalanced” Binary Search Trees
Below is a binary search tree that is NOT
“balanced”
(Figure: an unbalanced binary search tree containing the same names, with long one-sided chains.)
35. Binary Search Trees / Slide 35
Properties of Binary Trees
A binary tree is a full binary tree if and only if:
Each non-leaf node has exactly two child nodes
All leaf nodes have identical path length
It is called full since all possible node slots
are occupied
37. Binary Search Trees / Slide 37
Full Binary Trees
A Full binary tree of height h will have how
many leaves?
A Full binary tree of height h will have how
many nodes?
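One way to count: level i of a full binary tree holds 2^i nodes, so a full binary tree of height h has 2^h leaves, and 2^0 + 2^1 + … + 2^h = 2^(h+1) − 1 nodes in total.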
38. Binary Search Trees / Slide 38
Complete Binary Trees
A complete binary tree (of height h) satisfies
the following conditions:
Level 0 to h-1 represent a full binary tree of height
h-1
One or more nodes in level h−1 may have 0 or 1
child nodes
If j and k are nodes in level h−1, then j has more child
nodes than k only if j is to the left of k
39. Binary Search Trees / Slide 39
Complete Binary Trees - Example
(Figure 13.8: a complete binary tree with nodes A through K; the last level is filled from left to right.)
40. Binary Search Trees / Slide 40
Complete Binary Trees (cont)
Given a set of N nodes, a complete binary
tree of these nodes provides the maximum
number of leaves with the minimal average
path length (per node)
The complete binary tree containing n nodes
must have at least one path from root to leaf
of length log n
41. Binary Search Trees / Slide 41
Height-balanced Binary Tree
A height-balanced binary tree is a binary tree
such that:
The left & right subtrees for any given node differ in
height by no more than one
Note: Each complete binary tree is a height-
balanced binary tree
42. Binary Search Trees / Slide 42
Height-balanced Binary Tree - Example
(Figure: a node whose left and right subtrees have heights N and M, with |N − M| ≤ 1.)
Being height-balanced is a local property: the condition must hold at every node.
43. Binary Search Trees / Slide 43
Advantages of Height-balanced Binary Trees
Height-balanced binary trees are “balanced”
Operations that run in time proportional to the
height of the tree are O(log n), where n is the number
of nodes, with limited performance variance
Variance is a very important concern in real-
time applications, e.g. connecting calls in a
telephone network
86. Searching
Searching may be internal or external, and may proceed with or without key comparison:
Search with key comparison:
  Linear search
    Sequential search
    Binary search
    Interpolation search
  Non-linear search
    Tree search
      Binary search tree
      AVL tree search
      Red-black tree search
      Splay tree search
    Multi-way tree search
      m-way tree search
      B-tree search
      B+ tree search
    Graph search
      Depth first search
      Breadth first search
Search without key comparison:
  Address calculation search
  Digital search
89. Flowchart: Sequential Search with Array
Start: i = 0.
Is K = A[i]? If yes, print "Successful" and stop.
Otherwise set i = i + 1. If i ≥ n, print "Unsuccessful" and stop; else repeat the comparison.
90. Example: Sequential Search with Array
#include <stdio.h>
int main()
{
    int A[100], i, n, K, flag = 0;
    printf("Enter the size of the array: ");
    scanf("%d", &n);
    printf("Enter the elements of the array: ");
    for (i = 0; i < n; i++)
        scanf("%d", &A[i]);
    printf("Enter the number to be searched: ");
    scanf("%d", &K);
    for (i = 0; i < n; i++) {
        if (A[i] == K) {     /* was a[i] in the original: undeclared */
            flag = 1;
            break;
        }
    }
    if (flag == 0)
        printf("The number is not in the list\n");
    else
        printf("The number is found at index %d\n", i);
    return 0;
}
91. Complexity Analysis
• Case 1: The key matches with the first element
• T(n) = 1
• Case 2: Key does not exist
• T(n) = n
• Case 3: The key is present at any location in the array
If p_i is the probability that the key is at index i, then
T(n) = Σ (i = 1 to n) i · p_i
Assuming every position is equally likely, p_1 = p_2 = … = p_n = 1/n, so
T(n) = (1/n) Σ (i = 1 to n) i = (n + 1)/2
92. Complexity Analysis: Summary
Case     Number of key comparisons    Asymptotic complexity    Remark
Case 1   T(n) = 1                     T(n) = O(1)              Best case
Case 2   T(n) = n                     T(n) = O(n)              Worst case
Case 3   T(n) = (n + 1)/2             T(n) = O(n)              Average case
94. Binary Search: The Technique
(a) An ordered array of elements with index values l and u, and mid = (l+u)/2.
(b) If K < A[mid], searching the entire list turns into searching the left half only:
set u = mid − 1 and search that half the same way.
(c) If K > A[mid], searching the entire list turns into searching the right half only:
set l = mid + 1 and search that half the same way.
95. Flowchart: Binary Search with Array
Start: compute mid = (l+u)/2.
Is K = A[mid]? If yes, the search is successful; stop.
If K < A[mid], set u = mid − 1; otherwise set l = mid + 1.
Is l > u? If yes, the search is unsuccessful; stop. Otherwise recompute mid and repeat.
96. Binary Search (with Iteration)
#include <stdio.h>
int main()
{
    int i, l, u, mid, n, K, data[100];
    printf("Enter number of elements\n");
    scanf("%d", &n);
    printf("Enter %d integers in sorted order\n", n);
    for (i = 0; i < n; i++)
        scanf("%d", &data[i]);    /* was array[i] in the original: undeclared */
    printf("Enter value to find\n");
    scanf("%d", &K);
    l = 0;
    u = n - 1;
    mid = (l + u)/2;
Contd…
97. Binary Search (with Iteration)
    while (l <= u) {
        if (data[mid] < K)
            l = mid + 1;
        else if (data[mid] == K) {
            printf("%d found at location %d.\n", K, mid + 1);
            break;
        }
        else
            u = mid - 1;
        mid = (l + u)/2;
    }
    if (l > u)
        printf("Not found! %d is not present in the list.\n", K);
    return 0;
}
98. Binary Search (with Recursion)
#include <stdio.h>
int binarySearch(int data[], int n, int K, int l, int u);
int main()
{
    int data[100], i, n, K, flag, l, u;
    printf("Enter the size of the array: ");
    scanf("%d", &n);
    printf("Enter the elements of the array in sorted order: ");
    for (i = 0; i < n; i++)
        scanf("%d", &data[i]);    /* was a[i] in the original: undeclared */
    printf("Enter the number to be searched: ");
    scanf("%d", &K);
    l = 0, u = n - 1;
    flag = binarySearch(data, n, K, l, u);
    if (flag == 0)
        printf("Number is not found.");
    else
        printf("Number is found.");
    return 0;
}
Contd…
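The definition of binarySearch falls on slides omitted here; a minimal sketch consistent with the call in main, returning 1 if K occurs in data[l..u] and 0 otherwise:
int binarySearch(int data[], int n, int K, int l, int u)
{
    int mid;
    if (l > u)
        return 0;                                      /* empty range: unsuccessful */
    mid = (l + u)/2;
    if (data[mid] == K)
        return 1;                                      /* found */
    else if (K < data[mid])
        return binarySearch(data, n, K, l, mid - 1);   /* search the left half */
    else
        return binarySearch(data, n, K, mid + 1, u);   /* search the right half */
}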
101. Complexity Analysis: Binary Search
(Figure: the comparison tree of binary search on a list of 10 elements; the internal nodes are the probed positions 1–10, each branching on <, = and >, and the leaves marked F are failure nodes.)
Let n be the total number of elements in the list under search and let k be an integer such that:
• For successful search:
• If 2^(k−1) ≤ n < 2^k, then the binary search algorithm requires at least one comparison and at most k comparisons.
• For unsuccessful search:
• If n = 2^k − 1, then the binary search algorithm requires k comparisons.
• If 2^(k−1) ≤ n < 2^k − 1, then the binary search algorithm requires either k − 1 or k comparisons.
103. Complexity Analysis: Binary Search
•Average Case
• Successful search:
T(n) = I/n + 1, where I is the internal path length of the comparison tree;
evaluating this gives T(n) = ((n + 1)/n) log2(n + 1) − 1 ≈ log2 n − 1
• Unsuccessful search:
T′(n) = E/(n + 1), where E is the external path length of the comparison tree;
evaluating this gives T′(n) ≈ log2(n + 1)
105. Interpolation Search
1. l = 1, u = n // Initialization: Range of searching
2. flag = FALSE // Hold the status of searching
3. While (flag = FALSE) do
4. loc = l + ⌊((K − A[l]) / (A[u] − A[l])) × (u − l)⌋ // Interpolated probe position
5. If (l ≤ loc ≤ u) then // If loc is within the range of the list
6. Case: K < A[loc]
7. u = loc -1
8. Case: K = A[loc]
9. flag = TRUE
10. Case: K > A[loc]
11. l = loc +1
12. Else
13. Exit()
14. EndIf
15. EndWhile
16. If (flag) then
17. Print “Successful at” loc
18. Else
19. Print “Unsuccessful”
20. EndIf
21. Stop
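A direct C translation of the steps above (a sketch: the array is assumed sorted, and indices are 0-based here while the pseudocode is 1-based). It returns the index of K, or −1 if K is absent:
int interpolationSearch(int A[], int n, int K)
{
    int l = 0, u = n - 1, loc;
    while (l <= u && K >= A[l] && K <= A[u]) {
        if (A[u] == A[l])                /* all keys equal: avoid division by zero */
            return (A[l] == K) ? l : -1;
        /* estimate the position of K by linear interpolation */
        loc = l + (int)(((double)(K - A[l]) / (A[u] - A[l])) * (u - l));
        if (K < A[loc])
            u = loc - 1;
        else if (K > A[loc])
            l = loc + 1;
        else
            return loc;                  /* successful */
    }
    return -1;                           /* unsuccessful */
}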
110. Sequential Search with Linked List
(a) Structure of a node in the linked list: a DATA field and a LINK field.
(b) Linear search on a linked list: starting from the header node H, the search stops at an
intermediate node if the key matches, otherwise it moves to the immediate next node; it
ends unsuccessfully after the last node.
111. Flow Chart: Sequential Search with LL
temp = header
While temp != NULL: if temp->data == key, print "Success" and return temp; otherwise temp = temp->next.
When the loop ends, print "Unsuccessful".
112. Example: Sequential Search with LL
#include <stdio.h>
#include <stdlib.h>
struct node
{
    int data;
    struct node *next;
};
void generate(struct node **head, int n);
void searchLinear(struct node *temp, int K);
void delete(struct node *header);
int main()
{
    struct node *header = NULL;
    int K, n;
    printf("Enter the number of nodes: ");
    scanf("%d", &n);
    printf("\nDisplaying the list\n");
    generate(&header, n);      /* pass the address so the list head can be set */
    printf("\nEnter key to search: ");
    scanf("%d", &K);
    searchLinear(header, K);   /* named searchBinary in the original, but it is a linear search */
    delete(header);
    return 0;
}
113. Example: Linear Search with LL
void generate(struct node **head, int n)
{
    int i;
    struct node *temp;
    for (i = 0; i < n; i++)
    {
        temp = (struct node *)malloc(sizeof(struct node));
        temp->data = rand() % n;
        if (*head == NULL)
        {
            *head = temp;
            temp->next = NULL;
        }
        else
        {
            temp->next = *head;   /* insert at the front of the list */
            *head = temp;
        }
        printf("%d ", temp->data);
    }
}
114. Example: Linear Search with LL
void searchLinear(struct node *temp, int K)
{
    while (temp != NULL)
    {
        if (temp->data == K)
        {
            printf("Key found\n");
            return;
        }
        else temp = temp->next;
    }
    printf("Key not found\n");
}
void delete(struct node *header)
{
    struct node *temp;
    temp = header;
    while (temp != NULL)
    {
        temp = temp->next;
        free(header);     /* free the node just left behind */
        header = temp;
    }
}
115. Complexity Analysis
Case     Number of key comparisons    Asymptotic complexity    Remark
Case 1   T(n) = 1                     T(n) = O(1)              Best case
Case 2   T(n) = (n + 1)/2             T(n) = O(n)              Average case
Case 3   T(n) = n                     T(n) = O(n)              Worst case
117. Sorting
• Sorting means arranging the elements of an array so that they are placed in
some relevant order, which may be either ascending or descending.
• That is, if A is an array, then the elements of A are arranged in sorted
order (ascending order) in such a way that A[0] < A[1] < A[2] < … < A[N−1].
• The practical considerations for different sorting techniques are:
• Number of sort key comparisons that will be performed.
• Number of times the records in the list will be moved.
• Best, average and worst case performance.
• Stability of the sorting algorithm, where stability means that equivalent
records retain their relative positions even after sorting is done.
119. Insertion Sort
• In insertion sort, the sorted array is built one element at a time.
• The main idea behind insertion sort is that it inserts each element into its
proper place in the final list.
• To save memory, most implementations of insertion sort work
by moving the current data element past the already sorted values and
repeatedly interchanging it with the preceding value until it is in its proper
place.
120. Insertion Sort
• Technique:
• The array of values to be sorted is divided into two sets: one that stores sorted values
and another that contains unsorted values.
• The sorting algorithm proceeds until there are no elements left in the unsorted set.
• Suppose there are n elements in the array. Initially, the element with index 0 (if
LB = 0) is in the sorted set. The rest of the elements are in the unsorted set.
• The first element of the unsorted partition has array index 1 (if LB = 0).
• During each iteration of the algorithm, the first element in the unsorted set is picked up
and inserted into the correct position in the sorted set.
121. Insertion Sort
• Procedure INSERTION_SORT (K, N) : Given an unordered vector K consisting of N elements, this procedure
rearranges the vector in ascending order. The sorting process is based on the technique just described.
PASS denotes the pass index and the position of the element that needs to be compared with its preceding
elements. TEMP is a variable that holds the value which needs to be compared with the previous values.
1. [Loop on pass index]
1. Repeat thru step 5 for PASS = 1 to N − 1
2. [Initialize a temporary variable to compare with]
1. TEMP ← K[PASS]
3. [Initialize variable for comparing with sorted array]
1. J ← PASS − 1
4. [Compare]
1. Repeat while TEMP ≤ K[J] AND J ≥ 0
1. K[J+1] ← K[J]
2. J ← J − 1
5. [Move the value into its proper place]
1. K[J+1] ← TEMP
6. [Finished]
1. Return
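The procedure rendered as a C sketch (0-based indexing; note that in C the J ≥ 0 test must come first to avoid reading K[-1]):
void insertionSort(int K[], int n)
{
    int pass, j, temp;
    for (pass = 1; pass < n; pass++) {
        temp = K[pass];                  /* element to be inserted */
        j = pass - 1;
        while (j >= 0 && temp < K[j]) {  /* shift larger elements one place right */
            K[j + 1] = K[j];
            j = j - 1;
        }
        K[j + 1] = temp;                 /* drop it into its proper place */
    }
}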
122. Analysis of Insertion Sort
• The best case input is an array that is already sorted. In this case insertion
sort has a linear running time (i.e., O(n)). During each iteration, each
remaining element of the input is only compared with the rightmost
element of the sorted subsection of the array.
• The simplest worst case input is an array sorted in reverse order. The set of
all worst case inputs consists of all arrays where each element is the
smallest or second-smallest of the elements before it. In these cases every
iteration of the inner loop will scan and shift the entire sorted subsection of
the array before inserting the next element. This gives insertion sort a
quadratic running time (i.e., O(n²)).
• The average case is when the array is randomly ordered; on the order of
n(n − 1)/2 comparisons still need to be done, and hence the running time is
quadratic, O(n²).
124. Selection Sort
• The algorithm divides the input list into two parts:
• the sublist of items already sorted, which is built up from left to right at the front
of the list,
• and the sublist of items remaining to be sorted that occupy the rest of the list.
• Initially, the sorted sublist is empty and the unsorted sublist is the entire
input list.
• The algorithm proceeds by finding the smallest (or largest, depending on the
sorting order) element in the unsorted sublist, exchanging it with the
leftmost unsorted element (putting it in sorted order), and moving the
sublist boundaries one element to the right.
125. Selection Sort
• Technique:
• Consider an array ARR with N elements. The selection sort takes N−1 passes over the
entire array and works as follows. First find the smallest value in the array and swap it into
the first position. Then find the second smallest value in the array and swap it into the
second position. Repeat this procedure until the entire array is sorted. Thus:
• In Pass 1, find the position POS of the smallest value in the array and swap
ARR[POS] and ARR[0]. Thus, ARR[0] is sorted.
• In Pass 2, find the position POS of the smallest value in the sub-array of N−1 elements and swap
ARR[POS] with ARR[1]. Now ARR[0] and ARR[1] are sorted.
• In Pass 3, find the position POS of the smallest value in the sub-array of N−2 elements and swap
ARR[POS] with ARR[2]. Now ARR[0], ARR[1] and ARR[2] are sorted.
• In Pass N−1, find the position POS of the smaller of the elements ARR[N−2] and ARR[N−1].
Swap ARR[POS] and ARR[N−2] so that ARR[0], ARR[1], …, ARR[N−1] are sorted.
126. Selection Sort
• Procedure SELECTION_SORT (K, N) : Given an unordered vector K consisting of N elements, this procedure rearranges
the vector in ascending order. The sorting process is based on the technique just described. PASS denotes
the pass index and the position of the first element in the vector which is to be examined during a pass. The
variable MIN_INDEX denotes the position of the smallest element encountered thus far in a pass. The variable J
is used to index elements K[PASS] to K[N] in a given pass.
1. [Loop on pass index]
1. Repeat thru step 4 for PASS = 1 to N − 1
2. [Initialize minimum index]
1. MIN_INDEX ← PASS
3. [Make a pass and obtain element with smallest value]
1. Repeat for J = PASS + 1, …, N
1. If K[J] < K[MIN_INDEX]
2. Then MIN_INDEX ← J
4. [Exchange elements]
1. If MIN_INDEX ≠ PASS
2. Then K[PASS] ↔ K[MIN_INDEX]
5. [Finished]
1. Return
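The same procedure as a C sketch (0-based indexing):
void selectionSort(int K[], int n)
{
    int pass, j, minIndex, temp;
    for (pass = 0; pass < n - 1; pass++) {
        minIndex = pass;                 /* position of the smallest value so far */
        for (j = pass + 1; j < n; j++)
            if (K[j] < K[minIndex])
                minIndex = j;
        if (minIndex != pass) {          /* exchange elements */
            temp = K[pass];
            K[pass] = K[minIndex];
            K[minIndex] = temp;
        }
    }
}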
127. Analysis of Selection Sort
• Selection sort is a sorting algorithm that is independent of the original
order of the elements in the array. In pass 1, selecting the element with the
smallest value calls for scanning all n elements; thus, n−1 comparisons are
required in the first pass. Then, the smallest value is swapped with the
element in the first position. In pass 2, selecting the second smallest value
requires scanning the remaining n − 1 elements, and so on. Therefore:
• (n − 1) + (n − 2) + ... + 2 + 1 = n(n − 1)/2 = O(n²) comparisons
• So, for best, average and worst case, the running time is O(n²).
129. Bubble Sort
• Bubble sort is a very simple method that sorts the array by
repeatedly moving the largest element to the highest index position of the
array (in case of arranging elements in ascending order).
• In bubble sorting, consecutive adjacent pairs of elements in the array are
compared with each other.
• If the element at the lower index is greater than the element at the higher
index, the two elements are interchanged so that the smaller element is
placed before the bigger one.
• This process is continued till the list of unsorted elements is
exhausted.
130. Bubble Sort
• Technique:
• In Pass 1, A[0] and A[1] are compared, then A[1] is compared with A[2], A[2]
with A[3] and so on. Finally, A[N−2] is compared with A[N−1]. Pass 1 involves N−1
comparisons and places the biggest element at the highest index of the array.
• In Pass 2, A[0] and A[1] are compared, then A[1] is compared with A[2], A[2]
with A[3] and so on. Finally, A[N−3] is compared with A[N−2]. Pass 2 involves N−2
comparisons and places the second biggest element at the second highest index of the
array.
• In Pass 3, A[0] and A[1] are compared, then A[1] is compared with A[2], A[2]
with A[3] and so on. Finally, A[N−4] is compared with A[N−3]. Pass 3 involves N−3
comparisons and places the third biggest element at the third highest index.
• In Pass N−1, A[0] and A[1] are compared so that A[0] < A[1]. After this pass, all the
elements of the array are arranged in ascending order.
131. Bubble Sort
• Procedure BUBBLE_SORT (A, N) : Given an unordered vector A consisting of N elements, this procedure rearranges the vector in
ascending order. The sorting process is based on the technique just described. The variable PASS denotes the pass index. The
variable J indexes the first element of each pair that is compared with its consecutive element.
1. [Initialize]
1. LAST ← N
2. [Loop on pass index]
1. Repeat thru step 5 for PASS = 1 to N − 1
3. [Initialize exchanges counter for this pass]
1. EXCHS ← 0
4. [Iterate from start to end; make pairwise comparisons on unsorted elements]
1. Repeat for J = 1 to LAST − 1
1. If A[J] > A[J+1]
1. Then A[J] ↔ A[J+1]
2. EXCHS ← EXCHS + 1
5. [Were any exchanges made on this pass?]
1. If EXCHS = 0
1. Then Return (the vector is sorted)
2. Else LAST ← LAST − 1
6. [Finished]
1. Return
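A C sketch of the procedure, including the early exit when a pass makes no exchanges:
void bubbleSort(int A[], int n)
{
    int pass, j, temp, exchs, last = n;
    for (pass = 1; pass < n; pass++) {
        exchs = 0;                        /* exchanges counter for this pass */
        for (j = 0; j < last - 1; j++) {
            if (A[j] > A[j + 1]) {        /* out of order: swap the pair */
                temp = A[j];
                A[j] = A[j + 1];
                A[j + 1] = temp;
                exchs++;
            }
        }
        if (exchs == 0)
            return;                       /* no exchanges: the array is sorted */
        last = last - 1;                  /* the largest element is now in place */
    }
}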
132. Analysis of Bubble Sort
• In bubble sort, we have seen that there are in total N−1 passes. In the first pass, N−1
comparisons are made to place the highest element in its correct position. In Pass
2, there are N−2 comparisons and the second highest element is placed in its position.
Therefore, to compute the complexity of the bubble sort, we need to calculate the total
number of comparisons made. For this purpose, the number f(n) of comparisons
made can be given as:
• f(n) = (n − 1) + (n − 2) + (n − 3) + … + 3 + 2 + 1
= n(n − 1)/2 = n²/2 + O(n) = O(n²)
• Therefore, the complexity of a bubble sort algorithm is O(n²) (average and worst case);
this means that to execute, bubble sort requires time proportional to n², where n
is the total number of elements in the array.
• When the list is already sorted (best case), no swapping is done; a naive version would
still continue with all n − 1 passes, but with the exchanges counter above the algorithm
stops after a single pass, so the best case is O(n).
134. Divide and Conquer
• Recursive in structure
• Divide
• the problem into several smaller sub-problems that are similar to the original
problem but smaller in size
• Conquer
• the sub-problems by solving them recursively. If they are small enough, solve them
in a straightforward manner.
• Combine
• the solutions to create a solution to the original problem
135. Merge Sort
• Merge sort is a sorting algorithm that uses the divide, conquer and combine
algorithmic paradigm. Where,
• Divide means partitioning the n-element array to be sorted into two sub-
arrays of n/2 elements each. (If A is an array containing zero or
one element, then it is already sorted. However, if there are more elements
in the array, divide A into two sub-arrays, A1 and A2, each containing about
half of the elements of A.)
• Conquer means sorting the two sub-arrays recursively using merge sort.
• Combine means merging the two sorted sub-arrays of size n/2 each to
produce the sorted array of n elements.
• Merge sort focuses on two main concepts to improve its
performance (running time):
• A smaller list takes fewer steps and thus less time to sort than a large list.
• Fewer steps, and thus less time, are needed to create a sorted list from two sorted
lists than from two unsorted lists.
136. Merge Sort
• Technique:
• If the array is of length 0 or 1, then it is already sorted. Otherwise:
• (Conceptually) divide the unsorted array into two sub-arrays of about half the size
• Use the merge sort algorithm recursively to sort each sub-array
• Merge the two sub-arrays to form a single sorted list
138. Merge Sort
• Procedure MERGE_SORT (ARR, BEG, END) : Given a vector ARR, it is required to sort, recursively, the subtable between
positions BEG and END (inclusive). MID denotes the position of the middle element of that subtable.
1. [Test base condition]
1. If BEG < END
2. [Calculate the midpoint position of the current subtable]
1. MID ← (BEG + END) / 2
3. [Recursively sort the first subtable]
1. Call MERGE_SORT (ARR, BEG, MID)
4. [Recursively sort the second subtable]
1. Call MERGE_SORT (ARR, MID + 1, END)
5. [Merge the two ordered subtables]
1. Call MERGE (ARR, BEG, MID, END)
6. [Finished]
1. Return
139. Merge Sort
• Procedure MERGE (ARR, BEG, MID, END) : Given two ordered subtables stored in a vector ARR, delimited by BEG, MID and
END as just described, this procedure performs a simple merge. TEMP is a temporary vector used during the merge
process. The variables I and J denote the cursors associated with the first and second subtables, respectively. INDEX
is an index variable associated with the vector TEMP.
1. [Initialize]
1. I ← BEG
2. J ← MID + 1
3. INDEX ← 0
2. [Compare corresponding elements and copy the smaller]
1. Repeat while ( I ≤ MID ) AND ( J ≤ END )
1. If ARR[I] < ARR[J]
1. Then TEMP[INDEX] ← ARR[I]
2. I ← I + 1
2. Else
1. TEMP[INDEX] ← ARR[J]
2. J ← J + 1
3. INDEX ← INDEX + 1
3. [Copy the remaining elements of the second subtable]
1. If I > MID
1. Repeat while J ≤ END
1. TEMP[INDEX] ← ARR[J]
2. INDEX ← INDEX + 1
3. J ← J + 1
[Copy the remaining elements of the first subtable]
2. Else
1. Repeat while I ≤ MID
1. TEMP[INDEX] ← ARR[I]
2. INDEX ← INDEX + 1
3. I ← I + 1
4. [Copy the elements of the temporary vector back into the original vector]
1. K ← 0
2. Repeat while K < INDEX
1. ARR[BEG + K] ← TEMP[K]
2. K ← K + 1
5. [Finished]
1. Return
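Both procedures as a C sketch (the fixed-size temporary array is an assumption made to keep the example short):
void merge(int arr[], int beg, int mid, int end)
{
    int temp[100];                            /* temporary vector */
    int i = beg, j = mid + 1, index = 0, k;
    while (i <= mid && j <= end) {            /* take the smaller head element */
        if (arr[i] < arr[j])
            temp[index++] = arr[i++];
        else
            temp[index++] = arr[j++];
    }
    while (i <= mid)                          /* copy the rest of the first half */
        temp[index++] = arr[i++];
    while (j <= end)                          /* copy the rest of the second half */
        temp[index++] = arr[j++];
    for (k = 0; k < index; k++)               /* copy back into the original vector */
        arr[beg + k] = temp[k];
}
void mergeSort(int arr[], int beg, int end)
{
    if (beg < end) {
        int mid = (beg + end) / 2;
        mergeSort(arr, beg, mid);             /* recursively sort the first half */
        mergeSort(arr, mid + 1, end);         /* recursively sort the second half */
        merge(arr, beg, mid, end);            /* merge the two ordered halves */
    }
}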
140. Analysis of Merge Sort
Statement                               Effort
Merge_Sort (ARR, BEG, END)              T(n)
If (BEG < END)                          O(1)
MID = (BEG + END) / 2                   O(1)
Merge_Sort (ARR, BEG, MID)              T(n/2)
Merge_Sort (ARR, MID + 1, END)          T(n/2)
Merge (ARR, BEG, MID, END)              O(n)
So T(n) = O(1) when n = 1, and
T(n) = 2T(n/2) + O(n) + O(1) when n > 1
Solving this recurrence, we get T(n) = O(n log n)
Although merge sort has an optimal time complexity, a major drawback of the
algorithm is that it needs an additional space of O(n) for the temporary array (it is not an
in-place sorting algorithm).
141. Analysis of Merge Sort
• Assume n = 2^k for k ≥ 1; then k separate passes are required to merge 2^k separate tables into a
single table.
• T(n) = 2T(n/2) + bn + c
• T(n/2) = 2T(n/4) + b(n/2) + c, so T(n) = 2[2T(n/4) + b(n/2) + c] + bn + c
• = 4T(n/4) + 2bn + (1 + 2)c = 2^2 T(n/2^2) + 2bn + (2^0 + 2^1)c
• = 4[2T(n/8) + b(n/4) + c] + 2bn + (1 + 2)c
• = 8T(n/8) + 3bn + (1 + 2 + 4)c = 2^3 T(n/2^3) + 3bn + (2^0 + 2^1 + 2^2)c
• Generalizing, we can write
• T(n) = 2^k T(n/2^k) + kbn + (2^0 + 2^1 + … + 2^(k−1))c
• T(1) = a, and since n = 2^k, k = log2 n
• T(n) = 2^k · a + kbn + (2^k − 1)c
• = a·n + bn log2 n + (n − 1)c
• = b·n log n + (a + c)n − c
• = O(n log n) [Best, Average and Worst]
142. Quick Sort
• The quick sort algorithm works as follows:
• Select an element pivot from the array elements.
• Re-arrange the elements in the array in such a way that all elements that are less than
the pivot appear before the pivot and all elements greater than the pivot come
after it (equal values can go either way).
• After such a partitioning, the pivot is placed in its final position. This is called the
partition operation.
• Recursively sort the two sub-arrays thus obtained. (One with a sub-list of values smaller
than that of the pivot element and the other having the higher value elements.)
143. Quick Sort
• Procedure QUICK_SORT (K, LB, UB) : Given a table K of N records, this recursive procedure sorts the table in ascending
order. A dummy record with key K[N + 1] is assumed, where K[I] ≤ K[N + 1] for all 1 ≤ I ≤ N. The integer parameters LB and UB
denote the lower and upper bounds of the current subtable being processed. The indices I and J are used to scan the
keys during the processing of each subtable. KEY contains the key value which is being placed in its final position within the
sorted subtable. FLAG is a logical variable which indicates the end of the process that places a key in its final position.
When FLAG becomes false, the input subtable has been partitioned into two disjoint parts.
1. [Initialize]
1. FLAG ← TRUE
2. [Perform sort]
1. If LB < UB
2. Then I ← LB
1. J ← UB + 1
2. KEY ← K[LB]
3. Repeat while FLAG
1. I ← I + 1
2. Repeat while K[I] < KEY AND I ≤ UB
1. I ← I + 1
3. J ← J − 1
4. Repeat while K[J] > KEY
1. J ← J − 1
5. If I < J
1. Then K[I] ↔ K[J]
2. Else FLAG ← FALSE
6. K[LB] ↔ K[J]
7. Call QUICK_SORT(K, LB, J − 1)
8. Call QUICK_SORT(K, J + 1, UB)
3. [Finished]
1. Return
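The partitioning logic of the procedure as a C sketch, with the first element as pivot:
void quickSort(int K[], int lb, int ub)
{
    int i, j, key, temp;
    if (lb >= ub)
        return;
    key = K[lb];                          /* first element as the pivot */
    i = lb;
    j = ub + 1;
    while (1) {
        do { i++; } while (i <= ub && K[i] < key);   /* scan right for a key >= pivot */
        do { j--; } while (K[j] > key);              /* scan left for a key <= pivot */
        if (i < j) {                      /* out-of-place pair: exchange */
            temp = K[i]; K[i] = K[j]; K[j] = temp;
        } else
            break;
    }
    K[lb] = K[j];                         /* place the pivot in its final position */
    K[j] = key;
    quickSort(K, lb, j - 1);              /* sort the lower subtable */
    quickSort(K, j + 1, ub);              /* sort the upper subtable */
}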
146. Analysis of Quick Sort
• The advantage of quick sort is that we can sort “in-place”, i.e. without the need for
a temporary buffer, depending on the size of the inputs.
• The basic quicksort relation is: T(n) = cn + T(i) + T(n − i − 1), where i is the size of the left
sub-block after partitioning.
• Worst-case:
• This happens when the pivot is the smallest (or the largest) element.
• Then one of the partitions is empty, and we repeat the procedure recursively on n − 1
elements.
• Best-case:
• The best case is when the pivot is the median of the array, and then the left and right parts
will have the same size.
• There are log N partitions, and to obtain each partition we do N comparisons (and not more
than N/2 swaps).
• Average-case
148. Best Case of Quick Sort
• The pivot is in the middle
• T(N) = 2T(N/2) + cN
• Divide by N:
• T(N)/N = T(N/2)/(N/2) + c
• T(N/2)/(N/2) = T(N/4)/(N/4) + c
• T(N/4)/(N/4) = T(N/8)/(N/8) + c
• …
• T(2)/2 = T(1)/1 + c
• Add all the equations; every term except the first and the last appears on both sides and
cancels:
• T(N)/N = T(1)/1 + kc
• Notice that this recurrence continues only while N/2^k ≥ 1, i.e. until k = log N. Thus, by
putting k = log N:
• T(N)/N = T(1) + c log N = 1 + c log N
• T(N) = N + Nc log N
• Therefore T(N) = O(N log N)
149. Trick is to select best pivot
• Different ways of choosing a pivot:
• First element
• Last element
• Median-of-three elements
• Pick three elements, and find the median x of these elements; use that as the
pivot.
• Random element
• Randomly pick an element as the pivot
150. Radix Sort
• Radix sort is one of the linear sorting algorithms for integers.
• Radix sort is a non-comparative integer sorting algorithm that sorts data
with integer keys by grouping keys by the individual digits which share the
same significant position and value.
• It functions by sorting the input numbers on each digit, for each of the
digits in the numbers.
• However, the process adopted by this sort method is somewhat
counterintuitive, in the sense that the numbers are sorted on the least
significant digit first, followed by the second-least significant digit and so
on till the most significant digit.
151. Radix Sort
• Each key is first figuratively dropped into one level of buckets
corresponding to the value of the rightmost digit.
• Each bucket preserves the original order of the keys as they are
dropped into the bucket.
• There is a one-to-one correspondence between the buckets and the values
that can be represented by the rightmost digit. Then, the process repeats
with the next neighbouring more significant digit until there are no more
digits to process. In other words:
• Take the least significant digit (or group of bits, both being examples of radices) of each
key.
• Group the keys based on that digit, but otherwise keep the original order of keys. (This is
what makes the LSD radix sort a stable sort.)
• Repeat the grouping process with each more significant digit.
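An LSD radix sort sketch in C for non-negative integers (n ≥ 1 assumed); one stable counting pass per decimal digit plays the role of the buckets, and the fixed array sizes are assumptions:
void radixSort(int A[], int n)
{
    int i, exp, max = A[0];
    int output[100], count[10];
    for (i = 1; i < n; i++)                  /* find the largest key */
        if (A[i] > max)
            max = A[i];
    for (exp = 1; max / exp > 0; exp *= 10) {
        for (i = 0; i < 10; i++)
            count[i] = 0;
        for (i = 0; i < n; i++)
            count[(A[i] / exp) % 10]++;      /* drop each key into its digit bucket */
        for (i = 1; i < 10; i++)
            count[i] += count[i - 1];        /* running totals = bucket boundaries */
        for (i = n - 1; i >= 0; i--)         /* walk backwards to keep the sort stable */
            output[--count[(A[i] / exp) % 10]] = A[i];
        for (i = 0; i < n; i++)
            A[i] = output[i];                /* ready for the next, more significant digit */
    }
}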
154. Introduction
• In the real world, many problems are represented
in terms of objects and connections between
them.
– For example, in an airline route map, we might be
interested in questions like: “What’s the fastest way to
go from Hyderabad to New York?” or “What is the
cheapest way to go from Hyderabad to New York?” To
answer these questions we need information about
connections (airline routes) between objects (towns).
Graphs are data structures used for solving these
kinds of problems.
155. Glossary
• Graph: A graph is a pair (V, E), where V is a set
of nodes, called vertices, and E is a collection
of pairs of vertices, called edges.
– Vertices and edges are positions and store
elements.
156. – Definitions that we use:
– Directed edge:
o ordered pair of vertices (u, v)
o first vertex u is the origin
o second vertex v is the destination
o Example: one-way road traffic
161. • When an edge connects two vertices, the
vertices are said to be adjacent to each other
and the edge is incident on both vertices.
• A graph with no cycles is called a tree. A tree is
an acyclic connected graph.
162. • A self loop is an edge that connects a vertex to
itself.
163. • Two edges are parallel if they connect the
same pair of vertices.
164. • The Degree of a vertex is the number of edges
incident on it.
• A subgraph is a subset of a graph’s edges
(with associated vertices) that form a graph.
• A path in a graph is a sequence of adjacent
vertices. A simple path is a path with no
repeated vertices. In the graph below, the
dotted lines represent a path from G to E.
165. • A cycle is a path where the first and last
vertices are the same. A simple cycle is a
cycle with no repeated vertices or edges
(except the first and last vertices).
166. • We say that one vertex is connected to
another if there is a path that contains both of
them.
• A graph is connected if there is a path from
every vertex to every other vertex.
• If a graph is not connected then it consists of
a set of connected components.
167. • A directed acyclic graph [DAG] is a directed
graph with no cycles.
168. • A forest is a disjoint set of trees.
• A spanning tree of a connected graph is a
subgraph that contains all of that graph’s
vertices and is a single tree. A spanning forest
of a graph is the union of spanning trees of its
connected components.
• A bipartite graph is a graph whose vertices
can be divided into two sets such that all
edges connect a vertex in one set with a
vertex in the other set.
169. • In weighted graphs, integers (weights) are
assigned to each edge to represent distances
or costs.
170. • Graphs with all edges present are called
complete graphs.
171. • Graphs with relatively few edges (generally
|E| < |V| log |V|) are called sparse graphs.
• Graphs with relatively few of the possible edges
missing are called dense.
• Directed weighted graphs are sometimes called
networks.
• We will denote the number of vertices in a given
graph by |V|, and the number of edges by |E|. Note
that |E| can range anywhere from 0 to |V|(|V| − 1)/2 (in an
undirected graph). This is because each node can
connect to every other node.
172. Applications of Graphs
• Representing relationships between
components in electronic circuits
– Transportation networks: Highway network, Flight
network
– Computer networks: Local area network, Internet,
Web
– Databases: For representing ER (Entity
Relationship) diagrams in databases, for
representing dependency of tables in databases
173. Graph Representation
• To manipulate graphs, we need to represent
them in some useful form.
• Basically, there are three ways of doing this:
– Adjacency Matrix
– Adjacency List
– Adjacency Set
174. • Adjacency Matrix
• Graph Declaration for Adjacency Matrix
• First, let us look at the components of the
graph data structure. To represent graphs, we
need the number of vertices, the number of edges
and also their interconnections. So, the graph can
be declared as:
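The declaration slide is an image in the original; a sketch of what it describes might look like:
struct Graph {
    int V;              /* number of vertices */
    int E;              /* number of edges */
    int **adjMatrix;    /* V x V matrix holding 0/1 values */
};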
176. • Description
• In this method, we use a matrix with size V × V. The
values of matrix are boolean. Let us assume the matrix
is Adj. The value Adj[u, v] is set to 1 if there is an edge
from vertex u to vertex v and 0 otherwise.
• In the matrix, each edge is represented by two bits for
undirected graphs. That means, an edge from u to v is
represented by 1 value in both Adj[u,v ] and Adj[u,v].
To save time, we can process only half of this
symmetric matrix. Also, we can assume that there is an
“edge” from each vertex to itself. So, Adj[u, u] is set to
1 for all vertices.
177. • If the graph is a directed graph then we need
to mark only one entry in the adjacency
matrix. As an example, consider the directed
graph below.
179. • Now, let us concentrate on the
implementation. To read a graph, one way is
to first read the vertex names and then read
pairs of vertex names (edges). The code below
reads an undirected graph.
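The code slide is an image in the original; a sketch of reading an undirected graph into the adjacency matrix (assuming <stdio.h>, <stdlib.h> and the Graph struct above, with vertices numbered 0..V−1):
struct Graph *createGraph(void)
{
    int i, j, u, v;
    struct Graph *G = malloc(sizeof(struct Graph));
    printf("Number of vertices and edges: ");
    scanf("%d %d", &G->V, &G->E);
    G->adjMatrix = malloc(G->V * sizeof(int *));
    for (i = 0; i < G->V; i++) {
        G->adjMatrix[i] = malloc(G->V * sizeof(int));
        for (j = 0; j < G->V; j++)
            G->adjMatrix[i][j] = 0;          /* initialization: O(V^2) */
    }
    for (i = 0; i < G->E; i++) {             /* read pairs of vertex numbers */
        scanf("%d %d", &u, &v);
        G->adjMatrix[u][v] = 1;              /* undirected: mark both entries */
        G->adjMatrix[v][u] = 1;
    }
    return G;
}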
181. • The adjacency matrix representation is good if
the graphs are dense. The matrix requires
O(V²) bits of storage and O(V²) time for
initialization. If the number of edges is
proportional to V², then there is no problem
because V² steps are required to read the
edges. If the graph is sparse, the initialization
of the matrix dominates the running time of
the algorithm, as it takes O(V²).
182. • Adjacency List
• Graph Declaration for Adjacency List
• In this representation all the vertices connected to a
vertex v are listed on an adjacency list for that vertex v.
This can be easily implemented with linked lists. That
means, for each vertex v we use a linked list whose list
nodes represent the connections between v and the other
vertices to which v has an edge.
• The total number of linked lists is equal to the number
of vertices in the graph. The graph ADT can be declared
as:
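The declaration slide is an image in the original; a sketch along the lines described:
struct ListNode {
    int vertex;                  /* the vertex this edge leads to */
    struct ListNode *next;
};
struct ListGraph {
    int V;                       /* number of vertices */
    int E;                       /* number of edges */
    struct ListNode **adjList;   /* one linked list per vertex */
};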
184. • Description
• Considering the same example as that of the
adjacency matrix, the adjacency list
representation can be given as:
185. • Since vertex A has an edge to B and D, we
have added them in the adjacency list for A.
The same is the case with the other vertices as
well.
189. • For this representation, the order of edges in
the input is important. This is because they
determine the order of the vertices on the
adjacency lists.
• The same graph can be represented in many
different ways in an adjacency list. The order
in which edges appear on the adjacency list
affects the order in which edges are processed
by algorithms.
190. • Disadvantages of Adjacency Lists
• Using the adjacency list representation we cannot perform
some operations efficiently. As an example, consider the
case of deleting a node. It is not enough to simply delete
the node from its own list representation; each entry on
that node’s adjacency list names another vertex, so we
need to search the other vertices’ linked lists as well to
remove the corresponding entries.
• This problem can be solved by linking
the two list nodes that correspond to a particular edge and
making the adjacency lists doubly linked. But all these extra
links are risky to process.
191. • Adjacency Set
• It is very much similar to the adjacency list, but
instead of using linked lists, disjoint sets
[Union-Find] are used.
192. • Comparison of Graph Representations:
• Directed and undirected graphs are represented
with the same structures.
• For directed graphs, everything is the same,
except that each edge is represented just once.
An edge from x to y is represented by a 1 value in
Adj[x][y] in the adjacency matrix, or by adding y
on x’s adjacency list.
• For weighted graphs, everything is the same,
except fill the adjacency matrix with weights
instead of boolean values.
194. Graph Traversals
• To solve problems on graphs, we need a mechanism for
traversing the graphs. Graph traversal algorithms are
also called graph search algorithms. Like tree traversal
algorithms (Inorder, Preorder, Postorder and Level-
Order traversals), graph search algorithms can be
thought of as starting at some source vertex in a graph
and “searching” the graph by going through the edges
and marking the vertices. Now, we will discuss two
such algorithms for traversing the graphs.
– Depth First Search [DFS]
– Breadth First Search [BFS]
195. • Depth First Search [DFS]:
• DFS algorithm works in a manner similar to
preorder traversal of the trees. Like preorder
traversal, internally this algorithm also uses
stack.
196. • Let us consider the following example.
Suppose a person is trapped inside a maze. To
come out from that maze, the person visits
each path and each intersection (in the worst
case). Let us say the person uses two colors of
paint to mark the intersections already
passed. When discovering a new intersection,
it is marked grey, and he continues to go
deeper.
197. • After reaching a “dead end” the person knows
that there is no more unexplored path from
the grey intersection, which now is completed,
and he marks it with black. This “dead end” is
either an intersection which has already been
marked grey or black, or simply a path that
does not lead to an intersection
198. • The intersections of the maze are the vertices
and the paths between the intersections are
the edges of the graph. The process of
returning from the “dead end” is called
backtracking. We are trying to go away from
the starting vertex into the graph as deep as
possible, until we have to backtrack to the
preceding grey vertex. In DFS algorithm, we
encounter the following types of edges.
(Figure: the types of edges encountered during DFS, e.g. tree edges and back edges.)
200. • For most algorithms a boolean classification,
unvisited/visited, is enough (for the three-color
implementation refer to the problems section).
That means, for some problems we need to
use three colors, but for our discussion two
colors are enough.
201. • Initially all vertices are marked unvisited (false).
The DFS algorithm starts at a vertex u in the
graph. By starting at vertex u it considers the
edges from u to other vertices.
• If the edge leads to an already visited vertex, then
backtrack to current vertex u. If an edge leads to
an unvisited vertex, then go to that vertex and
start processing from that vertex.
• That means the new vertex becomes the current
vertex. Follow this process until we reach the
dead-end. At this point start backtracking.
202. • The process terminates when backtracking
leads back to the start vertex. The algorithm
based on this mechanism is given below:
assume Visited[] is a global array.
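The algorithm slides are images in the original; a sketch matching the description, using the adjacency-matrix Graph from earlier and a global Visited[] array as the text says (the array size is an assumption):
int Visited[100];                          /* all entries start at 0 (unvisited) */
void DFS(struct Graph *G, int u)
{
    int v;
    Visited[u] = 1;                        /* mark the current vertex visited */
    printf("%d ", u);                      /* process u (here: print it) */
    for (v = 0; v < G->V; v++)             /* consider the edges from u */
        if (G->adjMatrix[u][v] && !Visited[v])
            DFS(G, v);                     /* an unvisited vertex becomes current */
}
void DFSTraversal(struct Graph *G)
{
    int u;
    for (u = 0; u < G->V; u++)             /* restart in every unvisited component */
        if (!Visited[u])
            DFS(G, u);
}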
205. • As an example, consider the following graph. We can
see that sometimes an edge leads to an already
discovered vertex. These edges are called back edges,
and the other edges are called tree edges because
deleting the back edges from the graph generates a
tree.
• The final generated tree is called the DFS tree and the
order in which the vertices are processed is called DFS
numbers of the vertices. In the graph below, the gray
color indicates that the vertex is visited (there is no
other significance). We need to see when the Visited
table is updated.
(Figures: step-by-step DFS traversal of the example graph, showing the Visited table being updated at each step.)
220. • From the above diagrams, it can be seen that the DFS
traversal creates a tree (without back edges) and we
call such a tree a DFS tree. The above algorithm works
even if the given graph has multiple connected components.
• The time complexity of DFS is O(V + E), if we use
adjacency lists for representing the graphs. This is
because we are starting at a vertex and processing the
adjacent nodes only if they are not visited. Similarly, if
an adjacency matrix is used for a graph representation,
then all edges adjacent to a vertex can’t be found
efficiently, and this gives O(V2) complexity.
221. • Applications of DFS
– Topological sorting
– Finding connected components
– Finding articulation points (cut vertices) of the
graph
– Finding strongly connected components
– Solving puzzles such as mazes
222. • Breadth First Search [BFS]:
• The BFS algorithm works similar to level – order
traversal of the trees. Like level – order traversal, BFS
also uses queues. In fact, level – order traversal got
inspired from BFS. BFS works level by level. Initially, BFS
starts at a given vertex, which is at level 0. In the first
stage it visits all vertices at level 1 (that means, vertices
whose distance is 1 from the start vertex of the graph).
In the second stage, it visits all vertices at the second
level. These new vertices are the ones which are
adjacent to level 1 vertices.
223. • BFS continues this process until all the levels of
the graph are completed. Generally, a queue data
structure is used for storing the vertices of a
level. As with DFS, assume that initially all
vertices are marked unvisited (false). Vertices that
have been processed and removed from the
queue are marked visited (true). We use a queue
to represent the visited set as it will keep the
vertices in the order in which they were first
visited. The implementation for the above
discussion can be given as:
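The implementation slide is an image in the original; a sketch matching the description, again on the adjacency-matrix Graph (the array-based queue and its size are assumptions):
int visited[100];                          /* all entries start at 0 (unvisited) */
void BFS(struct Graph *G, int start)
{
    int queue[100], front = 0, rear = 0;   /* simple array-based queue */
    int u, v;
    visited[start] = 1;
    queue[rear++] = start;                 /* level 0: the start vertex */
    while (front < rear) {
        u = queue[front++];                /* remove the next vertex in order */
        printf("%d ", u);                  /* process u (here: print it) */
        for (v = 0; v < G->V; v++)         /* enqueue all unvisited neighbours */
            if (G->adjMatrix[u][v] && !visited[v]) {
                visited[v] = 1;
                queue[rear++] = v;
            }
    }
}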
226. • As an example, let us consider the same graph
as that of the DFS example. The BFS traversal
can be shown as:
(Figures: step-by-step BFS traversal of the example graph, level by level.)
233. • Time complexity of BFS is O(V + E), if we use
adjacency lists for representing the graphs,
and O(V²) for the adjacency matrix
representation.
234. • Applications of BFS
– Finding all connected components in a graph
– Finding all nodes within one connected
component
– Finding the shortest path between two nodes
– Testing a graph for bipartiteness
235. • Comparing DFS and BFS
• Comparing BFS and DFS, the big advantage of DFS is
that it has much lower memory requirements than BFS
because it’s not required to store all of the child
pointers at each level. Depending on the data and what
we are looking for, either DFS or BFS can be
advantageous. For example, in a family tree if we are
looking for someone who’s still alive and if we assume
that person would be at the bottom of the tree, then
DFS is a better choice. BFS would take a very long time
to reach that last level.
236. • The DFS algorithm finds the goal faster in that case. Now, if
we were looking for a family member who died a
very long time ago, then that person would be
closer to the top of the tree. In this case, BFS
finds them faster than DFS. So, the advantages of either
vary depending on the data and what we are
looking for.
• DFS is related to preorder traversal of a tree. Like
preorder traversal, DFS visits each node before its
children. The BFS algorithm works similar to level
– order traversal of the trees.
237. • If someone asks whether DFS is better or BFS is
better, the answer depends on the type of the
problem that we are trying to solve.
• BFS visits each level one at a time, and if we know
the solution we are searching for is at a low
depth, then BFS is good. DFS is a better choice if
the solution is at maximum depth.
• The below table shows the differences between
DFS and BFS in terms of their applications.
(Table: typical applications of DFS vs. BFS.)
239. Minimal Spanning Tree
• The Spanning tree of a graph is a subgraph
that contains all the vertices and is also a tree.
A graph may have many spanning trees. As an
example, consider a graph with 4 vertices as
shown below. Let us assume that the corners
of the graph are vertices.
240. • For this simple graph, we can have multiple spanning trees as
shown below.
• The algorithm we will discuss now is minimum spanning tree in an
undirected graph. We assume that the given graphs are weighted
graphs. If the graphs are unweighted graphs then we can still use
the weighted graph algorithms by treating all weights as equal. A
minimum spanning tree of an undirected graph G is a tree formed
from graph edges that connect all the vertices of G with minimum
total cost (weights). A minimum spanning tree exists only if the
graph is connected.
• There are two famous algorithms for this problem:
– Prim’s Algorithm
– Kruskal’s Algorithm
241. • Prim’s Algorithm:
• Prim’s algorithm is almost the same as
Dijkstra’s algorithm. As in Dijkstra’s algorithm,
in Prim’s algorithm we keep the values
distance and paths in the distance table. The
only exception is that since the definition of
distance is different, the updating statement
also changes a little. The update statement is
simpler than before.
243. • The entire implementation of this algorithm is
identical to that of Dijkstra’s algorithm. The
running time is O(|V|²) without heaps [good
for dense graphs], and O(E log V) using binary
heaps [good for sparse graphs].
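A sketch of the O(|V|²) version in C (W is a weight matrix where 0 means “no edge”; the names, array sizes and the INF value are assumptions):
#define INF 1000000
void prim(int W[][100], int V, int source)
{
    int dist[100], path[100], known[100];
    int i, v, u, min;
    for (v = 0; v < V; v++) {
        dist[v] = INF; path[v] = -1; known[v] = 0;
    }
    dist[source] = 0;
    for (i = 0; i < V; i++) {
        u = -1; min = INF;
        for (v = 0; v < V; v++)            /* pick the unknown vertex nearest the tree */
            if (!known[v] && dist[v] < min) { min = dist[v]; u = v; }
        if (u == -1) break;                /* remaining vertices are unreachable */
        known[u] = 1;
        for (v = 0; v < V; v++)            /* the update step: one edge, not a path */
            if (!known[v] && W[u][v] && W[u][v] < dist[v]) {
                dist[v] = W[u][v];         /* found a cheaper edge into the tree */
                path[v] = u;               /* remember the tree edge */
            }
    }
    for (v = 0; v < V; v++)                /* print the edges of the spanning tree */
        if (path[v] != -1)
            printf("%d - %d\n", path[v], v);
}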
244. • Kruskal’s Algorithm
• The algorithm starts with V different trees (V is the number of vertices
in the graph). While constructing the minimum spanning
tree, every time Kruskal’s algorithm selects an edge that has
minimum weight and then adds that edge if it doesn’t
create a cycle. So, initially, there are |V| single-node trees
in the forest. Adding an edge merges two trees into one.
When the algorithm is completed, there will be only one
tree, and that is the minimum spanning tree. There are two
ways of implementing Kruskal’s algorithm:
– By using Disjoint Sets: Using UNION and FIND operations
– By using Priority Queues: Maintains weights in priority queue
245. • The appropriate data structure is the UNION/FIND
algorithm [for implementing forests]. Two vertices
belong to the same set if and only if they are
connected in the current spanning forest.
• Each vertex is initially in its own set. If u and v are in
the same set, the edge is rejected because it forms a
cycle. Otherwise, the edge is accepted, and a UNION is
performed on the two sets containing u and v.
• As an example, consider the following graph (the edges
show the weights).
(Figure: the example weighted graph with vertices A through G.)
247. • Now let us perform Kruskal’s algorithm on this
graph. We always select the edge which has
minimum weight.
248. • From the above graph, the edges which have
minimum weight (cost) are: AD and BE. From
these two we can select one of them and let
us assume that we select AD (dotted line).
(Figures: the edges selected so far at each step of Kruskal’s algorithm.)
254. • The next low cost edges are CB and EF. But if we
select CB, then it forms a cycle. So we discard it.
• This is also the case with EF. So we should not
select those two. And the next low cost is 9 (BD
and EG).
• Selecting BD forms a cycle so we discard it.
Adding EG will not form a cycle and therefore
with this edge we complete all vertices of the
graph.
(Figure: the resulting minimum spanning tree.)
256. • Note: The worst-case running time of this
algorithm is O(E log E), which is dominated by
the heap operations.
• That means, since we are constructing the
heap with E edges, we need O(E log E) time to
do that.
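Pulling the steps above together, a compact C sketch of Kruskal’s algorithm using the disjoint-set (UNION/FIND) approach; the edge type, array sizes and the simple bubble sort of edges are assumptions made to keep the sketch short:
struct Edge { int u, v, w; };
int findSet(int parent[], int x)           /* FIND: walk up to the set representative */
{
    while (parent[x] != x)
        x = parent[x];
    return x;
}
void kruskal(struct Edge edges[], int E, int V)
{
    int parent[100], i, j, count = 0;
    struct Edge t;
    for (i = 0; i < V; i++)
        parent[i] = i;                     /* each vertex starts in its own set */
    for (i = 0; i < E - 1; i++)            /* sort the edges by weight */
        for (j = 0; j < E - 1 - i; j++)
            if (edges[j].w > edges[j + 1].w) {
                t = edges[j]; edges[j] = edges[j + 1]; edges[j + 1] = t;
            }
    for (i = 0; i < E && count < V - 1; i++) {
        int ru = findSet(parent, edges[i].u);
        int rv = findSet(parent, edges[i].v);
        if (ru != rv) {                    /* different sets: no cycle, accept the edge */
            printf("%d - %d (weight %d)\n", edges[i].u, edges[i].v, edges[i].w);
            parent[ru] = rv;               /* UNION the two sets */
            count++;
        }                                  /* same set: the edge forms a cycle, reject it */
    }
}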