16
The space needed to store the compiled version of the program instructions
Data space
The space needed to store all con...
17
s += a[i];
}
return s;
}
Space: one word for n, one for a [passed by reference!], one for i → constant space!
When memor...
18
V. AMORTIZED ANALYSIS
In an amortized analysis, the time required to perform a sequence of data structure
operations is...
19
VI. ASYMPTOTIC NOTATION
Complexity analysis: rate at which storage or time grows as a function of the problem size
Asym...
20
Eg.
n ∈ O(n²)              n³ ∉ O(n²)
100n+5 ∈ O(n²)         n⁴+n+1 ∉ O(n²)
100n+5 ∈ O(n)
(1/2)n(n−1) ∈ O(n²)
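The membership claims above follow from the formal definition of Big-O. As a reconstruction in standard notation (the slide's symbols were lost in extraction):

```latex
f(n) \in O(g(n)) \iff \exists c > 0,\ \exists n_0 \ge 1 \ \text{such that}\ 0 \le f(n) \le c\,g(n)\ \text{for all } n \ge n_0
```

For instance, 100n + 5 ≤ 105n for all n ≥ 1, and 105n ≤ 105n², so 100n + 5 ∈ O(n) and also ∈ O(n²).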
ii) Ω(g(n)) stands for se...
21
f(N) grows no faster than g(N) for "large" N
Big O Rules
• If f(n) is a polynomial of degree d, then f(n) is
• O(nᵈ), i....
22
• Let f(N) = 2N². Then
– f(N) = Ω(N)
– f(N) = Ω(N²) (best answer)
Big-Theta
• f(N) = Θ(g(N)) iff
f(N) = O(g(N)) and f(N)...
23
T(n) = Ω(F(n)) Growth of T(n) >= growth of F(n)
T(n) = Θ(F(n)) Growth of T(n) = growth of F(n)
T(n) = o(F(n)) Growth of...
24
UNIT II - FUNDAMENTALS OF DATA STRUCTURES
Arrays – Structures – Stacks – Definition and examples – Representing Stacks ...
25
Unit: II - FUNDAMENTALS OF DATA STRUCTURES
I. ARRAYS
Array is a finite ordered set of homogeneous elements. Array size ...
26
for(i = 0; i < NUMELTS; a[i++] = 0);
Now only a single change in the constant definition is needed to change the upper ...
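The one-line idiom above compresses the loop body into the increment expression; a more conventional equivalent in C (the function name is illustrative) makes the same point that a single constant controls the bound:

```c
#include <stddef.h>

#define NUMELTS 100  /* one constant controls every use of the bound */

/* Clearer equivalent of the one-line zeroing loop in the text */
void clear_array(int a[])
{
    for (size_t i = 0; i < NUMELTS; i++)
        a[i] = 0;
}
```

Changing the array size now means editing only the `#define`.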
27
Two-Dimensional array is an array of another array. For example
int a[3][5];
This represents an array containing three e...
28
A structure is a group of items in which each item is identified by its own identifier. In
programming language, a stru...
29
Structure variables sname and ename each contain their own three members. Each member of a structure...
30
return 0;
}
STACK AND QUEUE
Stacks and queues are used to represent sequence of elements which can be modified by
inser...
31
(Stack model: only the top element is accessible)
REPRESENTATION OF STACK
A) Implementation of Stack using array
A stac...
32
};
typedef node_ptr STACK;
Routine to test whether a stack is empty-linked list implementation is given below,
int is_e...
33
tmp_cell->next = S->next;
S->next = tmp_cell;
}
}
The top is performed by examining the element in the first position o...
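Putting the fragments of this section together, a minimal linked-list stack with a header node might look as follows in C; the field names mirror the text's node_ptr/tmp_cell sketch, but the exact signatures of create_stack, top and pop are assumptions:

```c
#include <stdlib.h>

struct node {
    int element;
    struct node *next;
};
typedef struct node *STACK;   /* points to a header node, as in the text */

STACK create_stack(void)
{
    STACK S = malloc(sizeof *S);
    S->next = NULL;            /* empty stack: header with no successor */
    return S;
}

int is_empty(STACK S) { return S->next == NULL; }

void push(int x, STACK S)
{
    struct node *tmp_cell = malloc(sizeof *tmp_cell);
    tmp_cell->element = x;
    tmp_cell->next = S->next;  /* new cell goes in front of the old first cell */
    S->next = tmp_cell;
}

int top(STACK S) { return S->next->element; }

void pop(STACK S)
{
    struct node *first = S->next;
    S->next = first->next;
    free(first);
}
```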
34
A) Balancing Symbols
Every right brace, bracket, and parenthesis must correspond to its left counterpart. The
sequence [()...
35
is evaluated as follows: The first four symbols are placed on the stack. The resulting stack is
Next a '+' is read, so ...
36
Finally, a '*' is seen and 48 and 6 are popped, the result 6 * 48 = 288 is pushed.
The time to evaluate a postfix expre...
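As a sketch of the evaluation just described, the following C function handles single-digit operands with a fixed-size array stack (the function name and the size limit are illustrative):

```c
/* Evaluate a postfix expression over single-digit operands,
   e.g. "65+" means 6 5 +.  Digits are pushed; an operator pops
   two operands and pushes the result. */
int eval_postfix(const char *expr)
{
    int stack[64], sp = 0;
    for (const char *p = expr; *p; p++) {
        if (*p >= '0' && *p <= '9') {
            stack[sp++] = *p - '0';
        } else {
            int b = stack[--sp];   /* right operand is popped first */
            int a = stack[--sp];
            switch (*p) {
            case '+': stack[sp++] = a + b; break;
            case '-': stack[sp++] = a - b; break;
            case '*': stack[sp++] = a * b; break;
            case '/': stack[sp++] = a / b; break;
            }
        }
    }
    return stack[0];
}
```

On the text's example 6 5 2 3 + 8 * + 3 + * (written without spaces) it yields 288, matching the hand trace.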
37
pushed onto the stack. Next b is read and passed through to the output. The state of affairs at this
juncture is as fol...
38
Now we read a ')', so the stack is emptied back to the '('. We output a '+'.
We read a '*' next; it is pushed onto the ...
39
For each queue data structure, keep an array, QUEUE[], and the positions q_front and
q_rear, which represent the ends o...
40
There are two warnings about the circular array implementation of queues. First, it is
important to check the queue for...
41
unsigned int q_max_size; /* Maximum # of elements */
/* until Q is full */
unsigned int q_front;
unsigned int q_rear;
u...
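A minimal C version of this record, with q_size used to settle the full-versus-empty ambiguity that the text warns about, might look like this (function names are assumptions):

```c
#include <stdlib.h>

/* Circular-array queue matching the record fields shown in the text. */
typedef struct {
    unsigned int q_max_size;
    unsigned int q_front;
    unsigned int q_rear;
    unsigned int q_size;   /* element count distinguishes full from empty */
    int *q_array;
} QUEUE;

QUEUE *create_queue(unsigned int max)
{
    QUEUE *q = malloc(sizeof *q);
    q->q_max_size = max;
    q->q_front = 0;
    q->q_rear = max - 1;   /* so the first enqueue wraps q_rear to 0 */
    q->q_size = 0;
    q->q_array = malloc(max * sizeof *q->q_array);
    return q;
}

int is_empty_q(const QUEUE *q) { return q->q_size == 0; }
int is_full_q(const QUEUE *q)  { return q->q_size == q->q_max_size; }

void enqueue(QUEUE *q, int x)
{
    q->q_rear = (q->q_rear + 1) % q->q_max_size;  /* wrap around */
    q->q_array[q->q_rear] = x;
    q->q_size++;
}

int dequeue(QUEUE *q)
{
    int x = q->q_array[q->q_front];
    q->q_front = (q->q_front + 1) % q->q_max_size;
    q->q_size--;
    return x;
}
```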
42
When jobs are submitted to a printer, they are arranged in order of arrival. Thus,
essentially, jobs sent to a line pri...
43
Demerits of List using array
However, insertion and deletion are expensive. For example, inserting at position 0
(which...
44
To execute print_list(L) or find(L,key), we merely pass a pointer to the first element in the
list and then traverse th...
45
typedef node_ptr LIST;
typedef node_ptr position;
Function to test whether a linked list is empty
int is_empty( LIST L ...
46
{
position p, tmp_cell;
p = find_previous( x, L );
if( p->next != NULL ) /* Implicit assumption of header use */
{ /* x...
47
position p, tmp;
p = L->next; /* header assumed */
L->next = NULL;
while( p != NULL )
{
tmp = p->next;
free( p );
p = t...
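The delete_list fragment above can be completed as a small self-contained C routine; the header-node convention follows the text, while the struct names are illustrative:

```c
#include <stdlib.h>

struct list_node {
    int element;
    struct list_node *next;
};
typedef struct list_node *LIST;

/* Free every cell after the header: grab the successor
   before freeing the current cell, exactly as in the text. */
void delete_list(LIST L)        /* L points to a header node */
{
    struct list_node *p = L->next;
    L->next = NULL;
    while (p != NULL) {
        struct list_node *tmp = p->next;
        free(p);
        p = tmp;
    }
}
```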
48
9. Swap two adjacent elements by adjusting only the pointers (and not the data) using a
singly linked list.
10. Define a ...
49
UNIT III – TREES
Binary Trees – Operations on Binary Tree Representations – Node Representation –Internal and
External ...
50
Unit: III TREES
TREES
A tree is a finite set of one or more nodes such that there is a specially designated node
called...
51
Degree
The number of sub trees of a node is called its degree.
Degree of A is 4
Degree of C is 2
Degree of D is 1
Degre...
52
Fig: A sample binary tree with 11 nodes
Two possible situations of a binary tree are (a) Full binary tree (b) Complete ...
53
A. Linear Representation of a Binary tree
In this representation, the nodes are stored level by level, starting from th...
54
LC | RC
LC = Left Child, RC = Right Child
Here LC and RC are two link fields that store the addresses of the left child and right child ...
55
return(right(father(p)));
return(left(father(p)));
In constructing a binary tree, the operations maketree, setleft and s...
56
return NULL;
if (X < T->Element)
return Find(X, T->Left);
else
if (X > T->Element)
return Find(X, T->Right);
else
return T...
57
template <class Etype>
void
Binary_Search_Tree<Etype>::
Find_Max(Tree_Node<Etype>* T) const
{
if (T != NULL)
while (T-...
58
Alternatively, the sign of the father field could be negative if the node is a left son or
positive if it is a right so...
59
Both the linked array representation and the dynamic node representation are
implementations of an abstract linked repr...
60
nodes do not contain a left or right fields and are kept as a single info array that is allocated
sequentially as neede...
61
Fig (b) Almost complete extensions
0 1 2 3 4 5 6 7 8 9 10 11 12
A B C D E F G
0 1 2 3 4 5 6 7 8 9
H I J K L M
Fig(c) Ar...
62
while (scanf("%d", &number) != EOF)
{
p = q = 0;
while (q < NUMNODES && node[q].used && number != node[p].info)
{
p = q;
if (number <...
63
if (q >= NUMNODES)
error("array overflow");
else if (node[q].used)
error("invalid insertion");
else {
node[q].info = x;
n...
64
Fig: Inorder A B C D E G H I J K
Recursive routine for Inorder Traversal
void Inorder(Tree T)
{
if (T != NULL)
{
Inord...
65
Fig: Preorder 20, 10, 30
Fig: Preorder D C A B I G E H K J
Recursive routine for Preorder Traversal
void Preorder (Tree ...
66
Example
Fig: Postorder 10, 30, 20
Fig: Postorder B A C E H G J K I D
Recursive routine for Postorder Traversal
void Postorder (Tree ...
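The three traversal routines differ only in where the node itself is visited. A compact C sketch that records the visit order into an array (the buffer-passing style is an assumption, added so the routines are easy to check):

```c
#include <stddef.h>
#include <stdlib.h>

struct tree_node {
    int element;
    struct tree_node *left, *right;
};
typedef struct tree_node *Tree;

/* Each routine appends visited elements to out[]; *n is the fill count. */
void inorder(Tree t, int out[], size_t *n)
{
    if (t != NULL) {
        inorder(t->left, out, n);
        out[(*n)++] = t->element;       /* left, node, right */
        inorder(t->right, out, n);
    }
}

void preorder(Tree t, int out[], size_t *n)
{
    if (t != NULL) {
        out[(*n)++] = t->element;       /* node, left, right */
        preorder(t->left, out, n);
        preorder(t->right, out, n);
    }
}

void postorder(Tree t, int out[], size_t *n)
{
    if (t != NULL) {
        postorder(t->left, out, n);
        postorder(t->right, out, n);
        out[(*n)++] = t->element;       /* left, right, node */
    }
}

/* Illustrative helper to build a node */
Tree make_node(int x, Tree l, Tree r)
{
    Tree t = malloc(sizeof *t);
    t->element = x;
    t->left = l;
    t->right = r;
    return t;
}
```

On the three-node tree with root 20 and children 10 and 30, preorder gives 20, 10, 30 and postorder gives 10, 30, 20, matching the figures.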
67
The algorithm also constructs an array position of size at least n such that position[i]
points to the node representin...
68
Fig: Huffman trees
The huffman tree is strictly binary. Thus, if there are n symbols in the alphabet, the Huffman tree
...
69
 Associated with each leaf node are the contents of the corresponding list
element. Associated with each nonleaf node...
70
p = left(p);
else {
r -= lcount(p);
p = right(p);
}
find = p;
 Fig(a) illustrates finding the fifth element of a list ...
71
Deleting an Element
It involves only resetting a left or right pointer in the father of the deleted leaf dl to
null.Fig...
72
The efficiency of the search process can be improved by using a sentinel, as in sequential
searching.
A sentinel node, ...
73
Fig(b) cont..
Inserting into a Binary search Tree
The following algorithm searches a binary search tree and inserts a n...
74
else
right( q) = v;
return ( v) ;
Note that after a new record is inserted, the tree retains the property of being sort...
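The search-then-insert algorithm can be sketched in C as follows (the struct and function names are illustrative; the p/q walk mirrors the pseudocode above):

```c
#include <stdlib.h>

struct bst_node {
    int key;
    struct bst_node *left, *right;
};

/* Search for key from root; if absent, hang a new leaf off the last
   node examined, preserving the binary-search-tree ordering. */
struct bst_node *bst_insert(struct bst_node *root, int key)
{
    struct bst_node *v = malloc(sizeof *v);
    v->key = key;
    v->left = v->right = NULL;
    if (root == NULL)
        return v;                 /* tree was empty: v becomes the root */
    struct bst_node *p = root, *q = NULL;
    while (p != NULL) {           /* q trails p, as in the deletion code */
        q = p;
        p = (key < p->key) ? p->left : p->right;
    }
    if (key < q->key)
        q->left = v;
    else
        q->right = v;
    return root;
}
```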
75
Fig(c) Deleting node with key 11
p = tree;
q = null;
while (p != null && k(p) != key)
{
q = p;
p = (key < k(p)) ? left(p) : right(p);
...
76
if (f!=p)
{
left (p)=right(p);
right (rp)=right(p);
}
left(rp)=left(p);
}
if (q == null)
tree = rp;
else
(p == left(q)) ? left...
77
After doing this insertion the records occupying A[1]..A[i] are in sorted order.
Procedure
void Insertion_Sort (int a[]...
78
Average Case Analysis O(N²)
B) SHELL SORT
Shell Sort was invented by Donald Shell. It improves upon bubble sort and in...
79
81 94 11 96 12 35 17 95 28 58
After first pass
35 17 11 28 12 81 94 95 96 58
In second pass, K is reduced to 3
After se...
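A C sketch of shell sort using Shell's original n/2, n/4, …, 1 gap sequence (the text's example passes use gaps 5 and 3, so the exact sequence here is an assumption):

```c
#include <stddef.h>

/* Each pass is an insertion sort over elements `gap` apart;
   the final gap-1 pass is plain insertion sort on nearly-sorted data. */
void shell_sort(int a[], size_t n)
{
    for (size_t gap = n / 2; gap > 0; gap /= 2) {
        for (size_t j = gap; j < n; j++) {
            int key = a[j];
            size_t i = j;
            while (i >= gap && a[i - gap] > key) {
                a[i] = a[i - gap];
                i -= gap;
            }
            a[i] = key;
        }
    }
}
```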
80
Quick sort works by partitioning a given array A[p . . r] into two non-empty sub array A[p . .
q] and A[q+1 . . r] such...
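The partition scheme described, which splits A[p..r] into A[p..q] and A[q+1..r], corresponds to a Hoare-style partition; a C sketch under that assumption:

```c
/* Hoare-style PARTITION: rearranges a[p..r] and returns q such that
   every element of a[p..q] is <= every element of a[q+1..r]. */
static int partition(int a[], int p, int r)
{
    int x = a[p];              /* pivot value */
    int i = p - 1, j = r + 1;
    for (;;) {
        do { j--; } while (a[j] > x);
        do { i++; } while (a[i] < x);
        if (i < j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        } else {
            return j;
        }
    }
}

void quick_sort(int a[], int p, int r)
{
    if (p < r) {
        int q = partition(a, p, r);
        quick_sort(a, p, q);       /* note: q, not q-1, with this partition */
        quick_sort(a, q + 1, r);
    }
}
```

With this partition the recursive calls are on a[p..q] and a[q+1..r], never excluding the pivot slot, which matches the text's description.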
Data Structures (BE)
Data Structures Based On Anna University Syllabus. BE.
Published in: Education, Technology
  1. 1. 1 ANNA UNIVERSITY TIRUCHIRAPPALLI
Regulations 2008 Syllabus
B.Tech IT / B.E EEE SEMESTER III
CS1201 - DATA STRUCTURES
Prepared By: B. Sundara vadivazhagan, HOD i/c / IT; S. Karthik, Lect / IT; G. Mahalakshmi, Lect / IT
UNIT I - FUNDAMENTALS OF ALGORITHMS
Algorithm – Analysis of Algorithm – Best Case and Worst Case Complexities – Analysis of Algorithm using Data Structures – Performance Analysis – Time Complexity – Space Complexity – Amortized Time Complexity – Asymptotic Notation
UNIT II - FUNDAMENTALS OF DATA STRUCTURES
Arrays – Structures – Stacks – Definition and examples – Representing Stacks – Queues and Lists – Queue and its Representation – Applications of Stack – Queue and Linked Lists.
UNIT III - TREES
Binary Trees – Operations on Binary Tree Representations – Node Representation – Internal and External Nodes – Implicit Array Representation – Binary Tree Traversal – Huffman Algorithm – Representing Lists as Binary Trees – Sorting and Searching Techniques – Tree Searching – Hashing
UNIT IV - GRAPHS AND THEIR APPLICATIONS
Graphs – An Application of Graphs – Representation – Transitive Closure – Warshall's Algorithm – Shortest Path Algorithm – A Flow Problem – Dijkstra's Algorithm – Minimum Spanning Trees – Kruskal's and Prim's Algorithms – An Application of Scheduling – Linked Representation of Graphs – Graph Traversals
UNIT V - STORAGE MANAGEMENT
General Lists – Operations – Linked List Representation – Using Lists – Freeing List Nodes – Automatic List Management: Reference Count Method – Garbage Collection – Collection and Compaction
  2. 2. 2 TEXT BOOKS
1. Cormen T. H., Leiserson C. E., and Rivest R. L., "Introduction to Algorithms", Prentice Hall of India, New Delhi, 2007.
2. M. A. Weiss, "Data Structures and Algorithm Analysis in C", Second Edition, Pearson Education, 2005.
REFERENCES
1. Ellis Horowitz, Sartaj Sahni and Sanguthevar Rajasekaran, "Computer Algorithms/C++", Universities Press (India) Private Limited, Second Edition, 2007.
2. A. V. Aho, J. E. Hopcroft, and J. D. Ullman, "Data Structures and Algorithms", First Edition, Pearson Education, 2003.
3. R. F. Gilberg and B. A. Forouzan, "Data Structures", Second Edition, Thomson India Edition, 2005.
4. Robert L. Kruse, Bruce P. Leung and Clovis L. Tondo, "Data Structures and Program Design in C", Pearson Education, 2004.
5. Tenenbaum A. M., Langsam Y., Augenstein M. J., "Data Structures Using C", Pearson Education, 2004.
  3. 3. 3
  4. 4. 4 UNIT I - FUNDAMENTALS OF ALGORITHMS Algorithm – Analysis of Algorithm – Best Case and Worst Case Complexities –Analysis of Algorithm using Data Structures – Performance Analysis – Time Complexity – Space Complexity – Amortized Time Complexity – Asymptotic Notation
  5. 5. 5 UNIT – I FUNDAMENTALS OF ALGORITHMS
I. ALGORITHM
 An algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output. In other words, an algorithm is a sequence of computational steps that transform the input into the output.
 An algorithm can be viewed as a tool for solving a well-specified computational problem. The statement of the problem specifies the desired input/output relationship. The algorithm describes a specific computational procedure for achieving that input/output relationship.
Study of Algorithm: Problem of sorting a sequence of numbers into non-decreasing order. The sorting problem is defined as
Input: A sequence of n numbers (a1, a2, …, an)
Output: A permutation (reordering) (a1′, a2′, …, an′) of the input sequence such that a1′ ≤ a2′ ≤ … ≤ an′
 Given an input sequence such as (31, 41, 59, 26, 41, 58), a sorting algorithm returns as output the sequence (26, 31, 41, 41, 58, 59). Such an input sequence is called an instance of the sorting problem. In general, an instance of a problem consists of all the inputs needed to compute a solution to the problem.
 An algorithm is said to be correct if, for every instance, it halts with the correct output. A correct algorithm solves the given computational problem.
 An incorrect algorithm might not halt at all on some input instances, or it might halt with other than the desired answer.
Example: Insertion Sort
 Insertion sort is an efficient algorithm for sorting a small number of elements.
 Insertion sort works the way many people sort a bridge or gin rummy hand.
 We start with an empty left hand and the cards face down on the table. We then remove one card at a time from the table and insert it into the correct position in the left hand.
 To find the correct position for a card, we compare it with each of the cards already in the hand, from right to left.
Pseudocode for INSERTION-SORT
INSERTION-SORT(A)
for j ← 2 to length[A]
    do key ← A[j]
  6. 6. 6 Insert A[j] into the sorted sequence A[1 .. j−1]
i ← j − 1
while i > 0 and A[i] > key
    do A[i + 1] ← A[i]
    i ← i − 1
A[i + 1] ← key
 The insertion sort is presented as a procedure called INSERTION-SORT, which takes as parameter an array A[1 .. n] containing a sequence of length n that is to be sorted.
 The input numbers are sorted in place: the numbers are rearranged within the array A, with at most a constant number of them stored outside the array at any time.
 The input array A contains the sorted output sequence when INSERTION-SORT is finished.
Fig: The operation of INSERTION-SORT on the array A = (5, 2, 4, 6, 1, 3). The position of index j is indicated by a circle.
 The fig shows how this algorithm works for A = (5, 2, 4, 6, 1, 3). The index j indicates the current card being inserted into the hand. Array elements A[1 .. j−1] constitute the currently sorted hand, and elements A[j+1 .. n] correspond to the pile of cards still on the table.
 The index j moves left to right through the array. At each iteration of the "outer" for loop, the element A[j] is picked out of the array.
 Then, starting in position j−1, elements are successively moved one position to the right until the proper position for A[j] is found, at which point it is inserted.
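A direct C translation of INSERTION-SORT (zero-based indices instead of the pseudocode's 1-based ones; the function name is illustrative):

```c
#include <stddef.h>

/* Sort a[0..n-1] in place: each a[j] is inserted into the
   already-sorted prefix a[0..j-1], as in INSERTION-SORT(A). */
void insertion_sort(int a[], size_t n)
{
    for (size_t j = 1; j < n; j++) {
        int key = a[j];
        size_t i = j;
        /* shift elements greater than key one slot to the right */
        while (i > 0 && a[i - 1] > key) {
            a[i] = a[i - 1];
            i--;
        }
        a[i] = key;
    }
}
```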
  7. 7. 7 Goals for an algorithm
Basic goals for an algorithm
1. Always correct
2. Always terminates
3. Performance - Performance often draws the line between what is possible and what is impossible.
The notion of "algorithm"
Description of a procedure which is
1. Finite (i.e., consists of a finite sequence of characters)
2. Complete (i.e., describes all computation steps)
3. Unique (i.e., there are no ambiguities)
4. Effective (i.e., each step has a defined effect and can be executed in finite time)
Properties: Desired properties of algorithms
 Correctness
o For each input, the algorithm calculates the requested value
 Termination
o For each input, the algorithm performs only a finite number of steps
 Efficiency
o Runtime: The algorithm runs as fast as possible
o Storage space: The algorithm requires as little storage space as possible.
Algorithms - Distinct areas:
Five distinct areas to study of algorithms:
1. Creating or devising algorithms: Various design techniques are created to yield good algorithms.
2. Expressing the algorithms in a structured representation.
3. Validating algorithms: The algorithms devised should compute the correct answer for all possible legal inputs. This process is known as algorithm validation.
  8. 8. 8 4. Analyzing algorithms: It refers to the process of determining how much computing time and storage an algorithm will require, and how well an algorithm performs in the best case, worst case and average case.
Kinds of Analyses
Worst-case: (usually) T(n) = maximum time of algorithm on any input of size n.
Average-case: (sometimes) T(n) = expected time of algorithm over all inputs of size n. Needs an assumption about the statistical distribution of inputs.
Best-case: (never) Cheats with a slow algorithm that works fast on some input.
5. Testing algorithms: It consists of two phases, debugging and profiling. Debugging is the process of executing programs on sample data to determine if any faulty results occur. Profiling is the process of executing a correct program on data sets and measuring the time and space it takes to compute the results.
II. ANALYSIS OF ALGORITHM
Analyzing an algorithm has come to mean predicting the resources that the algorithm requires. Occasionally, resources such as memory, communication bandwidth, or logic gates are of primary concern, but most often it is computational time that we want to measure. Generally, by analyzing several candidate algorithms for a problem, the most efficient one can be easily identified. Such analysis may indicate more than one viable candidate, but several inferior algorithms are usually discarded in the process.
The main reasons for analyzing algorithms are:
It is an intellectual activity.
It is a challenging one: to predict the future by narrowing the predictions to algorithms.
Computer science attracts many people who enjoy being efficiency experts.
 Structural Programming Model
Niklaus Wirth stated that any algorithm could be written with only three programming constructs: Sequence, Selection, Loop.
  9. 9. 9 The implementation of these constructs relies on the implementation language, such as the C++ language.
Sequence is a series of statements that do not alter the execution path within an algorithm.
Selection statements evaluate one or more alternatives. If a condition is true, one path is taken; if it is false, a different path is taken.
Loop iterates a block of code. Usually the condition is evaluated before the body of the loop is executed. If the condition is true, the body is executed; if the condition is false, the loop terminates.
ANALYSIS OF INSERTION SORT:
 The time taken by the Insertion Sort procedure depends on the input: sorting a thousand numbers takes longer than sorting three numbers.
 Insertion sort can take different amounts of time to sort two input sequences of the same size depending on how nearly sorted they already are.
 In general, the time taken by an algorithm grows with the size of the input, so it is traditional to describe the running time of a program as a function of the size of its input.
 To do so, we need to define the terms "running time" and "size of input" more carefully.
 The best notion for input size depends on the problem being studied.
 For many problems, such as sorting or computing discrete Fourier transforms, the most natural measure is the number of items in the input.
 For example, the array size n for sorting. For many other problems, such as multiplying two integers, the best measure of input size is the total number of bits needed to represent the input in ordinary binary notation.
 The running time of an algorithm on a particular input is the number of primitive operations or "steps" executed.
 We start by presenting the INSERTION-SORT procedure with the time "cost" of each statement and the number of times each statement is executed. For each j = 2, 3, ..., n, where n = length[A], we let tj be the number of times the while loop test in line 5 is executed for that value of j.
 We assume that comments are not executable statements, and so they take no time.
INSERTION-SORT(A)                                   cost   times
1  for j ← 2 to length[A]                           c1     n
2      do key ← A[j]                                c2     n − 1
3      ▷ Insert A[j] into the sorted
       ▷ sequence A[1 .. j − 1].                    0      n − 1
4      i ← j − 1                                    c4     n − 1
5      while i > 0 and A[i] > key                   c5     Σ(j=2..n) tj
6          do A[i + 1] ← A[i]                       c6     Σ(j=2..n) (tj − 1)
7          i ← i − 1                                c7     Σ(j=2..n) (tj − 1)
  10. 10. 10 8  A[i + 1] ← key                      c8     n − 1
The running time of the algorithm is the sum of running times for each statement executed; a statement that takes ci steps to execute and is executed n times will contribute ci·n to the total running time. To compute T(n), the running time of INSERTION-SORT, we sum the products of the cost and times columns, obtaining

T(n) = c1·n + c2(n−1) + c4(n−1) + c5·Σ(j=2..n) tj + c6·Σ(j=2..n) (tj − 1) + c7·Σ(j=2..n) (tj − 1) + c8(n−1)

Even for inputs of a given size, an algorithm's running time may depend on which input of that size is given. For example, in INSERTION-SORT, the best case occurs if the array is already sorted. For each j = 2, 3, ..., n, we then find that A[i] ≤ key in line 5 when i has its initial value of j − 1. Thus tj = 1 for j = 2, 3, ..., n, and the best-case running time is

T(n) = c1n + c2(n−1) + c4(n−1) + c5(n−1) + c8(n−1)
     = (c1 + c2 + c4 + c5 + c8)n − (c2 + c4 + c5 + c8)

This running time can be expressed as an + b for constants a and b that depend on the statement costs ci. It is thus a linear function of n.
If the array is in reverse sorted order, the worst case results. We must compare each element A[j] with each element in the entire sorted sub array A[1 .. j−1], and so tj = j for j = 2, 3, ..., n. Using

Σ(j=2..n) j = n(n+1)/2 − 1   and   Σ(j=2..n) (j − 1) = n(n−1)/2

we obtain

T(n) = c1n + c2(n−1) + c4(n−1) + c5(n(n+1)/2 − 1) + c6(n(n−1)/2) + c7(n(n−1)/2) + c8(n−1)
     = (c5/2 + c6/2 + c7/2)n² + (c1 + c2 + c4 + c5/2 − c6/2 − c7/2 + c8)n − (c2 + c4 + c5 + c8)

This worst-case running time can be expressed as an² + bn + c for constants a, b, c that again depend on the statement costs ci; it is thus a quadratic function of n.
  11. 11. 11 Worst case and Average case Analysis In the analysis of Insertion sort, the best case in which the input array was already sorted, and the worst case, in which the input array was reverse sorted. To find only the worst case running time, that is, the longest running time for any input of size n. Three reasons are The worst case running time of an algorithm is an upper bound on the running time for any input. Knowing it gives us the guarantee that the algorithm will never take any longer. For some algorithms, the worst case occurs fairly often. For example, in searching a database for a particular piece of information, the searching algorithm worst case will often occur when the information is not present in the database. The average case is often roughly as bad as the worst case. Suppose that we randomly choose n numbers and apply insertion sort. How long it takes to determine where in sub array A[1…j-1] to insert element A[j]? On average half the elements in A[1..j-1] are less than A[j], and half the elements are greater. On average, we check half of the sub array A[1..j-1], so tj=j/2. Resulting average case running time, it turns out to be a quadratic function of the input size, just like the worst case running time. One problem with performing an average case analysis, however is that it may not be apparent what constitutes an average input for a particular problem. DESIGNING ALGORITHMS  There are many ways to design algorithms. Insertion sort uses an incrementa1 approach: having sorted the sub array A [I .. j - 1], we insert the single element A[j] into its proper place, yielding the sorted sub array A[l .. j].  In this section, we examine an alternative design approach, known as "divide- and-conquer."  We shall use divide-and-conquer to design a sorting algorithm whose worst-case running time is much less than that of insertion Sort. 
 One advantage of divide-and-conquer algorithms is that their running times are often easily determined using techniques Divide and Conquer approach: Many useful algorithms are recursive in structure: to solve a given problem, they call themselves recursively one or more times to deal with closely related sub problems. These algorithms typically follow a divide-and-conquer approach: they break the problem into several sub problems that are similar to the original problem but smaller in size, solve the sub problems recursively, and then combine these solutions to create a solution to the original problem. The divide-and-conquer paradigm involves three steps at each level of the recursion:
  12. 12. 12 Divide the problem into a number of sub problems. Conquer the sub problems by solving them recursively. If the sub problem sizes are small enough, however, just solve the sub problems in a straightforward manner. Combine the solutions to the sub problems into the solution for the original problem. EXAMPLE - MERGE SORT The merge sort algorithm closely follows the divide-and-conquer paradigm. Intuitively, it operates as follows. Divide: Divide the n-element sequence to be sorted into two subsequences of nl2 elements each. Conquer: Sort the two subsequences recursively using merge sort. Combine: Merge the two sorted subsequences to produce the sorted answer. We note that the recursion "bottoms out" when the sequence to be sorted has length I, in which case there is no work to be done, since every sequence of length I is already in sorted order. The key operation of the merge sort algorithm is the merging of two sorted sequences in the "combine" step. To perform the merging, we use an auxiliary procedure MERGE (A,p, q, r), where A is an array and p, q, and r are indices numbering elements of the array such that p :S q < r. The procedure assumes that the sub arrays A(p .. q] and A[q + I .. r] are in sorted order. It merges them to form a single sorted sub array that replaces the current sub array A(p .. r]. Although we leave the pseudo code as an exercise it is easy to imagine a MERGE procedure that takes time 8(n), where n = r - p + 1 is the number of elements being merged. Returning to our card playing motif, suppose we have two piles of cards face up on a table. Each pile is sorted, with the smallest cards on top. We wish to merge the two lines into a single sorted output pile, which is to be face down on the table input pile and place it face down onto the output pile. Computationally, each basic step takes constant time, since we are checking just two top cards. Since we perform at most n basic steps, merging takes 8(n) time. 
We can now use the MERGE procedure as a subroutine in the merge sort algorithm. The procedure MERGE-SORT (A,p, r) sorts the elements in the sub array A[p .. r]. If p, r, the sub array has at most one element and is therefore already sorted. Otherwise, the divide step simply computes an index q that partitions A[p .. r] into two sub arrays: A[p .. q], containing rn/2l elements, and A[q + I .. r], containing In/2J elements.4 MERGE-SORT (A,p, r) I if p < r 2 then q <- l(p + r)/2J 3 MERGE-SORT(A,p, q) 4 MERGE-SORT(A,q + I,r) 5 MERGE(A,p,q,r) To sort the entire sequence A = (A[I],A[2], ... ,A[nJ), we call MERGESORT(A, 1,length[A]), where once again length[A] = n. If we look at the operation of the procedure bottom-up when n is a power of two, the al- gorithm consists of merging pairs of I-item sequences to form sorted sequences of length 2, merging pairs of sequences of length 2 to form sorted sequences of length 4, and so on, until two sequences of length n /2 are merged to form the final sorted sequence of length n
  13. 13. 13 When an algorithm contains a recursive call to itself, its running time can often be described by a recurrence equation or recurrence, which describes the overall running time on a problem of size n in terms of the running time on smaller inputs. We can then use mathematical tools to solve the recurrence and provide bounds on the performance of the algorithm.
A recurrence for the running time of a divide-and-conquer algorithm is based on the three steps of the basic paradigm. As before, we let T(n) be the running time on a problem of size n. If the problem size is small enough, say n ≤ c for some constant c, the straightforward solution takes constant time, which we write as Θ(1). Suppose we divide the problem into a sub problems, each of which is 1/b the size of the original. If we take D(n) time to divide the problem into sub problems and C(n) time to combine the solutions to the sub problems into the solution to the original problem, we get the recurrence
T(n) = Θ(1) if n ≤ c
T(n) = a·T(n/b) + D(n) + C(n) otherwise
BEST CASE, WORST CASE AND AVG. CASE EFFICIENCIES
Time efficiency is a function of n (the input size). For some algorithms, the running time depends not only on the input size n but also on the individual elements, e.g. linear search; here we go for worst-case, best-case and average-case efficiency. We will mainly focus on worst-case analysis, but sometimes it is useful to do the average one.
Worst- / average- / best-case
Worst-case running time of an algorithm
– The longest running time for any input of size n
– An upper bound on the running time for any input
– Guarantee that the algorithm will never take longer
– Sequential search for an item which is not present / present at the end of the list
– Sort a set of numbers in increasing order when the data is in decreasing order
– The worst case can occur fairly often
– Provides the expected running time
Best-case running time
– If the algorithm is executed, the fewest number of instructions are executed
– Takes the shortest running time for any input of size n
– Sequential search for an item which is present at the beginning of the list
– Sort a set of numbers in increasing order when the data is already in increasing order
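Merge sort as described, divide, conquer, combine, can be sketched in C; the temporary buffer in merge plays the role of the output pile of cards (names are illustrative):

```c
#include <stdlib.h>
#include <string.h>

/* MERGE(A, p, q, r): merge sorted a[p..q] and a[q+1..r] (inclusive) */
static void merge(int a[], int p, int q, int r)
{
    int n = r - p + 1;
    int *tmp = malloc(n * sizeof *tmp);
    int i = p, j = q + 1, k = 0;
    /* repeatedly take the smaller "top card" of the two piles */
    while (i <= q && j <= r)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= q) tmp[k++] = a[i++];
    while (j <= r) tmp[k++] = a[j++];
    memcpy(a + p, tmp, n * sizeof *tmp);
    free(tmp);
}

/* MERGE-SORT(A, p, r): sort a[p..r] */
void merge_sort(int a[], int p, int r)
{
    if (p < r) {
        int q = (p + r) / 2;       /* divide */
        merge_sort(a, p, q);       /* conquer */
        merge_sort(a, q + 1, r);
        merge(a, p, q, r);         /* combine */
    }
}
```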
– Sort a set of numbers in increasing order when the data is in decreasing order
– The worst case can occur fairly often, so it often reflects the expected running time
Best-case running time
– The fewest instructions are executed when the algorithm runs
– The shortest running time for any input of size n
– Sequential search for an item which is present at the beginning of the list
– Sort a set of numbers in increasing order when the data is already in increasing order
Average-case running time
– May be difficult to define what "average" means, but gives the necessary details about an algorithm's behavior on a typical/random input.
EXAMPLE: Sequential Search
 A sequential search steps through the data sequentially until a match is found.
 A sequential search is useful when the array is not sorted.
The basic operation count is:
1. Best case input: c(n) = 1
2. Worst case input: c(n) = n
   Unsuccessful search: n comparisons
   Successful search (worst): n comparisons
3. Average case input. Here the basic operation count is calculated as follows.
Assumptions:
a) The probability of a successful search is p (0 <= p <= 1).
b) The probability of the first match occurring in the ith position of the list is the same for every i, namely p/n, and the number of compare operations made by the algorithm in that situation is i.
c) In case of an unsuccessful search, the number of comparisons made is n, and the probability of such a search is (1 - p).
So c(n) = [1*p/n + 2*p/n + ... + i*p/n + ... + n*p/n] + n*(1 - p)
        = p/n (1 + 2 + ... + i + ... + n) + n(1 - p)
        = p/n * n(n + 1)/2 + n(1 - p)
   c(n) = p(n + 1)/2 + n(1 - p)
For a successful search, p = 1 and c(n) = (n + 1)/2.
For an unsuccessful search, p = 0 and c(n) = n.
ANALYSIS OF ALGORITHM USING DATA STRUCTURES:
The analysis of an algorithm considers both qualitative and quantitative aspects to get a solution that is economical in the use of computing and human resources, which improves the performance of an algorithm. A good algorithm usually possesses the following qualities and capabilities.
 They are simple but powerful and general solutions
 They are user friendly
 They can be easily updated
 They are correct
 They are able to be understood on a number of levels
 They are economical in the use of computer time, storage and peripherals
 They are independent of any particular computer
 They can be used as subprocedures for other problems
 The solution is pleasing and satisfying to its designer
IV. COMPUTATIONAL COMPLEXITY
Space Complexity
The space complexity of an algorithm is the amount of memory it needs to run to completion. [Core dumps: the most often encountered cause is "memory leaks", where the amount of memory required is larger than the memory available on a given system.] Some algorithms may be more efficient if the data is completely loaded into memory.
1. Need to look also at system limitations
2. E.g. classify 2GB of text into various categories [politics, tourism, sport, natural disasters, etc.]: can I afford to load the entire collection?
Time Complexity
The time complexity of an algorithm is the amount of time it needs to run to completion. It is often more important than space complexity:
1. space available (for computer programs!) tends to be larger and larger
2. time is still a problem for all of us
An algorithm's running time is an important issue.
Space Complexity
The space needed by each algorithm is the sum of the following components:
1. Instruction space
2. Data space
3. Environment stack space
Instruction space
The space needed to store the compiled version of the program instructions
Data space
The space needed to store all constant and variable values
Environment stack space
The space needed to store information to resume execution of partially completed functions
The total space needed by an algorithm can be divided into two parts from the 3 components of space complexity:
1. Fixed part
2. Variable part
Fixed part
A fixed part is space independent of the characteristics of the inputs and outputs. This part typically includes the instruction space, space for simple variables and fixed-size component variables, space for constants and so on. E.g. the name of the data collection: the same size whether classifying 2GB or 1MB of texts.
Variable part
A variable part is space needed by component variables whose size depends on the particular problem instance being solved, the space needed by referenced variables, and the recursion stack space. E.g. the actual text loaded: 2GB of text vs. 1MB of text.
The space requirement S(P) of any algorithm or program P may be written as:
S(P) = C + Sp(instance characteristics)
C = constant that denotes the fixed part of the space requirement
Sp = variable component that depends on the magnitude of the inputs to and outputs from the algorithm.
Example
float sum(float* a, int n)
{
float s = 0;
for(int i = 0; i < n; i++)
{
s += a[i];
}
return s;
}
Space: one word for n, one for a (the array is passed as a pointer), one for i  constant space!
When memory was expensive we focused on making programs as space efficient as possible and developed schemes to make memory appear larger than it really was (virtual memory and memory paging schemes). Space complexity is still important in the field of embedded computing.
Time Complexity
The time T(P) taken by a program P is
T(P) = compile time + run (or execution) time
Compile time
It does not depend on the instance characteristics. We assume that a compiled program will be run several times without recompilation.
Run time
It depends on the instance characteristics and is denoted by tp. The value tp(n) can be calculated by the following form of expression:
tp(n) = Ca ADD(n) + Cs SUB(n) + Cm MUL(n) + Cd DIV(n) + ...
n = instance characteristics
Ca, Cs, Cm, Cd = time needed for an addition, subtraction, multiplication and division
ADD, SUB, MUL, DIV = number of additions, subtractions, multiplications and divisions performed by the program P on an instance with characteristics n.
To find the value of tp(n) from the above expression is an impossible task, since the time needed for Ca, Cs, Cm, Cd often depends on the numbers involved in the operation. The value of tp(n) for any given n can be obtained experimentally: the program is typed, compiled and run on a particular machine, the execution time is physically clocked, and tp(n) is obtained. The value of tp(n) also depends on factors such as system load and the number of other programs running on the computer at the time program P is run. To overcome this disadvantage, we count only the program steps, where the time required by each step is relatively independent of the instance characteristics. A program step is defined as a syntactically or semantically meaningful segment of a program that has an execution time independent of the instance characteristics.
The program statements are classified into three types of steps:
1. Comments: zero steps
2. Assignment statement: one step
3. Iterative statement: a finite number of steps
V. AMORTIZED ANALYSIS
In an amortized analysis, the time required to perform a sequence of data structure operations is averaged over all the operations performed. Amortized analysis can be used to show that the average cost of an operation is small, if one averages over a sequence of operations, even though a single operation might be expensive. Amortized analysis differs from average-case analysis in that probability is not involved; an amortized analysis guarantees the average performance of each operation in the worst case.
In the aggregate method of amortized analysis, we show that for all n, a sequence of n operations takes worst-case time T(n) in total. In the worst case, the average cost, or amortized cost, per operation is therefore T(n)/n. Note that this amortized cost applies to each operation, even when there are several types of operations in the sequence. The other two methods, the accounting method and the potential method, may assign different amortized costs to different types of operations.
Stack operations
In our first example of the aggregate method, we analyze stacks that have been augmented with a new operation. Recall the two fundamental stack operations, each of which takes O(1) time:
PUSH(S, x) pushes object x onto stack S.
POP(S) pops the top of stack S and returns the popped object.
Since each of these operations runs in O(1) time, let us consider the cost of each to be 1. The total cost of a sequence of n PUSH and POP operations is therefore n, and the actual running time for n operations is therefore Θ(n).
The situation becomes more interesting if we add the stack operation MULTIPOP(S, k), which removes the k top objects of stack S, or pops the entire stack if it contains fewer than k objects. In the following pseudo code, the operation STACK-EMPTY returns TRUE if there are no objects currently on the stack, and FALSE otherwise.
MULTIPOP(S, k)
1 while not STACK-EMPTY(S) and k != 0
2   do POP(S)
3     k <- k - 1
VI. ASYMPTOTIC NOTATION
Complexity analysis: the rate at which storage or time grows as a function of the problem size.
Asymptotic analysis: describes the inherent complexity of a program, independent of machine and compiler.
Idea: as the problem size grows, the complexity can be described as a simple proportionality to some known function.
A) Big Oh (O): Upper Bound
This notation is used to define the worst case running time of an algorithm and is concerned with very large values of n.
f(n) = O(g(n)) iff f(n) <= c·g(n) for some constants c and n0, and all n >= n0
B) Big Omega (Ω): Lower Bound
This notation is used to describe the best case running time of algorithms and is concerned with large values of n.
f(n) = Ω(g(n)) iff f(n) >= c·g(n) for some constants c and n0, and all n >= n0
C) Big Theta (Θ): Two-way Bound
This notation is used to describe the average case running time of algorithms and is concerned with very large values of n.
f(n) = Θ(g(n)) iff c1·g(n) <= f(n) <= c2·g(n) for some constants c1, c2 and n0, and all n >= n0
D) Little Oh (o): Strict Upper Bound
This notation describes an upper bound that is not asymptotically tight.
f(n) = o(g(n)) iff f(n) = O(g(n)) and f(n) ≠ Θ(g(n))
To compare and rank orders of growth of algorithms we use the 3 notations O, Ω and Θ.
Informal definitions. Let t(n) and g(n) be any non-negative functions defined on the set of natural numbers:
t(n) = the algorithm's running time
g(n) = a simple function to compare the count with
i) O(g(n)) is the set of all functions with a smaller or same order of growth as g(n)
E.g.
n ∈ O(n²)          n³ ∉ O(n²)
100n + 5 ∈ O(n²)   n⁴ + n + 1 ∉ O(n²)
100n + 5 ∈ O(n)
(1/2)n(n-1) ∈ O(n²)
ii) Ω(g(n)) stands for the set of all functions with a larger or same order of growth as g(n)
n³ ∈ Ω(n²)
(1/2)n(n-1) ∈ Ω(n²)
100n + 5 ∉ Ω(n²)
iii) Θ(g(n)) stands for the set of all functions with the same order of growth as g(n)
n³ ∉ Θ(n²)
an² + bn + c ∈ Θ(n²)
100n + 5 ∈ Θ(n)
Big Oh
f(N) = O(g(N))
There are positive constants c and n0 such that
o f(N) <= c·g(N) when N >= n0
The growth rate of f(N) is less than or equal to the growth rate of g(N); g(N) is an upper bound on f(N).
o We write f(n) = O(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n) always lies on or below c·g(n).
Meaning: for all data sets big enough (i.e., n > n0), the algorithm always executes in fewer than c·g(n) steps in the [best, average, worst] case.
The idea is to establish a relative order among functions for large n:
there exist c, n0 > 0 such that f(N) <= c·g(N) when N >= n0
f(N) grows no faster than g(N) for "large" N.
Big O Rules
• If f(n) is a polynomial of degree d, then f(n) is O(n^d), i.e.,
1. Drop lower-order terms
2. Drop constant factors
• Use the smallest possible class of functions
Say "2n is O(n)" instead of "2n is O(n²)"
• Use the simplest expression of the class
Say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)"
Big-Oh: example
• Let f(N) = 2N². Then
– f(N) = O(N⁴)
– f(N) = O(N³)
– f(N) = O(N²) (best answer, asymptotically tight)
• N²/2 - 3N = O(N²)
• 1 + 4N = O(N)
• 7N² + 10N + 3 = O(N²) = O(N³)
Big-Omega
• f(N) = Ω(g(N))
• There are positive constants c and n0 such that f(N) >= c·g(N) when N >= n0
• The growth rate of f(N) is greater than or equal to the growth rate of g(N).
• there exist c, n0 > 0 such that f(N) >= c·g(N) when N >= n0
• f(N) grows no slower than g(N) for "large" N
Big-Omega: example
• Let f(N) = 2N². Then
– f(N) = Ω(N)
– f(N) = Ω(N²) (best answer)
Big-Theta
• f(N) = Θ(g(N)) iff f(N) = O(g(N)) and f(N) = Ω(g(N))
• The growth rate of f(N) equals the growth rate of g(N)
• f(n) is Θ(g(n)) if there are constants c' > 0 and c'' > 0 and an integer constant n0 >= 1 such that c'·g(n) <= f(n) <= c''·g(n) for n >= n0
• Big-Theta means the bound is the tightest possible.
Big-Theta rules
• Example: Let f(N) = N² and g(N) = 2N².
– Since f(N) = O(g(N)) and f(N) = Ω(g(N)), thus f(N) = Θ(g(N)).
• If T(N) is a polynomial of degree k, then T(N) = Θ(N^k).
• For logarithmic functions, T(log_m N) = Θ(log N).
Mathematical Expression    Relative Rates of Growth
T(n) = O(F(n))             Growth of T(n) <= growth of F(n)
T(n) = Ω(F(n))             Growth of T(n) >= growth of F(n)
T(n) = Θ(F(n))             Growth of T(n) = growth of F(n)
T(n) = o(F(n))             Growth of T(n) < growth of F(n)
Computation of step count using asymptotic notation
Asymptotic complexity can be determined easily without determining the exact step count. This is done by first determining the asymptotic complexity of each statement in the algorithm and then adding these complexities to derive the total step count.
Question Bank
UNIT I - PROBLEM SOLVING
PART A (2 MARKS)
1. Define Modularity.
2. What do you mean by top down design?
3. What is meant by algorithm? What are its measures?
4. Give any four algorithmic techniques.
5. Write an algorithm to find the factorial of a given number.
6. List the types of control structures.
7. Define the top down design strategy.
8. Define the worst case & average case complexities of an algorithm.
9. What is meant by modular approach?
10. What is divide & conquer strategy?
11. What is dynamic programming?
12. What is program testing?
13. Define program verification.
14. What is input/output assertion?
15. Define symbolic execution.
16. Write the steps to verify a program segment with loops.
17. What is CPU time?
18. Write at least five qualities & capabilities of a good algorithm.
19. Write an algorithm to exchange the values of two variables.
20. Write an algorithm to find N factorial (written as n!) where n >= 0.
PART B (16 MARKS)
1. Explain top down design in detail.
2. (a) Explain in detail the types of analysis that can be performed on an algorithm. (8)
(b) Write an algorithm to perform matrix multiplication and analyze the same. (8)
3. Design an algorithm to evaluate the function sin(x) as defined by the infinite series expansion sin(x) = x/1! - x³/3! + x⁵/5! - x⁷/7! + ...
4. Write an algorithm to generate and print the first n terms of the Fibonacci series where n >= 1; the first few terms are 0, 1, 1, 2, 3, 5, 8, 13.
5.
Design an algorithm that accepts a positive integer and reverses the order of its digits.
6. Explain the base conversion algorithm to convert a decimal integer to its corresponding octal representation.
UNIT II - FUNDAMENTALS OF DATA STRUCTURES
Arrays – Structures – Stacks – Definition and examples – Representing Stacks – Queues and Lists – Queue and its Representation – Applications of Stack – Queue and Linked Lists.
Unit II - FUNDAMENTALS OF DATA STRUCTURES
I. ARRAYS
An array is a finite ordered set of homogeneous elements. The array size may be large or small, but it must exist, and an array must contain a collection of elements of the same datatype. An array declaration in C is given below.
int a[100];
Here the array name is 'a', and the size is 100. Each element is represented by its index, which starts from 0. For example, the 1st element's index is 0, the 2nd element's index is 1, and the 100th element's index is 99.
The two basic operations that access an array are extraction and storing. The extraction operation is a function that accepts an array element by using the array name and an index. The storing operation of a value x at index i is
a[i] = x;
The smallest index of an array is called its lower bound, which in C is always 0, and the highest index is called the upper bound. The number of elements in an array is called its range. If the lower bound is represented by "lower" and the upper bound by "upper", then range = upper - lower + 1. For example, for array a, the lower bound is 0, the upper bound is 99 and the range is 100.
An important feature of a C array is that neither the upper bound nor the lower bound may be changed during a program's execution. The lower bound is always fixed at 0, and the upper bound is fixed at the time the program is written. One very useful technique is to declare a bound as a constant identifier, so that the work required to modify the size of an array is minimized. For example, consider the following program:
int a[100];
for(i = 0; i < 100; a[i++] = 0);
To change the array to a larger (or smaller) size, the constant 100 must be changed in two places, once in the declaration and once in the for statement. Consider the following equivalent alternative:
#define NUMELTS 100
int a[NUMELTS];
for(i = 0; i < NUMELTS; a[i++] = 0);
Now only a single change in the constant definition is needed to change the upper bound.
One Dimensional Array
A one-dimensional array is used when it is necessary to keep a large number of items in memory and reference all the items in a uniform manner. Consider an application that reads 100 integers and finds their average.
#define NUMELTS 100
aver()
{
int num[NUMELTS];
int i;
int total;
float avg;
total = 0;
for(i = 0; i < NUMELTS; i++)
{
scanf("%d", &num[i]);
total += num[i];
}
avg = (float) total / NUMELTS;  /* the cast avoids integer division */
printf("Average = %f", avg);
}
The first statement (int num[NUMELTS];) reserves 100 successive memory locations, each large enough to contain a single integer. The address of the first of these locations is called the base address of the array num.
Two-Dimensional array
A two-dimensional array is an array of arrays. For example
int a[3][5];
This declares an array containing three elements, each of which is itself an array containing five elements, as in the figure below.
[Figure: a 3 x 5 grid of cells, rows Row 0 - Row 2, columns Col 0 - Col 4]
A total of 15 (3 x 5) elements can be stored in this array. Each element can be accessed by its row index and column index. To access the first cell in the second row, use a[1][0]. Likewise, to access the second cell in the third row, use a[2][1]. Nested looping statements are used to access each element efficiently; sample code to read values and fill this array is given below.
for(i = 0; i < 3; i++)
{
for(j = 0; j < 5; j++)
scanf("%d", &a[i][j]);
}
Multi-Dimensional array
C allows developers to declare arrays of more than two dimensions as well. A three-dimensional array declaration is given below.
int a[3][2][5];
This can be accessed using three nested looping statements. Developers can also declare arrays of more than three dimensions.
II. STRUCTURES
A structure is a group of items in which each item is identified by its own identifier. In programming-language terms, a structure is called a "record" and a member is called a "field". Consider the following structure declaration:
struct {
char first[10];
char midinit;
char last[20];
} sname, ename;
This declaration creates two structure variables, sname and ename, each of which contains three members: first, midinit and last. Two of the members are character strings, and one is a single character. The structure can also be declared in another format, given below:
struct nametype {
char first[10];
char midinit;
char last[20];
};
struct nametype sname, ename;
The above definition creates a structure tag nametype containing three members. Once a structure tag has been defined, the variables sname and ename can be declared. An alternative method of assigning a structure tag is the use of a typedef definition in C, given below:
typedef struct {
char first[10];
char midinit;
char last[20];
} nametype;
nametype sname, ename;
The structure variable sname contains three members, and ename contains a separate three members. Each member of a structure variable can be accessed using the dot (.) operator. Consider the structure given below.
struct data {
int a;
float b;
char c;
};
int main()
{
struct data x, y;
printf("\nEnter the values for the first variable\n");
scanf("%d%f%c", &x.a, &x.b, &x.c);
printf("\nEnter the values for the second variable\n");
scanf("%d%f%c", &y.a, &y.b, &y.c);
return 0;
}
A structure variable can also be an array variable. A looping statement is used to get input for such a structure array. Sample code is given below.
int main()
{
struct data x[5];
int i;
for(i = 0; i < 5; i++)
{
printf("\nEnter the values for variable %d\n", (i+1));
scanf("%d%f%c", &x[i].a, &x[i].b, &x[i].c);
}
return 0;
}
STACK AND QUEUE
Stacks and queues are used to represent sequences of elements which can be modified by insertion and deletion. Both stacks and queues can be implemented efficiently as arrays or as linked lists.
III. STACK
A stack is a list with the restriction that inserts and deletes can be performed in only one position, namely the end of the list, called the top. The fundamental operations on a stack are push, which is equivalent to an insert, and pop, which deletes the most recently inserted element. The most recently inserted element can be examined prior to performing a pop by use of the top routine. Stacks are often used in processing tree-structured objects, in compilers (in processing nested structures), and in systems to implement recursion. Stacks are also known as LIFO (last in, first out) lists.
Stack model
(Stack model: only the top element is accessible)
REPRESENTATION OF STACK
A) Implementation of Stack using array
A stack K is most easily represented by an array K[0], K[1], K[2], ... and an index TOP of type integer. The stack K consists of the elements K[0], ..., K[TOP], and the element at index TOP (K[TOP]) is the top element of the stack. Insertion of an element is called push and deletion of an element is called pop. The following code explains the operation push(K, a):
TOP = TOP + 1;
K[TOP] = a;
The following code explains the operation pop(K):
if(TOP < 0) then
error;
else
X = K[TOP];
TOP = TOP - 1;
end if
An infinite array is not available, so we use a finite array of size n. In this case a push operation must check whether overflow occurs.
B) Linked List Implementation of Stacks
A stack can be implemented using a singly linked list. We perform a push by inserting at the front of the list, and a pop by deleting the element at the front of the list. A top operation merely examines the element at the front of the list, returning its value. Sometimes the pop and top operations are combined into one. The structure definition is given below:
typedef struct node *node_ptr;
struct node
{
element_type element;
node_ptr next;
};
typedef node_ptr STACK;
A routine to test whether a stack is empty (linked list implementation) is given below:
int is_empty( STACK S )
{
return( S->next == NULL );
}
We merely create a header node; make_null sets the next pointer to NULL. Routines to create an empty stack (linked list implementation) are given below:
STACK create_stack( void )
{
STACK S;
S = (STACK) malloc( sizeof( struct node ) );
if( S == NULL )
fatal_error("Out of space!!!");
return S;
}
void make_null( STACK S )
{
if( S != NULL )
S->next = NULL;
else
error("Must use create_stack first");
}
The push is implemented as an insertion at the front of a linked list, where the front of the list serves as the top of the stack. A routine to push onto a stack (linked list implementation) is given below:
void push( element_type x, STACK S )
{
node_ptr tmp_cell;
tmp_cell = (node_ptr) malloc( sizeof ( struct node ) );
if( tmp_cell == NULL )
fatal_error("Out of space!!!");
else
{
tmp_cell->element = x;
tmp_cell->next = S->next;
S->next = tmp_cell;
}
}
The top is performed by examining the element in the first position of the list. A routine to return the top element of a stack (linked list implementation) is given below:
element_type top( STACK S )
{
if( is_empty( S ) )
error("Empty stack");
else
return S->next->element;
}
A routine to pop from a stack (linked list implementation) is given below:
void pop( STACK S )
{
node_ptr first_cell;
if( is_empty( S ) )
error("Empty stack");
else
{
first_cell = S->next;
S->next = S->next->next;
free( first_cell );
}
}
One problem that affects the efficiency of implementing stacks is error testing. Our linked list implementation carefully checks for errors; in an unchecked array implementation, by contrast, a pop on an empty stack or a push on a full stack would overflow the array bounds and cause a crash.
APPLICATIONS OF STACKS
A) Balancing Symbols
Every brace, bracket, and parenthesis must correspond to a left counterpart: the sequence [()] is legal, but [(]) is wrong. It is easy to check these things using a stack. Just check for the balancing of parentheses, brackets, and braces, and ignore any other character that appears:
Make an empty stack. Read characters until end of file. If the character is an opening symbol, push it onto the stack. If it is a closing symbol, then if the stack is empty report an error; otherwise, pop the stack. If the symbol popped is not the corresponding opening symbol, then report an error. At end of file, if the stack is not empty report an error.
B) Postfix Expressions
Suppose we have a pocket calculator and would like to compute the cost of a shopping trip. To do so, we add a list of numbers and multiply the result by 1.06; this computes the purchase price of some items with local sales tax added. If the items are 4.99, 5.99, and 6.99, then a natural way to enter this would be the sequence
4.99 + 5.99 + 6.99 * 1.06 =
Depending on the calculator, this produces either the intended answer, 19.05, or the scientific answer, 18.39. Most simple four-function calculators will give the first answer, but better calculators know that multiplication has higher precedence than addition. On the other hand, some items are taxable and some are not, so if only the first and last items were actually taxable, then the sequence
4.99 * 1.06 + 5.99 + 6.99 * 1.06 =
would give the correct answer (18.69) on a scientific calculator and the wrong answer (19.37) on a simple calculator. A scientific calculator generally comes with parentheses, so we can always get the right answer by parenthesizing, but with a simple calculator we need to remember intermediate results. A typical evaluation sequence for this example might be to multiply 4.99 and 1.06, saving this answer as a1. We then add 5.99 and a1, saving the result in a1.
We multiply 6.99 and 1.06, saving the answer in a2, and finish by adding a1 and a2, leaving the final answer in a1. We can write this sequence of operations as follows:
4.99 1.06 * 5.99 + 6.99 1.06 * +
This notation is known as postfix or reverse Polish notation. For instance, the postfix expression
6 5 2 3 + 8 * + 3 + *
is evaluated as follows: the first four symbols are placed on the stack, so the stack holds 6, 5, 2, 3 (top). Next a '+' is read, so 3 and 2 are popped from the stack and their sum, 5, is pushed. Next 8 is pushed. Now a '*' is seen, so 8 and 5 are popped and 8 * 5 = 40 is pushed. Next a '+' is seen, so 40 and 5 are popped and 40 + 5 = 45 is pushed. Now, 3 is pushed. Next '+' pops 3 and 45 and pushes 45 + 3 = 48.
Finally, a '*' is seen, 48 and 6 are popped, and the result 6 * 48 = 288 is pushed. The time to evaluate a postfix expression is O(n), because processing each element in the input consists of stack operations and thus takes constant time. The algorithm to do so is very simple. Notice that when an expression is given in postfix notation, there is no need to know any precedence rules; this is an obvious advantage.
C) Infix to Postfix Conversion
Not only can a stack be used to evaluate a postfix expression, but we can also use a stack to convert an expression in standard form (otherwise known as infix) into postfix. Suppose we want to convert the infix expression
a + b * c + ( d * e + f ) * g
into postfix. A correct answer is a b c * + d e * f + g * +.
When an operand is read, it is immediately placed onto the output. Operators are not immediately output, so they must be saved somewhere. The correct thing to do is to place operators that have been seen, but not placed on the output, onto the stack. We will also stack left parentheses when they are encountered. We start with an initially empty stack. If we see a right parenthesis, then we pop the stack, writing symbols until we encounter a (corresponding) left parenthesis, which is popped but not output. If we see any other symbol ('+', '*', '('), then we pop entries from the stack until we find an entry of lower priority. One exception is that we never remove a '(' from the stack except when processing a ')'. For the purposes of this operation, '+' has lowest priority and '(' highest. When the popping is done, we push the operator onto the stack. Finally, if we read the end of input, we pop the stack until it is empty, writing symbols onto the output.
To see how this algorithm performs, we will convert the infix expression above into its postfix form. First, the symbol a is read, so it is passed through to the output. Then '+' is read and
pushed onto the stack. Next b is read and passed through to the output. The state of affairs at this juncture is: output a b, operator stack +. Next a '*' is read. The top entry on the operator stack has lower precedence than '*', so nothing is output and '*' is put on the stack. Next, c is read and output. Thus far, the output is a b c and the stack holds + * (top). The next symbol is a '+'. Checking the stack, we find that we will pop the '*' and place it on the output, pop the other '+', which is not of lower but equal priority, and then push the new '+'. The next symbol read is a '(', which, being of highest precedence, is placed on the stack. Then d is read and output. We continue by reading a '*'. Since open parentheses do not get removed except when a closed parenthesis is being processed, there is no output. Next, e is read and output. The next symbol read is a '+'. We pop and output '*' and then push '+'. Then we read and output f.
Now we read a ')', so the stack is emptied back to the '(': we output the '+'. We read a '*' next; it is pushed onto the stack. Then g is read and output. The input is now empty, so we pop and output symbols from the stack until it is empty.
As before, this conversion requires only O(n) time and works in one pass through the input.
IV) QUEUE
A queue supports insertions (called enqueues) at one end (called the tail or rear) and deletions (called dequeues) from the other end (called the head or front). Queues are used in operating systems and networking to store a list of items that are waiting for some resource. Queues are also known as FIFO (first in, first out) lists.
Model of a queue
ARRAY IMPLEMENTATION OF QUEUES
Both the linked list and array implementations give fast O(1) running times for every operation. The array implementation of a queue is given below.
For each queue data structure, keep an array, QUEUE[], and the positions q_front and q_rear, which represent the ends of the queue. Also keep track of the number of elements that are actually in the queue, q_size. The cells that are blank have undefined values in them. In particular, the first two cells have elements that used to be in the queue.
To enqueue an element x, increment q_size and q_rear, then set QUEUE[q_rear] = x. To dequeue an element, set the return value to QUEUE[q_front], decrement q_size, and then increment q_front.
There is one potential problem with this implementation. After 10 enqueues, the queue appears to be full, since q_front is now 10, and the next enqueue would be in a nonexistent position. However, there might only be a few elements in the queue, because several elements may have already been dequeued. The simple solution is that whenever q_front or q_rear gets to the end of the array, it is wrapped around to the beginning. This is known as a circular array implementation.
There are two warnings about the circular array implementation of queues. First, it is important to check the queue for emptiness, because a dequeue when the queue is empty will return an undefined value, silently. Second, some programmers use different ways of representing the front and rear of a queue. For instance, some do not use an entry to keep track of the size, because they rely on the base case that when the queue is empty, q_rear = q_front - 1. The size is then computed implicitly by comparing q_rear and q_front. This is a very tricky way to go, because there are some special cases, so be very careful if you need to modify code written this way. If the size is not part of the structure, then if the array size is A_SIZE, the queue is full when there are A_SIZE - 1 elements, since only A_SIZE different sizes can be differentiated, and one of these is 0. Type declarations for the queue (array implementation) are given below.
struct queue_record
{
unsigned int q_max_size; /* Maximum # of elements until Q is full */
unsigned int q_front;
unsigned int q_rear;
unsigned int q_size; /* Current # of elements in Q */
element_type *q_array;
};
typedef struct queue_record *QUEUE;
Routine to test whether a queue is empty (array implementation):
int is_empty( QUEUE Q )
{
return( Q->q_size == 0 );
}
Routine to make an empty queue (array implementation):
void make_null( QUEUE Q )
{
Q->q_size = 0;
Q->q_front = 1;
Q->q_rear = 0;
}
Routine to enqueue (array implementation):
void enqueue( element_type x, QUEUE Q )
{
if( is_full( Q ) )
error("Full queue");
else
{
Q->q_size++;
Q->q_rear = succ( Q->q_rear, Q );
Q->q_array[ Q->q_rear ] = x;
}
}
APPLICATION OF QUEUES
There are several algorithms that use queues to give efficient running times.
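The enqueue routine above relies on a succ helper that is not shown, and the matching dequeue is also omitted. A hedged, self-contained sketch of both follows, assuming int as the element_type and adding a create_queue constructor of our own; error codes replace the text's error() call so the sketch stands alone.

```c
#include <stdlib.h>

/* Sketch complementing the text's enqueue: the struct mirrors
   queue_record, with int standing in for element_type. */
typedef int element_type;

struct queue_record {
    unsigned int q_max_size;
    unsigned int q_front;
    unsigned int q_rear;
    unsigned int q_size;
    element_type *q_array;
};
typedef struct queue_record *QUEUE;

/* Circular successor: wrap to slot 0 after the last array slot. */
static unsigned int succ(unsigned int value, QUEUE Q)
{
    if (++value == Q->q_max_size)
        value = 0;
    return value;
}

/* Our own constructor (an assumption, not from the text): empty queue
   with q_front = 1, q_rear = 0, as in make_null. */
QUEUE create_queue(unsigned int max)
{
    QUEUE Q = malloc(sizeof(struct queue_record));
    Q->q_max_size = max;
    Q->q_size = 0;
    Q->q_front = 1;
    Q->q_rear = 0;
    Q->q_array = malloc(max * sizeof(element_type));
    return Q;
}

/* Returns 0 on success, -1 when full. */
int enqueue(element_type x, QUEUE Q)
{
    if (Q->q_size == Q->q_max_size)
        return -1;
    Q->q_size++;
    Q->q_rear = succ(Q->q_rear, Q);
    Q->q_array[Q->q_rear] = x;
    return 0;
}

/* Returns 0 on success, -1 when empty; dequeued value goes into *x. */
int dequeue(QUEUE Q, element_type *x)
{
    if (Q->q_size == 0)
        return -1;
    Q->q_size--;
    *x = Q->q_array[Q->q_front];
    Q->q_front = succ(Q->q_front, Q);
    return 0;
}
```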
When jobs are submitted to a printer, they are arranged in order of arrival. Thus, essentially, jobs sent to a line printer are placed on a queue. In computer networks, there are many network setups of personal computers in which the disk is attached to one machine, known as the file server. Users on other machines are given access to files on a first-come first-served basis, so the data structure is a queue. Calls to large companies are generally placed on a queue when all operators are busy. Queues are also used widely in graph theory.
V) LIST
A list is an abstract data type (ADT). A general list is of the form a1, a2, a3, . . . , an. Here a1, a2, . . . are called the keys or values of the list. The size of this list is n; a list of size 0 is called the null list. A list can be implemented contiguously (array) or non-contiguously (linked list).
List Operations
Many operations can be performed on the list ADT. Some popular operations are
find – returns the position of the first occurrence of a key (value)
insert – inserts a key at the specified position in the list
delete – deletes the key at the specified position in the list
find_kth – returns the element at a given position
print_list – displays all the keys in the list
make_null – makes the list a null list
For example, consider the list 34, 12, 52, 16, 12. Then
find(52) – returns 3
insert(x,4) – makes the list 34, 12, 52, x, 16, 12
delete(3) – makes the list 34, 12, 16, 12
Simple Array Implementation of Lists
A list can be implemented using an array. Even if the array is dynamically allocated, an estimate of the maximum size of the list is required. Usually this requires a high over-estimate, which wastes considerable space. This could be a serious limitation, especially if there are many lists of unknown size.
Merits of a list using an array
An array implementation allows print_list and find to be carried out in linear time, which is as good as can be expected, and the find_kth operation takes constant time.
Demerits of a list using an array
However, insertion and deletion are expensive. For example, inserting at position 0 (which amounts to making a new first element) requires first pushing the entire array down one spot to make room, whereas deleting the first element requires shifting all the elements in the list up one, so the worst case for these operations is O(n). On average, half the list needs to be moved for either operation, so linear time is still required. Merely building a list by n successive inserts would require quadratic time. Because the running time for insertions and deletions is so slow and the list size must be known in advance, simple arrays are generally not used to implement lists.
Linked Lists
A linked list consists of a series of structures, which are not necessarily adjacent in memory. Each structure contains an element variable and a pointer variable to a structure containing its successor. The element variable is used to store a key (value). A pointer variable is just a variable that contains the address where some other data is stored. This pointer variable is called the next pointer.
A linked list
Thus, if p is declared to be a pointer to a structure, then the value stored in p is interpreted as the location, in main memory, where a structure can be found. A field of that structure can be accessed by p->field_name. Consider a list containing five structures, which happen to reside in memory locations 1000, 800, 712, 992, and 692, respectively. The next pointer in the first structure has the value 800, which indicates where the second structure is. The other structures each have a pointer that serves a similar purpose. Of course, in order to access this list, we need to know where the first cell can be found. A pointer variable can be used for this purpose.
Linked list with actual pointer values
To execute print_list(L) or find(L,key), we merely pass a pointer to the first element in the list and then traverse the list by following the next pointers. This operation is clearly linear time, although the constant is likely to be larger than if an array implementation were used. The find_kth operation is no longer as efficient as in an array implementation; find_kth(L,i) takes O(i) time and works by traversing down the list in the obvious manner. The delete command can be executed in one pointer change. The result of deleting the third element in the original list is shown below.
Deletion from a linked list
The insert command requires obtaining a new cell from the system by using a malloc call (more on this later) and then executing two pointer maneuvers.
Insertion into a linked list
Programming Details
Keep a sentinel node, which is sometimes referred to as a header or dummy node. Our convention will be that the header is in position 0. A linked list with a header is given below.
Type declarations for linked lists:
typedef struct node *node_ptr;
struct node
{
element_type element;
node_ptr next;
};
typedef node_ptr LIST;
typedef node_ptr position;
Function to test whether a linked list is empty:
int is_empty( LIST L )
{
return( L->next == NULL );
}
Empty list with header
Function to test whether the current position is the last in a linked list:
int is_last( position p, LIST L )
{
return( p->next == NULL );
}
The find function returns the position of a given element in the list:
position find( element_type x, LIST L )
{
position p;
p = L->next;
while( (p != NULL) && (p->element != x) )
p = p->next;
return p;
}
Our fourth routine will delete some element x in list L. We need to decide what to do if x occurs more than once or not at all. Our routine deletes the first occurrence of x and does nothing if x is not in the list. To do this, we find p, which is the cell prior to the one containing x, via a call to find_previous.
void delete( element_type x, LIST L )
{
position p, tmp_cell;
p = find_previous( x, L );
if( p->next != NULL ) /* Implicit assumption of header use */
{ /* x is found: delete it */
tmp_cell = p->next;
p->next = tmp_cell->next; /* bypass the cell to be deleted */
free( tmp_cell );
}
}
position find_previous( element_type x, LIST L )
{
position p;
p = L;
while( (p->next != NULL) && (p->next->element != x) )
p = p->next;
return p;
}
The insert routine allows us to pass an element to be inserted along with the list L and a position p. Our particular insertion routine will insert an element after the position implied by p.
void insert( element_type x, LIST L, position p )
{
position tmp_cell;
tmp_cell = (position) malloc( sizeof (struct node) );
if( tmp_cell == NULL )
fatal_error("Out of space!!!");
else
{
tmp_cell->element = x;
tmp_cell->next = p->next;
p->next = tmp_cell;
}
}
To delete an entire list:
void delete_list( LIST L )
{
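Tying these routines together, a small usage sketch follows. It builds the example list 34, 12, 52 and exercises find; int elements and the make_list helper are our own assumptions for the sake of a self-contained example.

```c
#include <stdlib.h>
#include <assert.h>

/* Header-node list as in the text, with int as the element type. */
typedef int element_type;
struct node {
    element_type element;
    struct node *next;
};
typedef struct node *LIST;
typedef struct node *position;

/* Our own helper (hypothetical): allocates the header in position 0. */
LIST make_list(void)
{
    LIST L = malloc(sizeof(struct node));
    L->next = NULL;
    return L;
}

/* Insert x after position p, exactly as in the text's insert routine. */
void insert(element_type x, LIST L, position p)
{
    (void)L; /* L is unused here; kept to match the text's signature */
    position tmp_cell = malloc(sizeof(struct node));
    tmp_cell->element = x;
    tmp_cell->next = p->next;
    p->next = tmp_cell;
}

/* Return the cell holding x, or NULL (text's find routine). */
position find(element_type x, LIST L)
{
    position p = L->next;
    while (p != NULL && p->element != x)
        p = p->next;
    return p;
}
```

Usage: insert(34, L, L) places 34 right after the header; inserting after each successive cell appends to the end.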
position p, tmp;
p = L->next; /* header assumed */
L->next = NULL;
while( p != NULL )
{
tmp = p->next;
free( p );
p = tmp;
}
}
DOUBLY LINKED LISTS
To traverse lists backwards, add an extra field to the data structure containing a pointer to the previous cell. The cost of this is an extra link, which adds to the space requirement and also doubles the cost of insertions and deletions because there are more pointers to fix. On the other hand, it simplifies deletion, because you no longer have to refer to a key by using a pointer to the previous cell.
A doubly linked list
CIRCULARLY LINKED LISTS
A popular convention is to have the last cell keep a pointer back to the first. This can be done with or without a header (if the header is present, the last cell points to it), and can also be done with doubly linked lists (the first cell's previous pointer points to the last cell).
A doubly circularly linked list
Question Bank
Unit II - LISTS, STACKS AND QUEUES
PART – A (2 MARKS)
1. Define ADT.
2. Give the structure of the Queue model.
3. What are the basic operations of the Queue ADT?
4. What are Enqueue and Dequeue?
5. Give the applications of a Queue.
6. What is the use of the stack pointer?
7. What is an array?
8. Define ADT (Abstract Data Type).
9. Swap two adjacent elements by adjusting only the pointers (and not the data) using a singly linked list.
10. Define a queue model.
11. What are the advantages of a doubly linked list over a singly linked list?
12. Define a graph.
13. What is a Queue?
14. What is a circularly linked list?
15. What is a linear list?
16. How will you delete a node from a linked list?
17. What is linear pattern search?
18. What is a recursive data structure?
19. What is a doubly linked list?
PART – B (16 MARKS)
1. Explain the implementation of a stack using a Linked List.
2. Explain Prefix, Infix and Postfix expressions with examples.
3. Explain the operations and the implementation of the list ADT.
4. Give a procedure to convert the infix expression a+b*c+(d*e+f)*g to postfix notation.
5. Design and implement an algorithm to search a linear ordered linked list for a given alphabetic key or name.
6. (a) What is a stack? Write down the procedure for implementing various stack operations. (8)
(b) Explain the various applications of stacks. (8)
7. (a) Given two sorted lists L1 and L2, write a procedure to compute L1_L2 using only the basic operations. (8)
(b) Write a routine to insert an element in a linked list. (8)
8. What is a queue? Write an algorithm to implement a queue with an example.
UNIT III – TREES
Binary Trees – Operations on Binary Tree Representations – Node Representation – Internal and External Nodes – Implicit Array Representation – Binary Tree Traversal – Huffman Algorithm – Representing Lists as Binary Trees – Sorting and Searching Techniques – Tree Searching – Hashing
Unit: III TREES
TREES
A tree is a finite set of one or more nodes such that there is a specially designated node called the root, and zero or more non-empty subtrees T1, T2, ..., Tk, each of whose roots is connected by a directed edge from the root R.
Fig: Tree
PRELIMINARIES
Root
A node which doesn't have a parent. In the above tree, the root is A.
Node
An item of information.
Leaf
A node which doesn't have children is called a leaf or terminal node. Here B, K, L, G, H, M, J are leaves.
Siblings
Children of the same parent are said to be siblings. Here B, C, D, E are siblings; F, G are siblings. Similarly, I, J, K, L are siblings.
Path
A path from node n1 to nk is defined as a sequence of nodes n1, n2, n3, ..., nk such that ni is the parent of ni+1. There is exactly one path from each node to the root. In the figure, the path from A to L is A, C, F, L, where A is the parent of C, C is the parent of F, and F is the parent of L.
Length
The length is defined as the number of edges on the path. In the figure, the length of the path from A to L is 3.
Degree
The number of subtrees of a node is called its degree.
Degree of A is 4
Degree of C is 2
Degree of D is 1
Degree of H is 0
The degree of the tree is the maximum degree of any node in the tree. In the figure, the degree of the tree is 4.
Level
The level of a node is defined by initially letting the root be at level one; if a node is at level L, then its children are at level L+1.
Level of A is 1
Level of B, C, D, E is 2
Level of F, G, H, I, J is 3
Level of K, L, M is 4
Depth
For any node n, the depth of n is the length of the unique path from the root to n. The depth of the root is zero.
In the figure, the depth of node F is 2 and the depth of node L is 3.
Height
For any node n, the height of n is the length of the longest path from n to a leaf. The height of a leaf is zero.
In the figure, the height of node F is 1 and the height of L is 0.
II. BINARY TREES
A binary tree is a special form of a tree. Binary trees are important and frequently used in various applications. A binary tree T is defined as follows: T is empty, or T contains a specially designated node called the root of T, and the remaining nodes of T form two disjoint binary trees T1 and T2, which are called the left subtree and the right subtree respectively.
Fig: A sample binary tree with 11 nodes
Two special forms of a binary tree are (a) the full binary tree and (b) the complete binary tree.
Full binary tree
A binary tree is a full binary tree if it contains the maximum possible number of nodes at all levels.
Fig: Full binary tree of height 4
Complete binary tree
A binary tree is said to be a complete binary tree if all its levels, except possibly the last, have the maximum number of possible nodes, and all the nodes at the last level appear as far left as possible.
Fig: A complete binary tree of height 4
III. REPRESENTATION OF BINARY TREES
Two common methods are used for representing this structure:
1. Linear or sequential representation (using an array)
2. Linked representation (using pointers)
A. Linear Representation of a Binary Tree
In this representation, the nodes are stored level by level, starting from the zero level, where only the root node is present. The root node is stored in the first memory location. The rules to decide the location of any node of a tree in the array are:
The root node is at location 1.
For any node with index i, 1 < i <= n:
PARENT(i) = [i/2]; when i = 1, there is no parent.
LCHILD(i) = 2*i; if 2*i > n, then i has no left child.
RCHILD(i) = 2*i + 1; if 2*i + 1 > n, then i has no right child.
Consider a binary tree for the expression (A-B)+C*(D/E).
Fig: Binary Tree
The representation of the same binary tree using an array is shown in the figure below. A full binary tree and the indices of its various nodes when stored in an array are shown in the figure below.
B. Linked Representation of a Binary Tree
When inserting a new node or deleting a node in a linear representation, data movement up and down the array is required, which takes an excessive amount of processing time. Linear representation of binary trees thus has a number of overheads. All these overheads are taken care of by the linked representation.
Structure of a node in the linked representation: a DATA field with two link fields, LC and RC.
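The index rules above are easy to verify in code. This small sketch mirrors PARENT, LCHILD and RCHILD for an n-node tree stored with 1-based indices; returning 0 as a "no such node" marker is our own convention.

```c
#include <assert.h>

/* 1-based index arithmetic for the sequential (array) representation:
   PARENT(i) = i/2, LCHILD(i) = 2i, RCHILD(i) = 2i + 1.
   A result of 0 means the requested relative does not exist. */
int parent(int i)           { return i / 2; }  /* root (i == 1) yields 0 */
int lchild(int i, int n)    { return 2 * i     <= n ? 2 * i     : 0; }
int rchild(int i, int n)    { return 2 * i + 1 <= n ? 2 * i + 1 : 0; }
```

For a full tree of n = 7 nodes, node 3's children sit at indices 6 and 7, and node 4 (a leaf) has none.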
LC and RC are two link fields that store the addresses of the left child and right child of a node (LC = left child, RC = right child). DATA is the information of the node. A tree with 9 nodes is represented as:
Fig: Binary Tree
OPERATIONS ON BINARY TREES
There are a number of primitive operations that can be applied to a binary tree. If p is a pointer to a node nd of a binary tree, the function info(p) returns the contents of nd. The functions left(p), right(p), father(p), and brother(p) return pointers to the left son of nd, the right son of nd, the father of nd, and the brother of nd, respectively. These functions return the null pointer if nd has no left son, right son, father, or brother. Finally, the logical functions isleft(p) and isright(p) return the value true if nd is a left or right son, respectively, of some other node in the tree, and false otherwise.
Note that the functions isleft(p), isright(p), and brother(p) can be implemented using the functions left(p), right(p) and father(p). For example, isleft may be implemented as
q = father(p);
if (q == NULL)
return(FALSE);
if (left(q) == p)
return(TRUE);
return(FALSE);
or even simpler, as father(p) && p == left(father(p)). isright may be implemented in a similar manner, or by calling isleft. brother(p) may be implemented using isleft or isright as
if (father(p) == NULL)
return(NULL);
if (isleft(p))
return(right(father(p)));
return(left(father(p)));
In constructing a binary tree, the operations maketree, setleft and setright are useful. maketree(x) creates a new binary tree consisting of a single node with information field x and returns a pointer to that node. setleft(p,x) accepts a pointer p to a binary tree node with no left son; it creates a new left son of node(p) with information field x. setright(p,x) is analogous to setleft except that it creates a right son of node(p).
Make_Empty
This operation is mainly for initialization. Some programmers prefer to initialize the first element as a one-node tree, but our implementation follows the recursive definition of trees more closely. It is also a simple routine, as evidenced below.
template <class Etype>
void Binary_Search_Tree<Etype>::Make_Empty( Tree_Node<Etype> * & T )
{
if( T != NULL )
{
Make_Empty( T->Left );
Make_Empty( T->Right );
T = NULL;
}
}
Find
This operation generally requires returning a pointer to the node in tree T that has key X, or NULL if there is no such node. The protected Find routine does the actual work. The public routine then returns nonzero if the Find succeeded, and sets Last_Find. If the Find failed, zero is returned, and Last_Find points to NULL. The structure of the tree makes this simple.
template <class Etype>
Tree_Node<Etype> *Binary_Search_Tree<Etype>::Find( const Etype & X, Tree_Node<Etype> *T ) const
{
if( T == NULL )
return NULL;
if( X < T->Element )
return Find( X, T->Left );
else if( X > T->Element )
return Find( X, T->Right );
else
return T;
}
Find_Min and Find_Max
Internally, these routines return the positions of the smallest and largest elements in the tree, respectively. Although returning the exact values of these elements might seem more reasonable, this would be inconsistent with the Find operation. It is important that similar-looking operations do similar things. To perform a Find_Min, start at the root and go left as long as there is a left child. The stopping point is the smallest element. The Find_Max routine is the same, except that branching is to the right child. The public interface is similar to that of the Find routine.
template <class Etype>
Tree_Node<Etype> *Binary_Search_Tree<Etype>::Find_Min( Tree_Node<Etype> *T ) const
{
if( T == NULL )
return NULL;
else if( T->Left == NULL )
return T;
else
return Find_Min( T->Left );
}
template <class Etype>
Tree_Node<Etype> *Binary_Search_Tree<Etype>::Find_Max( Tree_Node<Etype> *T ) const
{
if( T != NULL )
while( T->Right != NULL )
T = T->Right;
return T;
}
BINARY TREE REPRESENTATIONS
Node Representation of Binary Trees
Tree nodes may be implemented as array elements or as allocations of a dynamic variable. Each node contains info, left, right and father fields. The left, right and father fields of a node point to the node's left son, right son, and father respectively. Using the array implementation,
#define NUMNODES 500
struct nodetype
{
int info;
int left;
int right;
int father;
};
struct nodetype node[NUMNODES];
Under this representation, the operations info(p), left(p), right(p), and father(p) are implemented by references to node[p].info, node[p].left, node[p].right and node[p].father respectively. To implement isleft and isright more efficiently, we can include within each node an additional flag isleft. The value of this flag is TRUE if the node is a left son and FALSE otherwise. The root is uniquely identified by a NULL value (0) in its father field.
Alternatively, the sign of the father field could be negative if the node is a left son or positive if it is a right son. The pointer to a node's father is then given by the absolute value of the father field. The isleft or isright operations would then need only examine the sign of the father field. To implement brother(p) more efficiently, a brother field can be included in each node.
Once the array of nodes is declared, create an available list by executing the following statements:
int avail, i;
{
avail = 1;
for( i = 0; i < NUMNODES; i++ )
node[i].left = i + 1;
node[NUMNODES-1].left = 0;
}
Note that the available list is not a binary tree but a linear list whose nodes are linked together by the left field. Each node in a tree is taken from the available pool when needed and returned to the available pool when no longer in use. This representation is called the linked array representation of a binary tree.
Alternatively, a node may be defined by
struct nodetype
{
int info;
struct nodetype *left;
struct nodetype *right;
struct nodetype *father;
};
typedef struct nodetype *NODEPTR;
The operations info(p), left(p), right(p), and father(p) would be implemented by references to p->info, p->left, p->right, and p->father respectively. An explicit available list is not needed. The routines getnode and freenode simply allocate and free nodes using the routines malloc and free. This representation is called the dynamic node representation of a binary tree.
Both the linked array representation and the dynamic node representation are implementations of an abstract linked representation (also called the node representation), in which implicit or explicit pointers link together the nodes of a binary tree.
The maketree function, which allocates a node and sets it as the root of a single-node binary tree, may be written as
NODEPTR maketree( x )
int x;
{
NODEPTR p;
p = getnode();
p->info = x;
p->left = NULL;
p->right = NULL;
return( p );
}
The routine setleft(p,x) sets a node with contents x as the left son of node(p):
setleft( p, x )
NODEPTR p;
int x;
{
if( p == NULL )
printf("void insertion\n");
else if( p->left != NULL )
printf("invalid insertion\n");
else
p->left = maketree( x );
}
The routine setright(p,x) to create a right son of node(p) with contents x is similar.
INTERNAL AND EXTERNAL NODES
By definition, leaf nodes have no sons. Thus, in the linked representation of binary trees, left and right pointers are needed only in non-leaf nodes. Sometimes two separate sets of nodes are used for non-leaves and leaves. Non-leaf nodes contain info, left and right fields and are allocated as dynamic records or as an array of records managed using an available list. Leaf
nodes do not contain left or right fields and are kept as a single info array that is allocated sequentially as needed. Alternatively, they can be allocated as dynamic variables containing only an info value. Each node can also contain a father field, if necessary. When this distinction is made between non-leaf and leaf nodes, non-leaves are called internal nodes and leaves are called external nodes.
IMPLICIT ARRAY REPRESENTATION OF BINARY TREES
In general, the n nodes of an almost complete binary tree can be numbered from 1 to n, so that the number assigned to a left son is twice the number assigned to its father, and the number assigned to a right son is 1 more than twice the number assigned to its father. We can extend this implicit array representation of almost complete binary trees to an implicit array representation of binary trees generally. This can be done by identifying an almost complete binary tree that contains the binary tree being represented. Fig (a) illustrates two binary trees, and Fig (b) illustrates the smallest almost complete binary trees that contain them. Finally, Fig (c) illustrates the array representations of these almost complete binary trees, and by extension, of the original binary trees.
The implicit array representation is also called the sequential representation, because it allows a tree to be implemented in a contiguous block of memory rather than via pointers connecting widely separated nodes. Under the sequential representation, an array element is allocated whether or not it serves to contain a node of a tree. Therefore, unused array elements must be flagged as non-existent, or null, tree nodes.
Fig (a) Two Binary trees
Fig (b) Almost complete extensions
Fig (c) Array representations (indices 0-12 holding nodes A through G of the first tree, and indices 0-9 holding nodes H through M of the second)
Example
The program below finds duplicate numbers in an input list; it uses the routines maketree and setleft with the sequential representation of binary trees.
#define NUMNODES 500
struct nodetype
{
int info;
int used;
} node[NUMNODES];
main()
{
int p, q, number;
scanf("%d", &number);
maketree(number);
while( scanf("%d", &number) != EOF )
{
p = q = 0;
while( q < NUMNODES && node[q].used && number != node[p].info )
{
p = q;
if( number < node[p].info )
q = 2*p + 1;
else
q = 2*p + 2;
}
if( number == node[p].info )
printf("%d is a duplicate\n", number);
else if( number < node[p].info )
setleft(p, number);
else
setright(p, number);
}
}
maketree( x )
int x;
{
int p;
node[0].info = x;
node[0].used = TRUE;
for( p = 1; p < NUMNODES; p++ )
node[p].used = FALSE;
}
setleft( p, x )
int p, x;
{
int q;
q = 2*p + 1;
if( q >= NUMNODES )
error("array overflow");
else if( node[q].used )
error("invalid insertion");
else
{
node[q].info = x;
node[q].used = TRUE;
}
}
The routine for setright is similar. Note that the routine maketree initializes the fields info and used to represent a tree with a single node.
IV. BINARY TREE TRAVERSALS
Traversing means visiting each node only once. Tree traversal is a method for visiting all the nodes in the tree exactly once. There are three tree traversal techniques, namely
Inorder Traversal
Preorder Traversal
Postorder Traversal
Inorder Traversal
The inorder traversal of a binary tree is performed as follows:
Traverse the left subtree in inorder
Visit the root
Traverse the right subtree in inorder
Example
Fig: Inorder 10, 20, 30
Fig: Inorder A B C D E G H I J K
Recursive routine for Inorder Traversal
void Inorder( Tree T )
{
if( T != NULL )
{
Inorder( T->left );
printElement( T->Element );
Inorder( T->right );
}
}
Preorder Traversal
The preorder traversal of a binary tree is performed as follows:
Visit the root
Traverse the left subtree in preorder
Traverse the right subtree in preorder
Example
Fig: Preorder 20, 10, 30
Fig: Preorder D C A B I G E H K J
Recursive routine for Preorder Traversal
void Preorder( Tree T )
{
if( T != NULL )
{
printElement( T->Element );
Preorder( T->left );
Preorder( T->right );
}
}
Postorder Traversal
The postorder traversal of a binary tree is performed as follows:
Traverse the left subtree in postorder
Traverse the right subtree in postorder
Visit the root
Example
Fig: Postorder 10, 30, 20
Fig: Postorder B A C E H G J K I D
Recursive routine for Postorder Traversal
void Postorder( Tree T )
{
if( T != NULL )
{
Postorder( T->left );
Postorder( T->right );
printElement( T->Element );
}
}
V. HUFFMAN ALGORITHM
The inputs to the algorithm are n, the number of symbols in the original alphabet, and frequency, an array of size at least n such that frequency[i] is the relative frequency of the ith symbol. The algorithm assigns values to an array code of size at least n, so that code[i] contains the code assigned to the ith symbol.
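The three recursive routines can be checked side by side on the small 20/10/30 tree from the figures. This sketch collects each traversal's output into a string rather than printing, so the three orders are easy to compare; the struct and function names are illustrative, not the text's Tree type.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative node type for the traversal demos. */
struct tnode {
    int element;
    struct tnode *left, *right;
};

/* Each routine appends "element " to out in the order it visits nodes. */
static void inorder(struct tnode *t, char *out)
{
    if (t != NULL) {
        inorder(t->left, out);                       /* left subtree   */
        sprintf(out + strlen(out), "%d ", t->element); /* visit root   */
        inorder(t->right, out);                      /* right subtree  */
    }
}

static void preorder(struct tnode *t, char *out)
{
    if (t != NULL) {
        sprintf(out + strlen(out), "%d ", t->element); /* root first   */
        preorder(t->left, out);
        preorder(t->right, out);
    }
}

static void postorder(struct tnode *t, char *out)
{
    if (t != NULL) {
        postorder(t->left, out);
        postorder(t->right, out);
        sprintf(out + strlen(out), "%d ", t->element); /* root last    */
    }
}
```

On the tree with 20 at the root, 10 as left child and 30 as right child, the routines yield 10 20 30, 20 10 30, and 10 30 20 respectively, matching the figures.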
The algorithm also constructs an array position of size at least n such that position[i] points to the node representing the ith symbol. This array is necessary to identify the point in the tree from which to start in constructing the code for a particular symbol in the alphabet. Once the tree has been constructed, the isleft operation introduced earlier can be used to determine whether 0 or 1 should be placed at the front of the code as we climb the tree. The info portion of a tree node contains the frequency of occurrence of the symbol represented by that node.
A set rootnodes is used to keep pointers to the roots of partial binary trees that are not yet left or right subtrees. Since this set is modified by removing the elements with minimum frequency, combining them and then reinserting the combined element into the set, it is implemented as an ascending priority queue of pointers, ordered by the value of the info field of the pointers' target nodes. We use the operations pqinsert, to insert a pointer into the priority queue, and pqmindelete, to remove the pointer to the node with the smallest info value from the priority queue.
We may outline Huffman's algorithm as follows:
/* initialize the set of root nodes */
rootnodes = the empty ascending priority queue;
/* construct a node for each symbol */
Fig: Huffman trees
The Huffman tree is strictly binary. Thus, if there are n symbols in the alphabet, the Huffman tree can be represented by an array of nodes of size 2n-1.
REPRESENTING LISTS AS BINARY TREES
 In this section we introduce a tree representation of a linear list in which the operations of finding the kth element of a list and deleting a specific element are relatively efficient.
 It is also possible to build a list with given elements using this representation. We also briefly consider the operation of inserting a single new element.
 A list may be represented by a binary tree as illustrated in the figure. Fig (a) shows a list in the usual linked format, while Fig (b) and (c) show two binary tree representations of the list.
 Elements of the original list are represented by leaves of the tree (shown as squares in the figure),
 whereas nonleaf nodes (shown as circles in the figure) are present as part of the internal tree structure.
 Associated with each leaf node are the contents of the corresponding list element. Associated with each nonleaf node is a count representing the number of leaves in the node's left subtree.
 The elements of the list in their original sequence are assigned to the leaves of the tree in the inorder sequence of the leaves. Note from the figure that several binary trees can represent the same list.
Fig: A list and two corresponding Binary Trees
Finding the kth Element
 To justify using so many extra tree nodes to represent a list, we present an algorithm to find the kth element of a list represented by a tree.
 Let tree point to the root of the tree, and let lcount(p) represent the count associated with the nonleaf node pointed to by p [lcount(p) is the number of leaves in the tree rooted at node(left(p))].
 The following algorithm sets the variable find to point to the leaf containing the kth element of the list.
o The algorithm maintains a variable r containing the number of list elements remaining to be counted. At the beginning of the algorithm, r is initialized to k.
o At each nonleaf node(p), the algorithm determines from the values of r and lcount(p) whether the kth element is located in the left or right subtree.
o If the leaf is in the left subtree, the algorithm proceeds directly to that subtree. If the desired leaf is in the right subtree, the algorithm proceeds to that subtree after reducing the value of r by the value of lcount(p).
o k is assumed to be less than or equal to the number of elements in the list.
r = k;
p = tree;
while (p is not a leaf node)
if (r <= lcount(p))
p = left(p);
else
{
r -= lcount(p);
p = right(p);
}
find = p;
 Fig (a) illustrates finding the fifth element of a list in the tree of Fig (b), and Fig (b) illustrates finding the eighth element in the tree of Fig (c).
 The dashed line represents the path taken by the algorithm down the tree to the appropriate leaf. We indicate the value of r (the remaining number of elements to be counted) next to each node encountered by the algorithm.
The number of tree nodes examined in finding the kth list element is less than or equal to 1 more than the depth of the tree (the longest path in the tree from the root to a leaf). Thus four nodes are examined in Fig (a) in finding the fifth element of the list, and also in Fig (b) in finding the eighth element. If a list is represented as a linked structure, four nodes are accessed in finding the fifth element of the list [that is, the operation p = next(p) is performed four times] and seven nodes are accessed in finding the eighth element.
Although this is not a very impressive saving, consider a list with 1000 elements. A binary tree of depth 10 is sufficient to represent such a list, since log2 1000 is less than 10. Thus, finding the kth element using such a binary tree would require examining no more than 11 nodes. Since the number of leaves of a binary tree increases as 2^d, where d is the depth of the tree, such a tree represents a relatively efficient data structure for finding the kth element of a list. If an almost complete tree is used, the kth element of an n-element list can be found in at most log2 n + 1 node accesses, whereas k accesses would be required if a linear linked list were used.
Fig: Finding the nth element of a tree-represented list
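The count-based search above translates directly into C. In this sketch, leaves are recognized by null child pointers, an assumption of our own; the text's nodes may distinguish leaves differently.

```c
#include <stddef.h>

/* Sketch of the kth-element search on a count-augmented tree: leaves hold
   list elements; each nonleaf stores lcount, the number of leaves in its
   left subtree. Field names are illustrative. */
struct lnode {
    int value;                  /* element (meaningful at leaves only)   */
    int lcount;                 /* leaves in left subtree (nonleaves)    */
    struct lnode *left, *right; /* both NULL at a leaf, by convention    */
};

/* Return the kth list element, 1 <= k <= number of leaves. */
int find_kth(struct lnode *tree, int k)
{
    int r = k;                  /* elements remaining to be counted      */
    struct lnode *p = tree;
    while (p->left != NULL) {   /* nonleaf: decide which subtree         */
        if (r <= p->lcount)
            p = p->left;        /* kth leaf lies in the left subtree     */
        else {
            r -= p->lcount;     /* skip over the left subtree's leaves   */
            p = p->right;
        }
    }
    return p->value;            /* p now points at the kth leaf          */
}
```

For the list 5, 8, 3, 2 built as a balanced count tree (root lcount 2, each subtree lcount 1), find_kth returns the elements in list order.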
Deleting an Element
It involves only resetting a left or right pointer in the father of the deleted leaf to null. The figure illustrates the results of this algorithm for a tree in which the nodes C, D, and B are deleted in that order. Make sure that you follow the actions of the algorithm on these examples.
Note that the algorithm maintains a 0 count in leaf nodes for consistency, although the count is not required for such nodes. Note also that the algorithm never moves up a nonleaf node even if this could be done. We could easily modify the algorithm to do this but have not done so, for reasons that will become apparent shortly.
This deletion algorithm involves inspection of up to two nodes at each level. Thus, deleting the kth element of a list represented by a tree requires a number of node accesses approximately equal to three times the tree depth. Although deletion from a linked list requires accesses to only three nodes, reaching the kth element of the list requires k node accesses first. For large lists, therefore, the tree representation is more efficient.
Fig: Deletion Algorithm
TREE SEARCHING
There are several ways of organizing files as trees, with associated searching algorithms. Previously, we presented a method of using a binary tree to store a file in order to make sorting the file more efficient. In that method, all the left descendants of a node with key key have keys that are less than key, and all right descendants have keys that are greater than or equal to key. An inorder traversal of such a binary tree yields the file in ascending key order.
Such a tree may also be used as a binary search tree. Using binary tree notation, the algorithm for searching for the key key in such a tree is as follows:
p = tree;
while (p != null && key != k(p))
    p = (key < k(p)) ? left(p) : right(p);
return (p);
The efficiency of the search process can be improved by using a sentinel, as in sequential searching. A sentinel node, with a separate external pointer pointing to it, remains allocated with the tree. All left or right tree pointers that do not point to another tree node now point to this sentinel node instead of equalling null. When a search is performed, the argument key is first inserted into the sentinel node, thus guaranteeing that it will be located in the tree.
A sorted array can be produced from a binary search tree by traversing the tree in inorder and inserting each element sequentially into the array as it is visited. On the other hand, there are many binary search trees that correspond to a given sorted array. Viewing the middle element of the array as the root of a tree and viewing the remaining elements recursively as left and right subtrees produces a relatively balanced binary search tree, as in Fig(a). Viewing the first element of the array as the root of a tree and making each successive element the right son of its predecessor produces a very unbalanced binary tree, as in Fig(b).
The advantage of using a binary search tree over an array is that a tree enables search, insertion, and deletion operations to be performed efficiently. If an array is used, an insertion or deletion requires that approximately half of the elements of the array be moved. (Why?) Insertion or deletion in a search tree, on the other hand, requires that only a few pointers be adjusted.
Fig(a) A sorted array and two of its binary tree representations
Fig(b) cont..
Inserting into a Binary Search Tree
The following algorithm searches a binary search tree and inserts a new record into the tree if the search is unsuccessful.
q = null;
p = tree;
while (p != null) {
    if (key == k(p))
        return (p);
    q = p;
    if (key < k(p))
        p = left(p);
    else
        p = right(p);
}
v = maketree(rec, key);
if (q == null)
    tree = v;
else if (key < k(q))
    left(q) = v;
else
    right(q) = v;
return (v);
Note that after a new record is inserted, the tree retains the property of being sorted under an inorder traversal.
Deleting from a Binary Search Tree
We now present an algorithm to delete a node with key key from a binary search tree. There are three cases to consider.
If the node to be deleted has no sons, it may be deleted without further adjustment to the tree. This is illustrated in Fig(a).
If the node to be deleted has only one subtree, its only son can be moved up to take its place. This is illustrated in Fig(b).
If, however, the node p to be deleted has two subtrees, its inorder successor s (or predecessor) must take its place. The inorder successor cannot have a left subtree. Thus the right son of s can be moved up to the place of s. This is illustrated in Fig(c), where the node with key 12 replaces the node with key 11 and is replaced, in turn, by the node with key 13.
In the algorithm below, if no node with key key exists in the tree, the tree is left unchanged.
Fig(a) Deleting node with key 15
Fig(b) Deleting node with key 5
Fig(c) Deleting node with key 11
p = tree;
q = null;
while (p != null && k(p) != key) {
    q = p;
    p = (key < k(p)) ? left(p) : right(p);
}
if (p == null)
    return;
if (left(p) == null)
    rp = right(p);
else if (right(p) == null)
    rp = left(p);
else {
    f = p;
    rp = right(p);
    s = left(rp);
    while (s != null) {
        f = rp;
        rp = s;
        s = left(rp);
    }
    if (f != p) {
        left(f) = right(rp);
        right(rp) = right(p);
    }
    left(rp) = left(p);
}
if (q == null)
    tree = rp;
else if (p == left(q))
    left(q) = rp;
else
    right(q) = rp;
freenode(p);
return;
VI. SORTING AND SEARCHING TECHNIQUES
Sorting is the operation of arranging the records of a table according to the key value of each record. A table or a file is an ordered sequence of records r[1], r[2], ..., r[n], each containing a key k[1], k[2], ..., k[n]. The table is sorted based on the key.
A sorting algorithm is said to be stable if it preserves the relative order of records with equal keys.
There are two kinds of sorting:
Internal Sorting
External Sorting
Internal Sort: All records to be sorted are kept internally in the main memory.
External Sort: If there are a large number of records to be sorted, they must be kept in external files on auxiliary storage.
INTERNAL SORTING
A) INSERTION SORT
Insertion sort works by taking elements from the list one by one and inserting each into its correct position in the growing sorted portion of the list.
Insertion sort consists of N-1 passes, where N is the number of elements to be sorted. The ith pass of insertion sort will insert the ith element A[i] into its right place among A[1], A[2], ..., A[i-1].
After doing this insertion the records occupying A[1]..A[i] are in sorted order.
Procedure
void Insertion_Sort(int a[], int n)
{
    int i, j, temp;
    for (i = 1; i < n; i++) {
        temp = a[i];
        for (j = i; j > 0 && a[j-1] > temp; j--)
            a[j] = a[j-1];
        a[j] = temp;
    }
}
Example
Consider an unsorted array: 20 10 60 40 30 15
Passes of Insertion sort
ORIGINAL      20 10 60 40 30 15   POSITIONS MOVED
After i=1     10 20 60 40 30 15   1
After i=2     10 20 60 40 30 15   0
After i=3     10 20 40 60 30 15   1
After i=4     10 20 30 40 60 15   2
After i=5     10 15 20 30 40 60   4
Sorted Array  10 15 20 30 40 60
Analysis
Worst Case Analysis O(N^2)
Best Case Analysis O(N)
Average Case Analysis O(N^2)
B) SHELL SORT
Shell Sort was invented by Donald Shell. It improves upon bubble sort and insertion sort by moving out-of-order elements more than one position at a time. It works by arranging the data sequence in a two-dimensional array and then sorting the columns of the array using insertion sort.
In shell sort the whole array is first fragmented into K segments, where K is preferably a prime number. After the first pass the whole array is partially sorted. In the next pass, the value of K is reduced, which increases the size of each segment and reduces the number of segments. The next value of K is chosen so that it is relatively prime to its previous value. The process is repeated until K=1, at which point the array is sorted. The insertion sort is applied to each segment, so each successive segment is partially sorted. The shell sort is also called the Diminishing Increment Sort, because the value of K decreases continuously.
Procedure
void shellsort(int A[], int N)
{
    int i, j, k, temp;
    /* k runs through a diminishing increment sequence ending at 1;
       halving k each time (Shell's original choice) is shown here */
    for (k = N/2; k > 0; k /= 2) {
        for (i = k; i < N; i++) {
            temp = A[i];
            for (j = i; j >= k && A[j-k] > temp; j -= k)
                A[j] = A[j-k];
            A[j] = temp;
        }
    }
}
Example
Consider an unsorted array:
81 94 11 96 12 35 17 95 28 58
Here N=10; for the first pass, K=5 (10/2).
81 94 11 96 12 35 17 95 28 58
After the first pass:
35 17 11 28 12 81 94 95 96 58
In the second pass, K is reduced to 3. After the second pass:
28 12 11 35 17 81 58 95 96 94
In the third pass, K is reduced to 1. The final sorted array is:
11 12 17 28 35 58 81 94 95 96
Analysis
Worst Case Analysis O(N^2)
Best Case Analysis O(N log N)
Average Case Analysis O(N^1.5)
C) QUICK SORT
The basic version of the quick sort algorithm was invented by C. A. R. Hoare in 1960 and formally introduced in 1962. It is based on the principle of divide-and-conquer. Quick sort is the algorithm of choice in many situations because it is not difficult to implement, it is a good "general purpose" sort, and it consumes relatively few resources during execution.
Good points
It is in-place, since it uses only a small auxiliary stack.
It requires only N log N time, on average, to sort N items.
It has an extremely short inner loop.
The algorithm has been subjected to a thorough mathematical analysis, so very precise statements can be made about performance.
Bad points
It is recursive. If recursion is not available, the implementation is extremely complicated.
It requires quadratic (i.e., N^2) time in the worst case.
It is fragile, i.e., a simple mistake in the implementation can go unnoticed and cause it to perform badly.
Quick sort works by partitioning a given array A[p..r] into two non-empty sub-arrays A[p..q] and A[q+1..r] such that every key in A[p..q] is less than or equal to every key in A[q+1..r]. Then the two sub-arrays are sorted by recursive calls to quick sort. The exact position of the partition depends on the given array, and the index q is computed as a part of the partitioning procedure.
QuickSort (A, p, r)
1. if p < r then
2.     q = Partition(A, p, r)
3.     Recursive call to Quick Sort(A, p, q)
4.     Recursive call to Quick Sort(A, q+1, r)
Note that to sort the entire array, the initial call is Quick Sort(A, 1, length[A]).
As a first step, quick sort chooses as pivot one of the items in the array to be sorted. The array is then partitioned on either side of the pivot. Elements that are less than or equal to the pivot will move toward the left and elements that are greater than or equal to the pivot will move toward the right.
Partitioning the Array
The partitioning procedure rearranges the sub-array in place.
PARTITION (A, p, r)
1. x ← A[p]
2. i ← p - 1
3. j ← r + 1
4. while TRUE do
5.     repeat j ← j - 1
6.     until A[j] ≤ x
7.     repeat i ← i + 1
8.     until A[i] ≥ x
9.     if i < j
10.        then exchange A[i] ↔ A[j]
11.        else return j
Partition selects the first key, A[p], as the pivot key about which the array will be partitioned:
Keys ≤ A[p] will be moved towards the left.
Keys ≥ A[p] will be moved towards the right.
The running time of the partition procedure is Θ(n), where n = r - p + 1 is the number of keys in the array. Another argument that the running time of PARTITION on a subarray of size n is Θ(n) is as follows:
