DATA STRUCTURES
Dr.Sabitha Banu
19-06-2023 Data Structures
Unit -1
• Introduction of Algorithms
• Analysing Algorithms
• Arrays: Sparse Matrices
• Representation of Arrays
• Stacks and Queues
• Fundamentals - Evaluation of Expression Infix to Postfix Conversion
• Multiple Stacks and Queues
Introduction
• Algorithm is a step-by-step procedure, which defines a set of
instructions to be executed in a certain order to get the desired
output
• An algorithm can be implemented in more than one programming
language (e.g., C, C++, Python, Ruby).
• From the data structure point of view, the important categories of algorithms are
Search − Algorithm to search an item in a data structure.
Sort − Algorithm to sort items in a certain order.
Insert − Algorithm to insert item in a data structure.
Update − Algorithm to update an existing item in a data structure.
Delete − Algorithm to delete an existing item from a data structure.
Characteristics
• Unambiguous − Algorithm should be clear and unambiguous. Each of
its steps (or phases), and their inputs/outputs should be clear and
must lead to only one meaning.
• Input − An algorithm should have 0 or more well-defined inputs.
• Output − An algorithm should have 1 or more well-defined outputs,
and should match the desired output.
• Finiteness − Algorithms must terminate after a finite number of steps.
• Feasibility − Should be feasible with the available resources.
• Independent − An algorithm should have step-by-step directions,
which should be independent of any programming code.
How to Write an Algorithm?
• An algorithm is written as a step-by-step procedure.
• Algorithm writing is a process and is executed after the problem
domain is well-defined.
• Example
Advantages of Algorithms:
• It is easy to understand.
• An algorithm is a step-wise representation of a solution to a given
problem.
• In an algorithm the problem is broken down into smaller pieces or steps;
hence, it is easier for the programmer to convert it into an actual
program.
Disadvantages of Algorithms:
• Writing an algorithm for a large problem is time-consuming.
• Understanding complex logic through algorithms can be very difficult.
• Branching and looping statements are difficult to show in algorithms.
Analysis of Algorithms
• Provides a theoretical estimate of the resources an algorithm requires
to solve a specific computational problem.
• Analysis of algorithms is the determination of the amount of time and
space resources required to execute it.
• Efficiency is measured in terms of resources such as CPU time, memory, disk and network usage.
• Time complexity
• Space complexity
• Different ways of analysis
Asymptotic Analysis
Worst, Average and Best Cases
Asymptotic Notations
Analysis of Loops
Solving Recurrences
Amortized Analysis
Asymptotic Analysis
• Performance of the algorithm based on the input size
• Relation between the running time and the input size
• Time and Space factor
Worst, Average and Best Cases
• Divided into three different cases
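Best Case − minimum time taken to execute the program (often described with Ω, a lower bound).
Average Case − average time taken to execute the program over all inputs of a given size (often described with θ).
Worst Case − maximum time taken to execute the program (often described with O, an upper bound).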
Asymptotic Notations
• Asymptotic notations are mathematical tools to represent the time complexity of
algorithms for asymptotic analysis.
Ο (Big O) Notation
Ω (Omega) Notation
θ (Theta) Notation
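For reference, the standard formal definitions of the three notations are given below, together with a small worked example; here f(n) is the running time, g(n) the comparison function, and c and n_0 are constants.

```latex
\begin{aligned}
f(n) = O(g(n))      &\iff \exists\, c > 0,\ n_0 \ge 1 : 0 \le f(n) \le c\,g(n) \ \text{ for all } n \ge n_0 \\
f(n) = \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 \ge 1 : 0 \le c\,g(n) \le f(n) \ \text{ for all } n \ge n_0 \\
f(n) = \Theta(g(n)) &\iff f(n) = O(g(n)) \ \text{and}\ f(n) = \Omega(g(n)) \\[4pt]
3n^2 + 5n \le 4n^2 \ \text{for } n \ge 5 &\;\Rightarrow\; 3n^2 + 5n = O(n^2) \quad (c = 4,\ n_0 = 5) \\
3n^2 \le 3n^2 + 5n \ \text{for } n \ge 1 &\;\Rightarrow\; 3n^2 + 5n = \Omega(n^2) \quad (c = 3,\ n_0 = 1) \\
&\;\therefore\; 3n^2 + 5n = \Theta(n^2)
\end{aligned}
```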
Analysis of Loops
• Analysis of iterative programs
O(1): Time complexity of a function (or set of statements) is considered as O(1) if it doesn’t
contain a loop, recursion, or a call to any other non-constant-time function. For example:
c = a + b;           /* O(1) */
printf("%d", c);     /* O(1) */
O(n): Time Complexity of a loop is considered as O(n) if the loop variables are
incremented/decremented by a constant amount.
O(n^c): Time complexity of nested loops is equal to
the number of times the innermost statement is
executed; c nested loops that each run n times give O(n^c).
O(Log n): Time complexity of a loop is
considered as O(Log n) if the loop variable
is divided/multiplied by a constant
amount. The same holds for a recursive
function whose argument is divided by a
constant at each call (e.g., n/2).
O(LogLog n): Time complexity of a loop is
considered as O(LogLog n) if the loop
variable is reduced/increased
exponentially (e.g., i = i * i or i = sqrt(i)).
Time Complexity of Loops
Complexity      Condition
O(1)            A set of statements with no loop, recursion or non-constant-time call
O(n)            Loop variable incremented/decremented by a constant amount
O(n^c)          Nested loops: number of times the innermost statement is executed
O(Log n)        Loop variable divided/multiplied by a constant amount
O(LogLog n)     Loop variable reduced/increased exponentially
Solving Recurrences
• A recurrence expresses the running time of a recursive algorithm in terms of its running time on smaller inputs.
• There are mainly three ways of solving recurrences.
Substitution Method − Make a guess for the solution and then use mathematical induction to prove
that the guess is correct or incorrect.
Recurrence Tree Method − Draw a recurrence tree and calculate the time taken by every level of the tree.
Finally, sum the work done at all levels (e.g., for divide-and-conquer algorithms).
Master Method − A direct way to get the solution of recurrences of the form T(n) = aT(n/b) + f(n), where a ≥ 1 and b > 1.
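As a worked illustration (merge sort is an assumed example, not taken from the slides), the standard statement of the Master Method and its application to T(n) = 2T(n/2) + n are:

```latex
\begin{aligned}
&T(n) = a\,T(n/b) + f(n), \qquad a \ge 1,\ b > 1 \\
&\text{Case 1: } f(n) = O\!\left(n^{\log_b a - \varepsilon}\right) \text{ for some } \varepsilon > 0
  \;\Rightarrow\; T(n) = \Theta\!\left(n^{\log_b a}\right) \\
&\text{Case 2: } f(n) = \Theta\!\left(n^{\log_b a}\right)
  \;\Rightarrow\; T(n) = \Theta\!\left(n^{\log_b a} \log n\right) \\
&\text{Case 3: } f(n) = \Omega\!\left(n^{\log_b a + \varepsilon}\right) \text{ and } a\,f(n/b) \le c\,f(n) \text{ for some } c < 1
  \;\Rightarrow\; T(n) = \Theta(f(n)) \\[4pt]
&\text{Example: } T(n) = 2T(n/2) + n,\ a = b = 2,\ n^{\log_2 2} = n,\ f(n) = n = \Theta(n)
  \;\Rightarrow\; \text{Case 2: } T(n) = \Theta(n \log n)
\end{aligned}
```

The recurrence tree method gives the same answer: the tree has about log2 n levels and the work summed over each level is n.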
Amortized Analysis
• Amortized analysis is used for algorithms where an occasional operation is very slow, but most of the other
operations are fast; the total cost is averaged over the whole sequence of operations.
Basics of Data structure
• Structuring/organizing the Data in a computer so that it can be used effectively
• Data must be atomic, traceable, accurate, clear and concise.
• Data type
• Basic Operations
Traverse
Search
Insert
Delete
Sort
Merge
Create
Retrieve
Store
Built-in Data Types
• Integers
• Boolean (true, false)
• Floating (decimal numbers)
• Characters and Strings
Derived Data Types
• List
• Array
• Stack
• Queue
Arrays
• A fixed-size, sequenced collection of variables of the same data type,
stored in contiguous memory.
• A set of (index, value) pairs
• Consecutive elements occupy adjacent memory locations.
• A convenient structure for representing data
• Two terms to understand the concept of array are Element and Index
Element − Each item stored in an array is called an element.
Index − Each location of an element in an array has a numerical index, which is used to
identify the element.
data_type array_name [array_size];
• Index starts with 0.
• Array length is 10 which means it can store 10 elements.
• Each element can be accessed via its index (mapping). For example, in the array shown,
the element at index 6 is 9.
structure ARRAY(value, index)
declare CREATE( ) → array
RETRIEVE(array, index) → value
STORE(array, index, value) → array;
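Need for Arrays
• Without arrays, the number of variables used increases with the amount of data; e.g., storing 100 marks would need 100 separate variables.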
Ordered Lists
• A list in which the elements always appear in a particular order, written (a1, a2, ..., an)
• Also called a linear list.
Eg. (SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY,
SATURDAY)
Representation of arrays
One dimensional array
A one-dimensional array is also called a single-dimensional array; its elements are stored in sequential order
and an element is accessed with a single subscript (a row or a column index), e.g. a[n] or a_n.
Two dimensional array
When the number of dimensions specified is more than one, the array is called a multi-
dimensional array, e.g. a[3][3] (rows x columns).
Eg a[3][4]
• An element of a two-dimensional array is accessed using the subscripts of its row and column
index.
e.g. a[1][1]
Three dimensional array
In a three-dimensional array, there are three dimensions, e.g. a[2][3][4].
#include <stdio.h>
int main(void)
{
    int one_dim[10];          /* declaration of a 1D array */
    int two_dim[2][2];        /* declaration of a 2D array */
    int three_dim[2][3][4] =  /* declaration and initialization of a 3D array */
    { { {3, 4, 2, 3}, {0, -3, 9, 11}, {23, 12, 23, 2} },
      { {13, 4, 56, 3}, {5, 9, 3, 5}, {3, 1, 4, 9} } };
    (void)one_dim;            /* not used further in this example */
    (void)two_dim;
    printf("%d\n", three_dim[1][2][3]);  /* prints 9 */
    return 0;
}
Sparse Matrices
• Triplet/Array representation
• Linked List representation
• Transpose
Sparse Matrices
• A matrix is a sparse matrix if most of its elements are 0; as a rule of thumb,
only about 1/3 (roughly 30%) or fewer of its elements are non-zero.
• Storing all the zeros takes a large amount of memory for no purpose.
• To avoid this wastage of space, the sparse matrix is stored in
a table structure that keeps only the non-zero elements.
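Triplet Representation of a Sparse Matrix
Row Column Value
1 4 12
1 6 -14
2 2 7
2 3 3
3 4 -8
5 1 91
6 3 25
• Storing the full 6 x 6 matrix needs 6 x 6 = 36 memory locations.
• The triplet table needs 7 x 3 = 21 memory locations (7 non-zero elements, 3 fields each).
• So 36 − 21 = 15 memory locations are saved.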
Linked list representation
• Inserting or deleting a node in a linked list is cheaper than in an
array, since no elements have to be shifted.
• Each node of the linked list has four fields:
Row - It represents the index of the row where the non-zero element is located.
Column - It represents the index of the column where the non-zero element is located.
Value - It is the value of the non-zero element that is located at the index (row,
column).
Next node - It stores the address of the next node.
• For eg., the linked list representation of the above matrix
Transpose
• Transposing interchanges rows and columns; in the triplet form, the row and column fields of every term are swapped.
Triplet A:
Row Column Value
0 2 1
1 0 3
2 1 4
3 1 6
Transpose T:
Row Column Value
2 0 1
0 1 3
1 2 4
1 3 6
Benefits of using the sparse matrix representation
• Storage: only the non-zero elements are stored.
• Computing time: operations can skip the zero elements.
Stacks and Queues
Stacks
• Abstract Data Type (ADT)
• A stack allows operations (insertion or deletion) at one end only,
called the TOP.
• Insertion and deletion of an element are done by 2 operations
• PUSH (store)
• POP (remove and access)
• At any given time, only the top element of a stack can be accessed.
• The element which is placed (inserted or added) last is accessed first, so a stack is also called
LIFO (LAST IN FIRST OUT).
• The stack is called empty or null when it contains 0 elements.
• S=(a1,a2,a3,…….,an) where a1 is the bottom most element and an is the top most element
• The status of the stack can be known through the following operations
peek() − get the top data element of the stack, without removing it.
isEmpty() − check if the stack is empty.
isFull() − check if the stack is full.
• Push Operation
• The process of putting a new data element onto stack is known as a Push
Operation. Push operation involves a series of steps −
I. Step 1 − Checks if the stack is full.
II. Step 2 − If the stack is full, produces an error and exit.
III. Step 3 − If the stack is not full, increments top to point to the next empty space.
IV. Step 4 − Adds data element to the stack location, where top is pointing.
V. Step 5 − Returns success.
Pop Operation
• Accessing the content while removing it from the stack
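• In an array implementation, the data element is not actually removed; instead, top is decremented to a
lower position in the stack to point to the next value.
• In a linked-list implementation, pop actually removes the element and deallocates its memory space.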
• A Pop operation may involve the following steps −
I. Step 1 − Checks if the stack is empty.
II. Step 2 − If the stack is empty, produces an error and exit.
III. Step 3 − If the stack is not empty, accesses the data element at which top is pointing.
IV. Step 4 − Decreases the value of top by 1.
V. Step 5 − Returns success.
structure STACK (item)
declare CREATE ( ) -> stack
ADD (item, stack) -> stack
DELETE (stack) -> stack
TOP (stack) -> item
ISEMTS (stack) -> boolean;
Queues
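• Like a stack, a queue is an abstract data type.
• Unlike a stack, a queue has two ends and it is open at both of its ends.
• Insertions (enqueue) are made at one end, called the rear, and deletions (dequeue)
are made at the other end, called the front.
• For eg. Q = (a1, a2, …, an), where a1 is the front element and an is the rear element.
• A queue follows the First-In-First-Out (FIFO) methodology, i.e., the data item stored first will be accessed
first.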
• Queues are used, for example, in scheduling jobs in computer systems.
• The basic operations associated with queues −
enqueue() − add (store) an item to the queue.
dequeue() − remove (access) an item from the queue.
Enqueue Operation (Insertion/Rear)
• Queues maintain two data pointers, front and rear. Therefore, their operations
are comparatively more difficult to implement than those of stacks.
• The following steps should be taken to enqueue (insert) data into a queue −
I. Step 1 − Check if the queue is full.
II. Step 2 − If the queue is full, produce overflow error and exit.
III. Step 3 − If the queue is not full, increment rear pointer to point to the next empty
space.
IV. Step 4 − Add data element to the queue location, where the rear is pointing.
V. Step 5 − return success.
Dequeue Operation(Deletion/Front)
• Accessing data from the queue is a process of two tasks − access the data
where front is pointing and remove the data after access.
• The following steps are taken to perform dequeue operation −
I. Step 1 − Check if the queue is empty.
II. Step 2 − If the queue is empty, produce underflow error and exit.
III. Step 3 − If the queue is not empty, access the data where front is pointing.
IV. Step 4 − Increment front pointer to point to the next available data element.
V. Step 5 − Return success.
• A few more functions are
peek() − Gets the element at the front of the queue without removing it.
isfull() − Checks if the queue is full.
isempty() − Checks if the queue is empty.
• isfull() − checks whether the rear pointer has reached MAXSIZE − 1 (the last position of the array);
if so, the queue is full.
• isempty() − if the value of front is less than MIN or 0 (or greater than rear), the queue is
not yet initialized or holds no elements, hence it is empty.
Multiple stacks and Queues
• A single stack is sometimes not sufficient to store a large amount of data.
• To overcome this problem, multiple stacks can be used.
• A single array can hold more than one stack: the array is divided among the
stacks.
• Memory of size m is divided into n stacks sharing equal memory.
• If the size of each stack is known, the m memory locations can be divided into that known number
of stacks.
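B[i] − boundary of the ith stack (one position below its first element)
T[i] − top of the ith stack
T[i] = B[i]      # the ith stack is empty (underflow on delete)
T[i] = B[i+1]    # the ith stack is full (overflow on insert)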
Evaluation of Expressions
Expression - An expression is a collection of operators and operands that
represents a specific value.
For eg
• An operator is a symbol which performs a particular task, such as an arithmetic operation,
a logical operation or a conditional operation.
• Operands are the values on which the operators perform the task. An
operand can be a direct value, a variable or the address of a memory location.
• Three different types of Expressions based on the operator position are
Infix Expression-operator placed between the operands eg.a+b
Postfix Expression- operator is used after operands eg ab+
Prefix Expression- operator is used before operands eg. +ab
• An expression can be converted from one form to another, e.g. Infix to Postfix,
Infix to Prefix, Prefix to Postfix and vice versa.
• To convert any Infix expression into a Postfix or Prefix expression:
Find all the operators in the given Infix Expression.
Find the order of operators evaluated according to their Operator precedence.
Convert each operator into required type of expression (Postfix or Prefix) in the same
order
Steps to convert Infix Expression to Postfix Expression...
D = A + B * C
Step 1 - The Operators in the given Infix Expression : = , + , *
Step 2 - The Order of Operators according to their precedence : * , + , =
Step 3 - Now, convert the first operator * ----- D = A + B C *
Step 4 - Convert the next operator + ----- D = A BC* +
Step 5 - Convert the next operator = ----- D ABC*+ =
Operator                       Priority (a higher number means higher priority)
**, unary-, unary+, ¬          7
^ (exponentiation)             6
*, /                           5
+, -                           4
<, >, =, ≠, ≤, ≥               3
and                            2
or                             1
Unit -2 Linked List
• Linked List: Singly Linked List
• Linked Stacks and Queues
• Polynomial Addition
• More on Linked Lists
• Sparse Matrices
• Doubly Linked List and Dynamic Storage Management
• Garbage Collection and Compaction.
Linked Lists
• A linked list is a linear data structure, in which the elements are not stored at
contiguous memory locations.
• The elements in a linked list are linked using pointers.
• A linked list consists of nodes where each node contains a data field and a
reference(link) to the next node in the list.
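• The address of the first (starting) node is stored in head, and the link field of the last node is NULL.
• A linked list can grow and shrink in size as required.
• It does not reserve memory in advance, so no space is wasted on unused elements (each node does, however, need extra space for its link).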
• Different types of Linked lists are
Singly linked list-Item navigation is forward only.
Doubly linked list-Items can be navigated forward and backward
Circular linked list − The last item contains a link to the first element as
next, and (in the doubly linked version) the first element has a link to the last element as
previous.
• Basic Operations of LL are
Insert − Adds a node to the list.
Display − Displays the complete list.
Search − Searches an element using the given key.
Delete − Deletes an element using the given key.
• Insert − Adding a new node to a linked list
• First point the new node at its right neighbour, then point the left neighbour at the new node:
NewNode.next −> RightNode;
LeftNode.next −> NewNode;
Example: inserting the node GAT between FAT and HAT
1.Get a node which is currently unused and address it
as X
2.Set the DATA field of this node to GAT
3.Set the LINK field of X to point to the node after FAT
which contains HAT
4.Set the LINK field of the node containing FAT to X
Deletion-
• locate the target node to be removed, by using searching algorithms.
LeftNode.next −> TargetNode.next;
TargetNode.next −> NULL;
• The removed node can then either be kept for later use, or its memory can be deallocated
and wiped off completely.
• Suppose we want to delete the node GAT from the list.
• Using linked lists requires:
a means of dividing memory into nodes, each having at least one link field;
a mechanism to determine which nodes are free and which are in use;
a mechanism to transfer nodes from the reserved pool to the free pool and
vice versa.
Storage pool
• Contains all nodes that are not currently being used.
• RET (return a node to the pool) and GETNODE (get a node from the pool) procedures
• When a node is no longer needed, it is returned to the pool with RET.
• Initially link all of the available nodes together in a single list-AV
• Singly linked list where available nodes are linked.
Example 1. Assume that each node has two fields DATA and LINK. The following
algorithm creates a linked list with two nodes whose DATA fields are set to be
the values 'MAT' and 'PAT' respectively. T is a pointer to the first node in this list.
Eg 2-Let T be a pointer to a linked list. T= 0 if the list has no nodes. Let X be a
pointer to some arbitrary node in the list T. The following algorithm inserts a
node with DATA field 'OAT' following the node pointed at by X.
Eg 3-Let X be a pointer to some node in a linked list T . Let Y be the node
preceding X. Y = 0 if X is the first node in T (i.e., if X = T). The following
algorithm deletes node X from T.
Array vs Linked list
• Array: an array is a collection of elements of a similar data type.
  Linked list: a linked list is a collection of objects known as nodes, where each node consists of two parts, i.e., data and address.
• Array: elements are stored in contiguous memory locations.
  Linked list: elements can be stored anywhere in memory.
• Array: works with static memory, i.e., the memory size is fixed and cannot be changed at run time.
  Linked list: works with dynamic memory, i.e., the memory size can be changed at run time according to our requirements.
• Array: elements are independent of each other.
  Linked list: elements depend on each other; since each node contains the address of the next node, reaching a node requires accessing its previous node.
• Array: insertion and deletion take more time, because elements have to be shifted.
  Linked list: insertion and deletion take less time, because only links are changed.
• Array: accessing any element is fast, since it can be accessed directly through its index.
  Linked list: accessing an element is slower, since traversal starts from the first node.
• Array: memory is allocated at compile time.
  Linked list: memory is allocated at run time.
• Array: memory utilization is inefficient; e.g., if the size of the array is 6 and it holds only 3 elements, the rest of the space is unused.
  Linked list: memory utilization is efficient, since memory can be allocated or deallocated at run time according to our requirements.
Polynomial addition
• A polynomial is an expression made up of terms, each with a non-zero coefficient and an exponent.
• The general representation of a polynomial is $A(x) = a_m x^{e_m} + \dots + a_1 x^{e_1}$, where each coefficient $a_i$ is non-zero and the exponents satisfy $e_m > \dots > e_1 \ge 0$.
• In the linked representation of polynomials, each term is considered as a node,
so each node contains three fields.
• Coefficient Field – The coefficient field holds the value of the coefficient of a term
• Exponent Field – The Exponent field contains the exponent value of the term
• Link Field – The linked field contains the address of the next term in the polynomial
• Let A and B be two polynomials having three terms each:
$A(x) = 3x^{14} + 2x^{8} + 1$
$B(x) = 8x^{14} - 3x^{10} + 10x^{6}$
• The two polynomials are represented in the form of linked lists, one node per term.
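• The computing time of the addition algorithm is analysed by counting the following operations:
• Coefficient additions
• Exponent comparisons
• Additions/deletions to available space
• Creation of new nodes for C
• Each count is at most proportional to m + n for polynomials with m and n terms, so the addition takes O(m + n) time.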
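• The ATTACH(C, E, d) procedure creates a new node with coefficient C and exponent E and attaches it after node d,
the current last node of the result list.
• Whenever a new node is generated with C and E, it is appended to the end of the result
list C, and d is updated to point to it.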
Polynomial Addition
• https://www.youtube.com/watch?v=cFHZ-a87Vp4
• Linked lists are well suited for all polynomial operations such as
addition, subtraction and multiplication, by writing procedures that collect the input
and display the output.
• For eg.
$D(x) = A(x) \cdot B(x) + C(x)$
can be written as
• To compute further polynomial operations, the nodes of the temporary polynomial T(x) are reclaimed
(returned to available space) so that they can hold other polynomials in the future.
• Repeated calls to the RET procedure are avoided by using the ERASE procedure, which returns all the nodes of T(x) to available space in one pass.
• The time taken to erase T(x) is proportional to the number of nodes in T.
• A more efficient way to erase the nodes is to modify the list structure so that
the link field of the last node points back to the first node (a circular list).
• A circular list can be erased in a fixed amount of time, independent of the
number of nodes in the list.
• CERASE(T)
• To avoid handling the zero polynomial as a special case, every polynomial, zero or non-zero,
is given one special head node.
• For eg. $A(x) = 3x^{14} + 2x^{8} + 1$ with a head node
• Invert linked list
• https://www.youtube.com/watch?v=sYcOK51hl-A
• https://www.youtube.com/watch?v=D7y_hoT_YZI
CONCATENATE Procedure
• CONCATENATE joins two chains X and Y into a single chain; its computing time is linear.
• Concatenation means joining two linked lists, i.e., appending one linked list to
another linked list to generate a combined linked list.
• The time complexity of the CONCATENATE procedure is O(n), where n is the number of nodes in X.
INSERT_FRONT procedure
• Inserts a node at the front or rear of a circular list and takes a fixed amount of
time.
LENGTH Procedure
• Finds the length (number of nodes) of a circular list
SPARSE MATRIX linked list representation-
• Each column of a sparse matrix will be represented by a circularly linked list
with a HEAD node.
• Each row will also be a circularly linked list with a head node .
• Each node in the structure other than a head node will represent a non zero
term in the matrix A.
• Each node in the linked list representation of a sparse matrix has 5 fields.
DOWN − links to the next non-zero element in the same column.
RIGHT − links to the next non-zero element in the same row.
• a_ij is linked into the circular linked list for row i and the circular linked list for
column j.
• So a_ij is a member of two lists at the same time.
• Every row and every column has a head node, which is set to zero.
• For every non-zero term of matrix A, one five-field node is used.
• The MREAD and MERASE procedures are used to read and erase the elements of the
sparse matrix linked list representation.
Doubly linked list-
• A node in a DLL has 3 fields DATA,LLINK,RLINK
• May or may not be circular
• DATA field of the head node will not contain information.
• If P points to any node in the doubly linked list, then P = RLINK(LLINK(P)) = LLINK(RLINK(P)).
Dynamic storage Management-
• In a multiprogramming environment, several programs reside in memory at the same time.
• Different programs have different memory requirements.
• When programs request memory in a dynamic environment, the memory size needed is not known
ahead of time.
• After the execution of a program, its memory is freed in some order different
from the order of allocation.
• When the computer system starts, there are no jobs and the whole memory is available for
allocation.
• Then jobs are submitted to the computer and request memory allocation.
• For eg start with 1,00,000 words of memory and 5 programs
• Unshaded area indicates memory that is not currently in use.
• Assume P2 and P4 complete execution freeing the memory used by them.
Memory (words)   Program
10,000           P1
15,000           P2
6,000            P3
8,000            P4
20,000           P5
41,000           (free)
• The OS has to maintain a list of all blocks of storage currently not in use and then
allocate storage from this unused pool as required.
• A chain structure is adopted to maintain the available space list.
• All the free blocks are linked together, each block retaining its size.
• Each node on the free list has 2 fields in its first word: SIZE and LINK.
• When a request is made to store N words, a suitable free block is found in the list of free blocks
according to an allocation strategy.
• Allocation strategies are of two types
First fit
Best fit
• First fit − allocate N words from the first free block whose size ≥ N.
• Best fit − allocate N words from the free block whose size is as close to N as possible, but not less
than N.
n − memory size needed
p − address where n words can be allocated
AV − available space list
• Allocation for a portion of memory in a free block is made from the bottom
of the block to avoid changing links in the available list.
• The blocks in the available list are maintained as a circular linked list with a head
node set to 0.
• Both allocation and freeing of blocks are done on this list.
• When a block is freed (returned to AV), we should recognize whether its neighbours are
also free, so that they can be coalesced into a single block.
If P3 is the next program to terminate, then rather than adding its block to the free list
on its own, it is better to combine it with the adjacent free blocks corresponding to P2 and P4
into one block.
• If adjacent free blocks are not combined, the available block sizes get
smaller and smaller.
• To determine adjacent free blocks without searching the available
list, a special node structure is adopted for both allocated and free blocks.
• Assume memory of size 5000 from which the following allocations are made
Resource size
R1 300
R2 600
R3 900
R4 700
R5 1500
R6 1000
Memory Configuration-
Different blocks of storage and the available space list-
• When a portion of free block is allocated ,allocation is made from the
bottom of the block.
• When R1 is freed
• When R4 is freed
• When R3 is freed
When R5 is freed
Garbage Collection and Compaction
• Garbage collection is the process of collecting all unused nodes and returning them to available space.
• It is carried out in two phases
• First phase − marking phase: all nodes in use are marked.
• Second phase − all unmarked nodes are returned to the available space list. This is trivial when all nodes
are of a fixed size: every node is examined to check whether it is marked or unmarked, which takes
O(n) steps. Moving the nodes in use so that the free nodes form a contiguous block of memory is called memory compaction.
• Each node contains a MARK bit, which can be changed at any time by the marking
algorithm.
• The marking algorithm marks all directly and indirectly accessible nodes.
• Initially, the MARK bits of all the nodes are set to zero.
• Each node has a MARK field and a TAG field.
• A node whose TAG field is 1 contains DLINK
and RLINK fields.
• A node whose TAG bit is 0 contains atomic
information and is called an atomic node.
• The other nodes, whose TAG bit is 1, are called list
nodes.
• The marking algorithm is used to mark the nodes.
• Initially all the nodes are unmarked: MARK(i) = 0 for all nodes i.
• A driver for the marking algorithm calls it to mark the nodes accessible from each of the
pointer variables.
Storage Compaction
• When storage requests may be for blocks of varying sizes, storage is compacted so that all the free
storage forms one contiguous block.
• Nodes in use have MARK bit = 1 and free nodes have MARK bit = 0.
• In the example, the nodes are labelled 1 to 8.
• Free nodes could simply be linked together to obtain the available space; compaction instead moves the nodes
currently in use to one end of memory and the free space to the other end.
• By relocating the nodes, storage forms two contiguous blocks: one for the nodes in use
and another one for the free space.
• Storage compaction must update the links so that they point to the relocated address
of the respective node.
• With storage compaction three tasks are identified:
• Determine new addresses for nodes in use
• Update all links in nodes in use
• Relocate nodes to new addresses
• Each node has the fields SIZE, NEW_ADDR, LINK1 and LINK2.
Trees
• Basic Terminology
• Binary Trees
• Binary Tree Representations
• Binary Trees Traversal
• More on Binary Trees
• Threaded Binary Trees
• Binary Tree Representation of Trees
• Counting Binary Trees
Trees
• A tree is a non-linear data structure, meaning that the data is organized so that
items of information are related by branches.
• Data organized as a tree is easy and quick to access.
• Data is organised in the form of a tree with a root node, branches and leaf nodes.
• Trees are also seen in genealogies. There are two different types of genealogical charts:
• Pedigree chart (e.g., a tree of organisms
or genes)
• Lineal chart (e.g., a tree of languages)
Recursive definition of a tree: a tree consists of a root and zero or more subtrees T1, T2, …, Tk, such that there is an edge from the root of the tree to the root of each subtree.
• A node stands for the item of information plus the branches to other items.
• The number of subtrees of a node is called its degree.
• Nodes that have degree zero are called leaf or terminal nodes.
• The other nodes (degree greater than zero) are called non-terminal nodes.
• Tree nodes can also be referred to as parent and child nodes.
• Children of the same parent are called siblings.
• The degree of a tree is the maximum degree of the nodes in the tree.
• The ancestors of a node are all the nodes along the path from the root to that node.
• The level of a node is defined by letting the root be at level one; if a node is at level l, its children are at level l + 1.
• The height or depth of a tree is the maximum level of any node in the tree.
• A forest is a set of n ≥ 0 disjoint trees.
• Removing the root of a tree leaves a forest.
• In the example, removing node A leaves three trees.
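The terms above can be checked on a small example. The slide's figure is not available, so this sketch assumes a hypothetical 13-node tree in which A has three subtrees B, C and D (consistent with "removing node A leaves three trees"); the tree is stored as a dict from each node to its list of children.

```python
# Hypothetical example tree: node -> children (leaves are simply absent as keys).
TREE = {
    "A": ["B", "C", "D"], "B": ["E", "F"], "C": ["G"], "D": ["H", "I", "J"],
    "E": ["K", "L"], "H": ["M"],
}

def children(node):
    return TREE.get(node, [])

def degree(node):
    return len(children(node))                # number of subtrees of the node

def levels(root):
    """Map every node to its level, with the root at level one."""
    level, frontier = {root: 1}, [root]
    while frontier:
        nxt = []
        for n in frontier:
            for c in children(n):
                level[c] = level[n] + 1       # a child is one level below its parent
                nxt.append(c)
        frontier = nxt
    return level

LV = levels("A")
LEAVES = sorted(n for n in LV if degree(n) == 0)   # degree zero = terminal nodes
TREE_DEGREE = max(degree(n) for n in LV)           # maximum degree of any node
HEIGHT = max(LV.values())                          # maximum level of any node
FOREST = children("A")                             # roots left after removing A
```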
• Another useful way to represent a tree is as a list.
• The example tree can be written in list form as:
• The node structure of a tree represented as a linked list is:
Binary trees
General tree vs binary tree
• Number of children: in a general tree each node can have many children; in a binary tree each node can have at most two.
• Ordering: the subtrees of a general tree are not ordered; the subtrees of a binary tree are ordered (left and right are distinguished).
• Empty tree: a general tree cannot be empty; a binary tree can be empty.
• Degree: in a general tree there is no limit on the degree of a node; in a binary tree no node can have more than two children.
• Subtrees: in a general tree a node has zero or many subtrees; in a binary tree a node has two subtrees, the left subtree and the right subtree.
Figures: skewed binary tree; complete binary tree
• Degree, level, height, leaf, parent and child are defined for binary trees in the same way.
• https://www.javatpoint.com/discrete-mathematics-binary-trees
• https://www.geeksforgeeks.org/introduction-to-binary-tree-data-structure-and-algorithm-tutorials/
Binary tree Representation
• A full binary tree of depth k has 2^k - 1 nodes.
• The sequential representation numbers the nodes of a full binary tree sequentially, starting with the root as node 1 at level 1.
• Nodes on any level are numbered from left to right.
• A binary tree with n nodes and depth k is complete if its nodes correspond to the nodes numbered 1 to n in the full binary tree of depth k.
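With this numbering, the parent and children of node i can be found by arithmetic alone, which is what makes the sequential (array) representation work. A minimal sketch, assuming the tree is stored in a Python list with index 0 unused so that node i sits at position i; the node values are hypothetical.

```python
def parent(i):
    return i // 2 if i > 1 else None            # node 1 is the root: no parent

def left_child(i, n):
    return 2 * i if 2 * i <= n else None        # None: no left child

def right_child(i, n):
    return 2 * i + 1 if 2 * i + 1 <= n else None

# Hypothetical complete binary tree with n = 6 nodes in positions 1..6.
TREE = [None, "A", "B", "C", "D", "E", "F"]
N = len(TREE) - 1
FULL_DEPTH_3 = 2 ** 3 - 1                       # nodes in a full binary tree of depth 3
```

For example, node 5 ("E") has parent 2 ("B"), and node 3 ("C") has left child 6 ("F") and no right child.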
• The array (sequential) representation wastes no space for a complete binary tree, but for other trees, such as a skewed tree, much of the array is left unused.
• Insertion or deletion of a node in the middle of a tree requires the movement of potentially many nodes to reflect the change in level number of these nodes.
• These problems can be overcome easily by using a linked representation.
• In the linked representation it is difficult to determine the parent of a node.
• If this is needed, a fourth field, PARENT, can be included in each node.
Binary Tree Traversal
• Many operations can be performed on trees; a common one is traversal.
• Traversing a tree means visiting each node in the tree exactly once.
• A full traversal produces a linear order for the information in a tree.
• During traversal, every node and its subtrees are treated in the same manner.
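• If L, D and R stand for moving left, visiting the node (for example printing its data) and moving right, there are six possible combinations of traversal:
LDR
LRD
DLR
DRL
RDL
RLD
• If we adopt the convention of traversing left before right, only three traversals remain:
LDR
LRD
DLR
• These traversals are called, respectively:
Inorder
Postorder
Preorder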
https://www.youtube.com/watch?v=WLvU5EQVZqY
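The three left-before-right traversals differ only in where the visit (D) happens relative to the two recursive calls. A minimal recursive sketch; the `Node` class and the expression tree for A * B + C are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    data: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def inorder(t, out):                      # LDR
    if t is not None:
        inorder(t.left, out)
        out.append(t.data)
        inorder(t.right, out)
    return out

def preorder(t, out):                     # DLR
    if t is not None:
        out.append(t.data)
        preorder(t.left, out)
        preorder(t.right, out)
    return out

def postorder(t, out):                    # LRD
    if t is not None:
        postorder(t.left, out)
        postorder(t.right, out)
        out.append(t.data)
    return out

# Hypothetical expression tree for A * B + C
ROOT = Node("+", Node("*", Node("A"), Node("B")), Node("C"))
```

On an expression tree, inorder gives the infix form, preorder the prefix form and postorder the postfix form of the expression.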
Inorder: move down the tree towards the left until you can go no farther, then visit the node, move one node to the right and continue; if you cannot move to the right, go back up one more node.
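The informal description above corresponds to a non-recursive inorder traversal with an explicit stack that remembers the nodes we still have to "go back" to. A minimal sketch using a hypothetical `Node` class and the expression tree for A * B + C.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    data: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def inorder_iterative(root):
    out, stack, t = [], [], root
    while stack or t is not None:
        while t is not None:          # move down to the left as far as possible
            stack.append(t)
            t = t.left
        t = stack.pop()               # cannot go farther left: go back one node and visit it
        out.append(t.data)
        t = t.right                   # then move one node to the right and continue
    return out

ROOT = Node("+", Node("*", Node("A"), Node("B")), Node("C"))
```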
COPY of a binary tree
• Producing an exact copy (clone or duplicate) of a given binary tree.
• A modification of postorder traversal gives the copy: copy the left subtree, copy the right subtree, then create a new node holding the root's data and the two copies.
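A minimal sketch of COPY as a postorder traversal (L, R, then D), using a hypothetical `Node` dataclass; the name `copy_tree` is our own and avoids clashing with Python's `copy` module.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    data: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def copy_tree(t):
    if t is None:
        return None
    l = copy_tree(t.left)            # L: copy the left subtree
    r = copy_tree(t.right)           # R: copy the right subtree
    return Node(t.data, l, r)        # D: create the new node last

ROOT = Node("+", Node("*", Node("A"), Node("B")), Node("C"))
CLONE = copy_tree(ROOT)
```

The clone compares equal to the original (the dataclass `==` compares fields recursively) but shares no nodes with it.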
EQUAL of binary trees (identical/same)
• Two binary trees are equivalent if they have the same topology and the information in corresponding nodes is identical.
• Same topology means that every branch in one tree corresponds to a branch in the second tree, in the same order.
• EQUAL traverses the binary trees in preorder.
Algorithm to check whether binary trees are identical
• Visit the current nodes of tree1 and tree2 together.
• If both nodes are null, this part of the traversal has completed successfully: return true.
• If only one of the nodes is null, the trees are not identical: return false.
• Otherwise, compare the data of the two nodes.
• If the data is the same, check the subtrees:
• traverse the left child of tree1 and the left child of tree2;
• traverse the right child of tree1 and the right child of tree2.
• If the data is not the same, the trees are not identical: return false.
• After this traversal we know whether the binary trees are identical (equal, same).
• Time complexity:
• Let tree1 contain p nodes and tree2 contain q nodes.
• Time complexity: O(min(p, q)), since the simultaneous traversal stops at the first mismatch and never visits more node pairs than the smaller tree has nodes.
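The steps above translate directly into a recursive function that checks the data first and then the left and right subtrees, i.e. a preorder traversal of both trees at once. A minimal sketch with a hypothetical `Node` class; the example trees loosely mirror Example 2 (a differing node C/R) but are our own.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    data: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def equal(s, t):
    if s is None and t is None:
        return True                       # both subtrees ended together
    if s is None or t is None:
        return False                      # only one tree has a node here
    return (s.data == t.data              # D: compare data first (preorder)
            and equal(s.left, t.left)     # L
            and equal(s.right, t.right))  # R

T1 = Node("A", Node("B"), Node("C"))
T2 = Node("A", Node("B"), Node("C"))
T3 = Node("A", Node("B"), Node("R"))      # same topology, different data
```

The short-circuiting `and` stops at the first mismatch. (The dataclass `==` would give the same answer here; the explicit function mirrors the algorithm's steps.)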
Example 1: Identical (same) binary trees
• The structure of both trees is the same.
• The data in corresponding nodes of the two trees is the same.
Example 2: Non-identical binary trees
• The structure of both trees is the same.
• The data in corresponding nodes is NOT the same:
• node C and node R have different values;
• node D and node S have different values.
Propositional logic of a binary tree
• A propositional formula contains variables x1, x2, x3, …
• and the operators ∧ (and), ∨ (or) and ¬ (not).
• Each variable can take only two possible values, TRUE or FALSE, so every expression built from the variables and these operators also evaluates to TRUE or FALSE.
• The language of such expressions is called propositional calculus.
• For example, x1 ∨ (x2 ∧ ¬x3)
• can be read as "x1 or (x2 and not x3)".
• If x1 and x3 are false and x2 is true, the value of this expression is false ∨ (true ∧ ¬false) = false ∨ true = true.
• For example:
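Stored as a binary tree, a formula can be evaluated with a postorder traversal: evaluate the subtrees first, then apply the operator at the node. A minimal sketch; the `PNode` layout is our assumption, including the convention that a "not" node keeps its single operand in the right subtree.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PNode:
    op: str                          # "and", "or", "not", or "var"
    name: str = ""                   # variable name when op == "var"
    left: Optional["PNode"] = None
    right: Optional["PNode"] = None

def evaluate(t, env):
    """Postorder evaluation: operands (subtrees) first, then the operator at t."""
    if t.op == "var":
        return env[t.name]
    if t.op == "not":
        return not evaluate(t.right, env)    # "not" has only a right operand here
    a = evaluate(t.left, env)
    b = evaluate(t.right, env)
    return (a and b) if t.op == "and" else (a or b)

# x1 ∨ (x2 ∧ ¬x3)
FORMULA = PNode("or",
                left=PNode("var", "x1"),
                right=PNode("and",
                            left=PNode("var", "x2"),
                            right=PNode("not", right=PNode("var", "x3"))))
VALUE = evaluate(FORMULA, {"x1": False, "x2": True, "x3": False})
```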
Threaded Binary Tree (TBT)
• The linked representation of a binary tree contains more null links than actual pointers:
• of the 2n links in a tree with n nodes, n + 1 are null.
• A. J. Perlis and C. Thornton devised a clever way to make use of these null links.
• Their idea is to replace the null links by pointers, called threads, to other nodes.
• Rules for a threaded binary tree:
• The left pointer of the leftmost node and the right pointer of the rightmost node remain NULL.
• Change all other null pointers as follows:
• a null left pointer points to the node's inorder predecessor;
• a null right pointer points to the node's inorder successor.
Inorder traversal: H, D, I, B, E, A, F, C, G
• The tree has 9 nodes and 10 NULL links.
• These NULL links are replaced by threads:
Left pointer: inorder predecessor
Right pointer: inorder successor
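A minimal sketch of the threading rule, assuming the figure's tree is A(B(D(H, I), E), C(F, G)): this shape is our reconstruction, chosen because its inorder traversal is exactly H, D, I, B, E, A, F, C, G. LBIT/RBIT follow the convention used below (1 = child pointer, 0 = thread or NULL).

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left, self.right = left, right
        self.lbit = 1 if left is not None else 0    # 1 = child pointer, 0 = thread/NULL
        self.rbit = 1 if right is not None else 0

def inorder_nodes(t, out):
    if t is not None:
        inorder_nodes(t.left, out)
        out.append(t)
        inorder_nodes(t.right, out)
    return out

def make_threads(root):
    """Replace null links by inorder threads; returns the nodes in inorder."""
    order = inorder_nodes(root, [])      # computed before any link is changed
    for i, node in enumerate(order):
        if node.lbit == 0:               # null left link -> inorder predecessor
            node.left = order[i - 1] if i > 0 else None
        if node.rbit == 0:               # null right link -> inorder successor
            node.right = order[i + 1] if i + 1 < len(order) else None
    return order

N = Node
ROOT = N("A", N("B", N("D", N("H"), N("I")), N("E")), N("C", N("F"), N("G")))
ORDER = make_threads(ROOT)
BY_NAME = {n.data: n for n in ORDER}
NULL_LINKS = sum((n.lbit == 0) + (n.rbit == 0) for n in ORDER)
```

Only the left link of H and the right link of G stay NULL, matching the rule for the leftmost and rightmost nodes.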
• In memory, normal pointers and threads must be distinguished.
• This is done with two extra one-bit fields, LBIT and RBIT.
• If the left pointer points to a child node, LBIT is 1; if it is a thread to an ancestor (the inorder predecessor), LBIT is 0.
• If the right pointer points to a child, RBIT is 1; if it is a thread to an ancestor (the inorder successor), RBIT is 0.
• Node structure of a linked binary tree with LBIT and RBIT:
Left pointer | LBIT | Data | RBIT | Right pointer
• https://www.youtube.com/watch?v=ffgg_zmbaxw
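• A dummy head node is introduced so that no threads are left dangling: the tree becomes the left subtree of the head node, the NULL left thread of node H (and the NULL right thread of node G) point to the head node, and the right pointer of the head node points to itself.
• This keeps the representation of the TBT consistent.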
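• Using the threads, an inorder traversal takes O(n) time for n nodes, without a stack.
• The same idea can be applied to preorder and postorder traversal.
• Insertion into a threaded binary tree is also possible.
• Procedure to grow a threaded tree (inserting node T as the right child of node S):
• If S has an empty right subtree, the insertion is simple: T becomes the right child of S.
• Otherwise, the existing right subtree of S is made the right subtree of T, and T becomes the right child of S.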
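A minimal sketch of a threaded tree with a head node: `insuc` finds the inorder successor by following threads, so `tinorder` needs no stack, and `insert_right` implements the insertion procedure above. The names, the `insert_left_leaf` helper (restricted to nodes without a left child, to keep it short) and the example tree are our own.

```python
class TNode:
    def __init__(self, data):
        self.data = data
        self.lchild = self.rchild = None
        self.lbit = self.rbit = 0         # 1 = child pointer, 0 = thread

def make_head():
    """Empty tree: the head's left thread and right pointer both point to the head."""
    head = TNode(None)
    head.lchild = head.rchild = head
    head.rbit = 1
    return head

def insuc(x):
    """Inorder successor of x, found by following threads."""
    s = x.rchild
    if x.rbit == 1:                       # real right child: go to its leftmost node
        while s.lbit == 1:
            s = s.lchild
    return s

def tinorder(head):
    out, t = [], head
    while True:
        t = insuc(t)
        if t is head:                     # back at the head node: traversal done
            return out
        out.append(t.data)

def insert_right(s, t):
    """Insert t as the right child of s; s's old right subtree becomes t's right subtree."""
    t.rchild, t.rbit = s.rchild, s.rbit
    t.lchild, t.lbit = s, 0               # left thread back to s, t's inorder predecessor
    s.rchild, s.rbit = t, 1
    if t.rbit == 1:                       # s had a right subtree: redirect the thread to t
        w = insuc(t)
        w.lchild = t

def insert_left_leaf(s, t):
    """Insert t as the left child of s, assuming s has no left child yet."""
    assert s.lbit == 0
    t.lchild, t.lbit = s.lchild, 0        # inherit s's left thread
    t.rchild, t.rbit = s, 0               # right thread to s, t's inorder successor
    s.lchild, s.lbit = t, 1

HEAD = make_head()
NODES = {c: TNode(c) for c in "ABCDEX"}
insert_left_leaf(HEAD, NODES["A"])        # the tree is the left subtree of the head node
insert_left_leaf(NODES["A"], NODES["B"])
insert_right(NODES["A"], NODES["C"])
insert_left_leaf(NODES["B"], NODES["D"])
insert_right(NODES["B"], NODES["E"])
BEFORE = tinorder(HEAD)
insert_right(NODES["B"], NODES["X"])      # B already has right subtree E: E moves under X
AFTER = tinorder(HEAD)
```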
Binary tree representation of trees
• Every tree can be represented as binary tree
• Array representation
• Linked list representation
• Relationship representation
Relationship representation
• The relationship between the nodes is characterized by two quantities: the leftmost child and the next right sibling (the leftmost-child-next-right-sibling relationship).
• Every node has at most one leftmost child and at most one next right sibling.
• In the example, the leftmost child of B is E and the next right sibling of B is C.
The binary tree is obtained by:
• connecting together all siblings of a node, and
• deleting all links from a node to its children except the link to its leftmost child.
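The two steps above give the standard conversion in which a binary node's left pointer holds the leftmost child and its right pointer holds the next right sibling. A minimal sketch on a hypothetical general tree (stored as node -> list of children) in which, as in the slide, the leftmost child of B is E and the next right sibling of B is C.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BNode:
    data: str
    left: Optional["BNode"] = None     # leftmost child
    right: Optional["BNode"] = None    # next right sibling

def to_binary(node, children):
    """Convert the general (sub)tree rooted at node into a binary tree."""
    kids = [to_binary(c, children) for c in children.get(node, [])]
    for a, b in zip(kids, kids[1:]):
        a.right = b                    # connect each child to its next right sibling
    # keep only the link to the leftmost child
    return BNode(node, left=kids[0] if kids else None)

def preorder(t):
    return [] if t is None else [t.data] + preorder(t.left) + preorder(t.right)

# Hypothetical general tree
TREE = {"A": ["B", "C", "D"], "B": ["E", "F"], "C": ["G"],
        "D": ["H", "I", "J"], "E": ["K", "L"], "H": ["M"]}
ROOT = to_binary("A", TREE)
```

A useful property of this representation is that the preorder traversal of the binary tree visits the nodes in the same order as the preorder traversal of the original tree.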
The tree can be represented formally as:
Preorder, inorder and postorder traversal of the binary tree can also be applied here.
Preorder traversal of T:
Inorder traversal of T:
Postorder traversal of T:
https://prod-edxapp.edx-cdn.org/assets/courseware/v1/0f0865e1fe974ec8b2244cdcd7f5d68a/c4x/PekingX/04830050x/asset/chapter6_001_en.pdf
Counting Binary Trees
• Goal: determine the number of distinct binary trees with n nodes.
• When n = 0 or n = 1 there is only one binary tree.
• When n = 2 there are two distinct binary trees.
• When n = 3 there are five distinct binary trees.
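These counts follow a simple recurrence: a tree with m nodes has a root, a left subtree with i nodes and a right subtree with m - 1 - i nodes, so b(m) is the sum of b(i) · b(m - 1 - i). The resulting numbers are the Catalan numbers, C(2n, n) / (n + 1). A minimal sketch that checks both forms.

```python
from math import comb

def count_binary_trees(n):
    b = [1] + [0] * n                 # b[0] = 1: the empty tree
    for m in range(1, n + 1):
        # root + left subtree with i nodes + right subtree with m - 1 - i nodes
        b[m] = sum(b[i] * b[m - 1 - i] for i in range(m))
    return b[n]

def catalan(n):
    return comb(2 * n, n) // (n + 1)  # closed form of the same numbers

COUNTS = [count_binary_trees(n) for n in range(5)]
```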
• Preorder
• Inorder
GRAPHS
• A graph G consists of two sets, V and E.
• Vertices (V): the units of a graph.
• Edges (E): the connections between units.
• A graph is a non-linear data structure consisting of vertices and edges.
• V is a finite, non-empty set of vertices (units) of the graph.
• E is a set of pairs of vertices, called edges.
• V(G) and E(G) represent the vertices and edges of graph G.
• A graph is written G = (V, E).
• A graph is of two types:
Directed graph
Undirected graph
• Multigraph: a graph is a multigraph if it has no self-loops but does contain parallel edges. If more than one edge is present between two vertices, that pair of vertices is said to have parallel edges.
• Complete graph: a graph is complete if there is an edge between every pair of distinct vertices. An undirected complete graph with n vertices has n(n - 1)/2 edges.
• Adjacent: two nodes (vertices) are adjacent if they are connected to each other by an edge. In the example, the vertices adjacent to vertex 2 are 4, 5 and 1.
• Subgraph: a graph G' is a subgraph of G if it is part of G, i.e. V(G') ⊆ V(G) and E(G') ⊆ E(G).
• Length: the length of a path is the number of edges on it.
• Simple path: a path in which all vertices, except possibly the first and last, are distinct.
• Cycle: a simple path in which the first and last vertices are the same.
• In-degree: the in-degree of a vertex is the number of edges coming into the vertex.
• Out-degree: the out-degree of a vertex is the number of edges going out of the vertex.
Graph Representation
• There are three representations of graphs:
Adjacency matrix
Adjacency lists
Adjacency multilists
Adjacency Matrix
• Let G = (V, E) be a graph with n vertices, n ≥ 1.
• The adjacency matrix of G is a two-dimensional n x n array, say A, with the property that A(i, j) = 1 if the edge (vi, vj) is in E(G) (for a directed graph, the edge <vi, vj>).
• A(i, j) = 0 if there is no such edge in G.
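A minimal sketch of building the adjacency matrix, using 0-based vertex numbers. The two graphs are hypothetical examples: a complete undirected graph on 4 vertices and a small digraph with edges <0,1>, <1,0>, <1,2>. For an undirected graph the matrix is symmetric and a row sum gives the degree; for a digraph the row sum gives the out-degree and the column sum the in-degree.

```python
def adjacency_matrix(n, edges, directed=False):
    """A[i][j] = 1 if edge (i, j) is in E(G), else 0."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        if not directed:
            A[j][i] = 1               # undirected: store the edge in both directions
    return A

G1 = adjacency_matrix(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
G3 = adjacency_matrix(3, [(0, 1), (1, 0), (1, 2)], directed=True)

DEGREE_G1 = [sum(row) for row in G1]          # row sum = degree
OUT_DEG_G3 = [sum(row) for row in G3]         # row sum = out-degree
IN_DEG_G3 = [sum(col) for col in zip(*G3)]    # column sum = in-degree
```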
• The adjacency matrices for graphs G1, G3 and G4 are given below.
With an adjacency matrix, answering a question such as "how many edges does G have?" requires at least O(n²) time, since all n² - n off-diagonal entries must be examined.
Adjacency Lists
• The n rows of the adjacency matrix are represented as n linked lists.
• There is one list for each vertex in G.
• Each list node has at least two fields:
• VERTEX: the index of a vertex adjacent to vertex i;
• LINK: a pointer to the next node in the list.
• Each list has a head node.
• The head nodes are sequential, providing easy random access to the adjacency list for any vertex.
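A minimal sketch of adjacency lists with explicit VERTEX/LINK nodes and a sequential array of head pointers, for an undirected graph with 0-based vertices (the complete 4-vertex graph is a hypothetical example). Each undirected edge produces two list nodes, which is where the 2e count below comes from.

```python
class ListNode:
    def __init__(self, vertex, link=None):
        self.vertex = vertex          # VERTEX: index of an adjacent vertex
        self.link = link              # LINK: next node on the same list

def adjacency_lists(n, edges):
    head = [None] * n                 # sequential head nodes: direct access to any list
    for i, j in edges:                # undirected: each edge goes on two lists
        head[i] = ListNode(j, head[i])
        head[j] = ListNode(i, head[j])
    return head

def neighbours(head, i):
    out, p = [], head[i]
    while p is not None:
        out.append(p.vertex)
        p = p.link
    return out

EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
HEAD = adjacency_lists(4, EDGES)
TOTAL_LIST_NODES = sum(len(neighbours(HEAD, i)) for i in range(4))
```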
• For an undirected graph with n vertices and e edges, the adjacency lists require n head nodes and 2e list nodes.
• In terms of the number of bits of storage needed, this count should be multiplied by log n for the head nodes and by log n + log e for the list nodes,
• since it takes O(log m) bits to represent a number of value m.
• A sparse matrix representation of a graph has 4 fields.
Adjacency Multilists
• Adjacency multilists are an edge-based, rather than vertex-based, graph representation.
• The multilist representation consists of two parts:
• a directory of node (vertex) information, and
• a set of linked lists of edge information.
• For each edge there is exactly one node, but this node is on two lists: the lists of both of its end vertices.
Edge node fields:
M: a one-bit mark field indicating whether or not the edge has been examined
V1: the first vertex of the edge (V1, V2)
V2: the second vertex of the edge (V1, V2)
LIST1: pointer to the next edge node on the list for vertex V1
LIST2: pointer to the next edge node on the list for vertex V2
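A minimal sketch of an adjacency multilist: one `EdgeNode` per edge, threaded onto two lists through LIST1 and LIST2, with a directory of list heads per vertex. The class and function names are hypothetical, and the sketch assumes the graph has no self-loops. When walking the list of vertex v, we follow LIST1 if v is the edge's V1 and LIST2 otherwise.

```python
class EdgeNode:
    def __init__(self, v1, v2):
        self.m = 0                    # M: edge not yet examined
        self.v1, self.v2 = v1, v2     # the edge's two end vertices
        self.list1 = None             # LIST1: next edge on the list for v1
        self.list2 = None             # LIST2: next edge on the list for v2

def build_multilist(n, edges):
    head = [None] * n                 # directory: first edge node for each vertex
    for v1, v2 in edges:
        e = EdgeNode(v1, v2)          # exactly one node per edge ...
        e.list1, head[v1] = head[v1], e   # ... linked into the list of v1
        e.list2, head[v2] = head[v2], e   # ... and into the list of v2
    return head

def edges_of(head, v):
    out, e = [], head[v]
    while e is not None:
        out.append((e.v1, e.v2))
        e = e.list1 if e.v1 == v else e.list2   # pick the link that belongs to v
    return out

HEAD = build_multilist(3, [(0, 1), (0, 2), (1, 2)])
```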
Traversals, Connected Components and Spanning Trees
• Given an undirected graph G = (V, E) and a vertex v in V(G),
• we wish to visit all the vertices in G that are reachable from v.
• There are two ways to do this:
• Depth First Search (DFS)
• Breadth First Search (BFS)
Depth First Search (DFS) Traversal / Algorithm
• The start vertex v is visited.
• Next, an unvisited vertex w adjacent to v is selected, and a depth first search from w is initiated.
• When a vertex u is reached such that all its adjacent vertices have been visited, we back up to the last vertex visited that has an unvisited vertex w adjacent to it and initiate a depth first search from w.
• The search terminates when no unvisited vertex can be reached from any of the visited vertices.
• DFS is a recursive algorithm that uses the idea of backtracking.
• https://www.youtube.com/watch?v=iaBEKo5sM7w
• The recursion in DFS can also be implemented with an explicit stack.
• The basic idea is as follows:
• Pick a starting node, visit it and push all its adjacent nodes onto the stack.
• Pop a node from the stack to select the next node to visit, and push all its adjacent nodes onto the stack.
• Repeat this process until the stack is empty.
However, make sure that the visited nodes are marked. This prevents visiting the same node more than once; if visited nodes are not marked, you may end up in an infinite loop.
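Both forms of DFS, recursive and with an explicit stack, can be sketched on adjacency lists. The figure is not available, so the graph below is an assumption: an 8-vertex graph whose adjacency lists (in increasing order) reproduce the DFS visiting order V1, V2, V4, V8, V5, V6, V3, V7 listed for the example.

```python
# Assumed adjacency lists for the 8-vertex example graph (vertices 1..8).
G = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2, 8],
     5: [2, 8], 6: [3, 8], 7: [3, 8], 8: [4, 5, 6, 7]}

def dfs(v, visited=None, order=None):
    if visited is None:
        visited, order = set(), []
    visited.add(v)                    # mark v as visited
    order.append(v)
    for w in G[v]:
        if w not in visited:          # go deeper from the first unvisited neighbour
            dfs(w, visited, order)
    return order                      # returning = backing up to the previous vertex

def dfs_stack(v):
    visited, order, stack = set(), [], [v]
    while stack:
        u = stack.pop()
        if u in visited:              # already marked: skip it
            continue
        visited.add(u)
        order.append(u)
        # push in reverse so the lowest-numbered neighbour is popped first
        stack.extend(w for w in reversed(G[u]) if w not in visited)
    return order
```

Marking on pop (and skipping marked vertices) is what keeps the stack version from looping forever; with this push order both versions visit the vertices in the same sequence on this graph.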
DFS visiting order: V1, V2, V4, V8, V5, V6, V3, V7
Breadth First Search Traversal / Algorithm
• Start at vertex v (the root node) and mark it as visited.
• Traverse the graph layer by layer: first visit the neighbour nodes directly connected to the root node,
• then move on to the neighbours at the next level, breadth-wise.
• In BFS all nodes in layer 1 are visited before moving to layer 2.
https://www.youtube.com/watch?v=QRq6p9s8NVg
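BFS replaces the stack of DFS with a FIFO queue, so vertices are visited layer by layer. A minimal sketch using `collections.deque`, on the same assumed 8-vertex graph as in the DFS sketch; it reproduces the BFS visiting order V1, V2, ..., V8 listed for the example.

```python
from collections import deque

# Assumed adjacency lists for the 8-vertex example graph (vertices 1..8).
G = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2, 8],
     5: [2, 8], 6: [3, 8], 7: [3, 8], 8: [4, 5, 6, 7]}

def bfs(v):
    visited, order = {v}, []
    q = deque([v])
    while q:
        u = q.popleft()               # oldest vertex first: finish one layer before the next
        order.append(u)
        for w in G[u]:
            if w not in visited:
                visited.add(w)        # mark when enqueued so no vertex is queued twice
                q.append(w)
    return order
```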
BFS visiting order: V1, V2, V3, V4, V5, V6, V7, V8
Connected components
• Connectivity in an undirected graph means that every vertex can reach every other vertex via some path.
• If the graph is not connected, it can be broken down into connected components (maximal connected subgraphs).
• Strong connectivity applies only to directed graphs. A directed graph is strongly connected if there is a directed path from every vertex to every other vertex.
• Strong connectivity is thus the directed counterpart of connectivity, with directed paths required instead of just paths. Similarly to connected components, a directed graph can be broken down into strongly connected components.
• All the connected components of an undirected graph can be determined by repeatedly calling DFS(v) or BFS(v) on a vertex v that has not yet been visited.
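A minimal sketch of finding connected components by starting a new (stack-based) DFS from every vertex that is still unvisited; each search collects one component. The 6-vertex graph is a hypothetical example.

```python
def connected_components(n, edges):
    """Vertices 1..n, undirected edges; returns a sorted vertex list per component."""
    adj = {v: [] for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    visited, components = set(), []
    for v in range(1, n + 1):
        if v in visited:
            continue                  # already in a component found earlier
        comp, stack = [], [v]         # one DFS(v) per unvisited vertex
        visited.add(v)
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if w not in visited:
                    visited.add(w)
                    stack.append(w)
        components.append(sorted(comp))
    return components

COMPONENTS = connected_components(6, [(1, 2), (2, 3), (4, 5)])
```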
Spanning Trees and Minimum Cost Spanning Trees
• A spanning tree of a connected graph contains all the vertices of the graph with the minimum number of edges.
• If any vertex is missed, it is not a spanning tree.
• A spanning tree contains n - 1 edges, where n is the number of vertices.
• The edges may or may not have weights assigned to them.
• All the possible spanning trees of a graph have the same number of vertices, n, and the same number of edges, n - 1.
Example: n = 4, so e = n - 1 = 4 - 1 = 3.
• A cycle must not be formed while building a spanning tree.
• When BFS is used, the resulting tree is called a BFS spanning tree; when DFS is used, it is called a DFS spanning tree.
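A DFS or BFS spanning tree consists of exactly the edges the search uses to reach a new vertex. A minimal sketch, on the same assumed 8-vertex example graph used for DFS and BFS; both trees have n - 1 = 7 edges.

```python
from collections import deque

# Assumed adjacency lists for the 8-vertex example graph (vertices 1..8).
G = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2, 8],
     5: [2, 8], 6: [3, 8], 7: [3, 8], 8: [4, 5, 6, 7]}

def dfs_spanning_tree(v, visited=None, tree=None):
    if visited is None:
        visited, tree = {v}, []
    for w in G[v]:
        if w not in visited:
            visited.add(w)
            tree.append((v, w))       # the edge used to reach a new vertex
            dfs_spanning_tree(w, visited, tree)
    return tree

def bfs_spanning_tree(v):
    visited, tree, q = {v}, [], deque([v])
    while q:
        u = q.popleft()
        for w in G[u]:
            if w not in visited:
                visited.add(w)
                tree.append((u, w))   # the edge used to reach a new vertex
                q.append(w)
    return tree

DFS_TREE = dfs_spanning_tree(1)
BFS_TREE = bfs_spanning_tree(1)
```

Since each tree edge leads to a vertex not seen before, no cycle can be formed.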
Application of Spanning Trees
• A spanning tree is basically used to find a minimal set of edges that connects all the nodes in a graph. Common applications of spanning trees are:
Civil network planning
Computer network routing protocols
Cluster analysis
Minimum Cost Spanning Tree
• The cost of a spanning tree is the sum of the costs of the edges in that tree.
• One approach to finding a minimum cost spanning tree is due to Kruskal.
• In this approach, the minimum cost spanning tree T is built edge by edge.
• Edges are considered for inclusion in T in non-decreasing order of their costs.
• Loops and parallel edges are removed.
• An edge is included in T if it does not form a cycle with the edges already in T.
• Since G is connected and has n > 0 vertices, exactly n - 1 edges will be selected for inclusion in T.
• The time complexity of Kruskal's algorithm is O(e log e), where e is the number of edges in E.
Edges of the example graph in non-decreasing order of cost:
(2,3) - 5
(2,4) - 6
(4,3) - 10
(2,6) - 11
(4,6) - 14
(1,2) - 16
(4,5) - 18
(1,5) - 19
(5,6) - 33
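A minimal sketch of Kruskal's method on the nine edges listed above, treating them as the whole 6-vertex graph (an assumption, since the figure is not available). The cycle test uses a union-find (disjoint-set) structure: an edge forms a cycle exactly when both its ends are already in the same set. Union-find is a common implementation choice, not necessarily the one used in the slides.

```python
def kruskal(n, edges):
    """n: vertices 1..n; edges: list of (u, v, cost). Returns (tree edges, total cost)."""
    parent = list(range(n + 1))              # each vertex starts in its own set

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving keeps the trees shallow
            x = parent[x]
        return x

    tree, total = [], 0
    for u, v, cost in sorted(edges, key=lambda e: e[2]):   # non-decreasing cost
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                         # u and v already connected: would form a cycle
        parent[ru] = rv                      # merge the two sets
        tree.append((u, v, cost))
        total += cost
        if len(tree) == n - 1:
            break                            # a spanning tree has exactly n - 1 edges
    return tree, total

EDGES = [(2, 3, 5), (2, 4, 6), (4, 3, 10), (2, 6, 11), (4, 6, 14),
         (1, 2, 16), (4, 5, 18), (1, 5, 19), (5, 6, 33)]
TREE, COST = kruskal(6, EDGES)
```

Edges (4,3) and (4,6) are rejected because they would close cycles, giving a tree of cost 5 + 6 + 11 + 16 + 18 = 56.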
Shortest Path
• The length of a path is defined to be the sum of the weights of the edges on that path, rather than the number of edges.
• The starting vertex of the path is referred to as the source and the last vertex as the destination.
• The graphs are digraphs, and the weights assigned to the edges are positive.
Single Source, All Destinations
• Given a directed graph G = (V, E), a weighting function w(e) for the edges of G and a source vertex v0,
• find the shortest paths from v0 to all the remaining vertices of G.
• The shortest path algorithm used here was first given by Dijkstra; it determines the shortest paths from v0 to all other vertices in G.
• The vertices are numbered 1 through n.
• The set S of vertices whose shortest paths have already been found is maintained as a bit array, with S(i) = 0 if vertex i is not in S and S(i) = 1 if it is.
• The graph is represented by its cost adjacency matrix, with COST(i, j) being the weight of the edge <i, j>.
• DIST(i) holds the length of the shortest path from v0 to vertex i that passes only through vertices in S.
Basics of Dijkstra's Algorithm
• Dijkstra's algorithm starts at the node you choose (the source node) and analyzes the graph to find the shortest path between that node and all the other nodes in the graph.
• The algorithm keeps track of the currently known shortest distance from the source node to each node and updates these values whenever it finds a shorter path.
• Once the algorithm has found the shortest path between the source node and another node, that node is marked as "visited" and added to the path.
• The process continues until all the nodes in the graph have been added to the path. This way, we obtain paths that connect the source node to every other node along the shortest route possible.
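A minimal O(n²) sketch of the version described above: a bit array `s` for the set S, a cost adjacency matrix `cost` (with `INF` for a missing edge) and an array `dist` for DIST. Vertices are numbered from 0 here rather than 1, and the 4-vertex cost matrix is a hypothetical example.

```python
INF = float("inf")

def shortest_paths(cost, v0):
    """cost: n x n cost adjacency matrix; returns DIST for every vertex."""
    n = len(cost)
    s = [0] * n                        # S(i) = 1 once vertex i's shortest path is known
    dist = list(cost[v0])              # DIST(i): best path so far through vertices in S
    s[v0], dist[v0] = 1, 0
    for _ in range(n - 1):
        # choose the vertex u not in S with minimum DIST(u)
        u = min((i for i in range(n) if not s[i]), key=lambda i: dist[i])
        if dist[u] == INF:
            break                      # the remaining vertices are unreachable
        s[u] = 1
        for w in range(n):             # a path through u may now be shorter
            if not s[w] and dist[u] + cost[u][w] < dist[w]:
                dist[w] = dist[u] + cost[u][w]
    return dist

COST = [[0,   4,   1,   INF],
        [INF, 0,   INF, 1],
        [INF, 2,   0,   5],
        [INF, INF, INF, 0]]
DIST = shortest_paths(COST, 0)
```

Here vertex 2 is settled first (distance 1); going through it shortens the path to vertex 1 from 4 to 3, and going through vertex 1 shortens the path to vertex 3 from 6 to 4.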
Transitive Closure
• Determining whether a path exists between every pair of vertices.
• Given a directed graph, find out whether a vertex j is reachable from another vertex i, for all vertex pairs (i, j) in the graph.
• Reachable means that there is a path from vertex i to vertex j. The reachability matrix is called the transitive closure of the graph.
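One standard way to compute the transitive closure is Warshall's algorithm (named plainly here; the slides do not say which method they use). It allows each vertex k in turn as an intermediate vertex: if i reaches k and k reaches j, then i reaches j. This sketch computes reachability by paths of length at least one, so a diagonal entry becomes 1 only if the vertex lies on a cycle.

```python
def transitive_closure(adj):
    """adj: n x n 0/1 adjacency matrix. Returns the reachability matrix."""
    n = len(adj)
    reach = [row[:] for row in adj]     # start from the edges themselves
    for k in range(n):                  # now allow k as an intermediate vertex
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach

CHAIN = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # hypothetical digraph 0 -> 1 -> 2
CYCLE = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # hypothetical cycle 0 -> 1 -> 2 -> 0
```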
Unit 4
External Sorting
• Storage Devices
• Sorting with disks
• Sorting with Tapes
• Symbol Tables
• Static tree tables
• Dynamic Tree tables
• Hash tables
• External sorting deals with techniques to sort large files.
• The files are too large to fit in the internal memory of a computer.
• Characteristics of external storage devices
• External storage devices are broadly categorized as:
• sequential access (tapes)
• direct access (drums and disks)
Storage Devices: Magnetic Tapes
• Magnetic tapes are used for computer input/output.
• Data is recorded on magnetic tape approximately ½ inch wide.
• The tape is wound around a spool.
• A new reel of tape is normally 2400 ft long.
• Tracks run along the length of the tape, with a tape typically having 7 to 9 tracks across its width.
• Depending on the direction of magnetization, a spot on a track represents either 0 or 1.
• A combination of bits across the tracks represents a character (A-Z, 0-9, etc.).
• The number of bits written per inch of track is referred to as the tape density.
• Reading from or writing onto a magnetic tape is done by a tape drive.
• A tape drive consists of two spindles.
• One spindle is mounted with the source reel and the other with the take-up reel.
• During forward reading or writing, the tape is pulled from the source reel across the read/write heads and onto the take-up reel.
• Some tape drives also permit backward reading and writing of tapes.
• If characters are packed onto a tape at a density of 800 bpi, a 2400 ft tape would hold a little over 23 × 10^6 characters.
• Information on a tape is grouped into several blocks.
• These blocks may be of variable size or fixed size.
• Between blocks of data is an interblock gap, normally about ¾ inch long.
• The interblock gap is long enough to permit the tape to accelerate from rest to the correct read/write speed before the beginning of the next block reaches the read/write heads.
• To read a block from a tape, one specifies the length of the block and also the address A in memory where it is to be placed.
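The capacity figure follows from the density and the tape length. This worked calculation assumes every inch holds data, so the interblock gaps described above would lower the real figure:

```latex
800\ \frac{\text{characters}}{\text{inch}} \times 12\ \frac{\text{inches}}{\text{ft}} \times 2400\ \text{ft}
  = 23{,}040{,}000 \approx 23 \times 10^{6}\ \text{characters}
```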
• To write a block of data onto a tape, one specifies the starting address in memory and the number of consecutive words to be written.
• The block size corresponds to the size of the input/output buffers set up in memory.
• Computer tape is an example of a sequential access device.
• If the read head is positioned at the front of the tape and one wishes to read the information in a block 2000 ft down the tape, it is necessary to forward space the tape the correct number of blocks.
• If one then wishes to read the first block, the tape would have to be rewound 2000 ft to the front before the first block could be read.
• Typical rewind time over 2400 ft of tape is around one minute.
• Some assumptions about the tape drive:
• Tapes can be written and read in the forward direction only.
• The I/O channel of the computer permits three tasks to be carried out in parallel: writing onto one tape, reading from another tape, and CPU operation.
Disk Storage
• A disk is a direct access storage device.
• A disk has two distinct components:
• the disk module (simply the disk on which information is stored), and
• the disk drive (corresponding to the tape drive), which performs the reading or writing of information onto disks.
• Disks can be removed from or mounted onto a disk drive.
• The disk pack consists of several platters that are similar to phonograph records. The number of platters per pack varies and typically is about 6.
• Each platter has two surfaces on which information can be recorded.
• The outer surfaces of the top and bottom platters are not used.
• This gives a total of 10 surfaces on which information may be recorded.
• The disk drive contains a spindle on which the disk pack may be mounted and a set of read/write heads.
• There is one read/write head for each surface.
• During a read/write, the heads are held stationary over the position of the platter where the read/write is to be performed,
• while the disk itself rotates at high speed (2000-3000 rpm).
• The disk is read/written in concentric circles on each surface.
• The area that can be read from or written onto by a single stationary head is referred to as a track.
• Tracks are thus concentric circles, and each time the disk completes a revolution an entire track passes a read/write head.
• There may be 100 to 1000 tracks on each surface of a platter.
• The collection of tracks simultaneously under a read/write head on the surfaces of all the platters is called a cylinder.
• Tracks are divided into sectors.
• A sector is the smallest addressable segment of a track.
• Information is stored along the tracks of a surface in blocks.
• In order to use a disk, the cylinder, surface and sector number have to be specified.
• The read/write head assembly is first positioned on the right cylinder.
• Before reading/writing can start, one has to wait for the right sector to come beneath the read/write head.
• Then transmission can take place.
• Three factors contribute to the I/O time for disks:
Seek time: the time taken to position the read/write heads on the correct cylinder; it depends on the number of cylinders across which the heads have to move.
Latency time: the time until the right sector of the track is under the read/write head.
Transmission time: the time taken to transmit the block of data to/from the disk.
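The I/O time for one block is the sum of the three factors above. A minimal sketch, assuming the common estimate that the average latency is half a revolution; the function name and the parameter values in the example are hypothetical.

```python
def block_io_time_ms(seek_ms, rpm, transmission_ms):
    """t_IO = seek time + latency time + transmission time (milliseconds)."""
    revolution_ms = 60_000 / rpm       # time for one full turn of the disk
    latency_ms = revolution_ms / 2     # on average the sector is half a turn away
    return seek_ms + latency_ms + transmission_ms

# Hypothetical disk: 30 ms seek, 3000 rpm (20 ms per revolution), 5 ms to transmit a block.
T_IO = block_io_time_ms(30, 3000, 5)
```

With these values the latency is 10 ms, so one block takes 30 + 10 + 5 = 45 ms; note how seek and latency can dominate the actual transmission.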
Sorting with disks
• The most popular method of sorting on external storage devices is merge sort.
• This method has two distinct phases:
1. First, divide the file into runs such that the size of a run is small enough to fit into main memory, and sort each run in main memory using a standard internal sorting algorithm.
2. Then merge the resulting runs into successively bigger runs until the file is sorted.
• To calculate the overall computing time, consider an example:
• a file of 4500 records is to be sorted on a disk with a block length of 250 records, using an internal memory that can sort at most 750 records at a time.
1. Internally sort three blocks at a time (i.e. 750 records) to obtain six runs R1-R6. A method such as heap sort or quick sort could be used. These six runs are written out onto the disk.
2. Set aside three blocks of internal memory, each capable of holding 250 records. Two of these blocks are used as input buffers and the third as the output buffer. Merge R1 and R2: first read one block of each of these runs into the input buffers.
3. Blocks of the runs are merged from the input buffers into the output buffer.
4. When the output buffer gets full, it is written out onto the disk.
5. If an input buffer gets empty, it is refilled with another block from the same run.
6. Then R3, R4 and finally R5, R6 are merged.
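The two phases can be simulated in memory to see how the passes work. This is a simplified sketch: Python's `sorted` stands in for the internal sort of phase 1, the runs are merged pairwise (2-way) in phase 2, and the block-level input/output buffering of 250 records is not modelled. The function names are hypothetical.

```python
def two_way_merge(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]            # one run is exhausted: copy the rest of the other

def external_merge_sort(records, memory=750):
    # Phase 1: internally sort memory-sized pieces of the file to create runs
    runs = [sorted(records[k:k + memory]) for k in range(0, len(records), memory)]
    passes = 0
    # Phase 2: merge runs pairwise into successively bigger runs
    while len(runs) > 1:
        runs = [two_way_merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
        passes += 1
    return (runs[0] if runs else []), passes

SORTED_FILE, PASSES = external_merge_sort(list(range(4500, 0, -1)))
```

For 4500 records the six runs become three, then two, then one, so three merge passes are made over the data, which is why the number of passes dominates the computing time.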
• Analysing the time required to sort these 4500 records uses the following notation:
• Seek time can be reduced by writing the blocks on the same cylinder or on adjacent cylinders.
• A close look at the computing time shows that it depends chiefly on the number of passes made over the data.
• The scheme above does not use the computer's ability to carry out I/O and CPU operations in parallel and so overlap some of the time.
• Parallelism is an important consideration when sorting is done in a non-multiprogramming environment: without overlap, the CPU is idle during I/O.
• In some systems this parallelism may not be achievable because of the structure of the operating system.
• K-way merging:
• Goal: merge k sorted arrays (runs) holding a total of N values into one sorted array.
• A min-heap holding one element from each of the k arrays is used. The k-way merge pattern works like this:
1. Push the smallest (first) element of each sorted array into a min-heap to get the overall minimum.
2. Take out the smallest (top) element from the heap and add it to the merged list.
3. After removing the smallest element from the heap, insert the next element of the same list into the heap.
4. Repeat steps 2 and 3 to populate the merged list in sorted order.
• Time complexity = O(N log k), where N is the total number of elements in all the k input arrays.
• Space complexity = O(k).
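A minimal sketch of the heap-based k-way merge using Python's `heapq` module. Each heap entry is (value, array index, position), so after popping the minimum we know which array to take the next element from. (The standard library also offers `heapq.merge`, which performs the same kind of merge lazily.)

```python
import heapq

def k_way_merge(arrays):
    # Step 1: the first element of every non-empty array goes into the min-heap
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, i, j = heapq.heappop(heap)      # step 2: the overall minimum
        merged.append(value)
        if j + 1 < len(arrays[i]):             # step 3: next element of the same array
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    return merged

RESULT = k_way_merge([[1, 4, 7], [2, 5, 8], [0, 3, 6, 9]])
```

The heap never holds more than k entries, which gives the O(k) space and O(log k) work per output element.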
• https://www.youtube.com/watch?v=Xo54nlPHSpg
• The number of comparisons needed to find the next smallest element can be reduced significantly by using a selection tree.
• A selection tree is a binary tree in which each node represents the smaller of its two children.
• Thus the root node represents the smallest element in the tree.
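A minimal sketch of a k-way merge driven by a selection tree (the "winner tree" variant): the leaves hold the current front record of each run, each internal node remembers which run holds the smaller of its two children, and after a record is output only the path from that run's leaf to the root is replayed, about log2 k comparisons. The array layout (root at index 1, leaves at `size..2*size-1`, padding leaves `None`) is our choice.

```python
INF = float("inf")

def selection_tree_merge(runs):
    k = len(runs)
    size = 1
    while size < k:                    # number of leaves, rounded up to a power of two
        size *= 2
    pos = [0] * k                      # next unread position in each run

    def key(r):
        # front value of run r; INF if r is a padding leaf or the run is exhausted
        if r is None or pos[r] >= len(runs[r]):
            return INF
        return runs[r][pos[r]]

    def play(i):
        # internal node i keeps the run holding the smaller of its two children
        l, r = tree[2 * i], tree[2 * i + 1]
        tree[i] = l if key(l) <= key(r) else r

    tree = [None] * (2 * size)         # tree[1] is the root
    for r in range(k):
        tree[size + r] = r             # leaf for run r
    for i in range(size - 1, 0, -1):   # build the tree bottom-up
        play(i)
    out = []
    while key(tree[1]) != INF:
        w = tree[1]                    # root: run with the overall smallest front value
        out.append(runs[w][pos[w]])
        pos[w] += 1
        i = (size + w) // 2
        while i >= 1:                  # replay only the path from leaf w to the root
            play(i)
            i //= 2
    return out

MERGED = selection_tree_merge([[1, 4, 7], [2, 5, 8], [0, 3, 6, 9]])
```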
Sorting with tapes
• Sorting on tapes is carried out using the same basic steps as sorting on disks.
• The difference between sorting on tapes and on disks lies in the manner in which runs are maintained on the external storage media.
• Tapes are sequential access devices.
• Seek time and latency time differ greatly between tapes and disks: both are high on tapes.
• The blocks of the runs on tape should therefore be arranged so that they can be read sequentially during the k-way merge.
• The computing time analysis assumes that no operations are carried out in parallel.
Symbol tables
• A symbol table is a set of name-value pairs.
• Associated with each name in the table is an attribute, a collection of attributes, or some directions about what further processing is needed.
• Symbol tables have a fixed number of entries.
• Operations performed on a symbol table are:
Ask if a particular name is already present
Retrieve the attributes of that name
Insert a new name and its value
Delete a name and its value
• Different ways to implement symbol tables are:
• Static tree tables
• Dynamic tree tables
Static tree table
• Used when the identifiers are known in advance
• and no insertions or deletions are allowed.
• A symbol table with this property is called static.
• One solution is to sort the names and store them sequentially, then search them using binary search or Fibonacci search.
• Any name can then be found in O(log2 n) operations.
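For a static table, sorting the names once and then using binary search gives the O(log2 n) lookup. A minimal sketch, assuming the table is a list of (name, attributes) pairs sorted by name; the identifiers and attributes are hypothetical.

```python
def lookup(table, name):
    """Binary search in a table sorted by name; returns the attributes or None."""
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = table[mid][0]
        if name == key:
            return table[mid][1]           # successful search
        if name < key:
            hi = mid - 1                   # continue in the left half
        else:
            lo = mid + 1                   # continue in the right half
    return None                            # unsuccessful search

# Hypothetical static symbol table, sorted by identifier.
TABLE = [("do", "keyword"), ("if", "keyword"), ("read", "procedure"), ("x", "variable")]
```

Each comparison halves the remaining range, so at most about log2 n + 1 names are examined.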
• When analysing a BST, add a special "square" node at every place where there is a null link.
• A binary tree extended in this way has two kinds of nodes:
• External nodes (or failure nodes): the square nodes, which are not part of the original tree.
• Internal nodes: the remaining nodes.
• A binary search tree with the external nodes added is called an extended binary tree.
• Each time the binary search tree is examined for an identifier that is not in the tree, the search terminates at an external node; such searches are unsuccessful searches.
• External path length E of a binary tree: the sum, over all external nodes, of the lengths of the paths from the root to those nodes.
• Internal path length I: the same sum taken over all internal nodes.
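Both path lengths can be computed in one traversal by treating every null link as an external node. The sketch also checks the standard identity E = I + 2n for an extended binary tree with n internal nodes; the `Node` class and the two example trees are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def path_lengths(t, depth=0):
    """Return (n, I, E): internal node count, internal and external path lengths."""
    if t is None:                     # a null link is an external (square) node
        return 0, 0, depth
    n1, i1, e1 = path_lengths(t.left, depth + 1)
    n2, i2, e2 = path_lengths(t.right, depth + 1)
    return n1 + n2 + 1, i1 + i2 + depth, e1 + e2

COMPLETE = Node(2, Node(1), Node(3))                 # complete tree: minimal I
SKEWED = Node(1, None, Node(2, None, Node(3)))       # skewed tree with the same keys
```

The complete tree gives I = 2 and E = 8 while the skewed tree gives I = 3 and E = 9, illustrating that keeping internal nodes close to the root minimizes I.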
The weighted external path length of such a binary tree is calculated by

$$\sum_{i=1}^{n+1} q_i k_i$$

where $k_i$ is the distance from the root node to the external node with weight $q_i$.
Suppose n = 3, q1 = 15, q2 = 2, q3 = 4 and q4 = 5.
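The sum is easy to evaluate once the distance of each external node is known. Since the slide's tree figures are not reproduced here, the two shapes below are assumptions: a complete tree with every external node at distance 2, and a tree that places q1 = 15 at distance 1, q4 at distance 2, and q2, q3 at distance 3.

```python
def weighted_external_path_length(weights, depths):
    """Sum of q_i * k_i over all external nodes."""
    return sum(q * k for q, k in zip(weights, depths))


q = [15, 2, 4, 5]       # q1..q4 from the example (n = 3)
balanced = [2, 2, 2, 2]  # every external node at distance 2
skewed = [1, 3, 3, 2]    # q1 at 1, q2 and q3 at 3, q4 at 2
print(weighted_external_path_length(q, balanced))  # 52
print(weighted_external_path_length(q, skewed))    # 43
```

The less balanced tree wins here because it keeps the heaviest weight close to the root, so minimal internal path length and minimal weighted external path length are different goals.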
• Over all binary trees with n internal nodes, we want the minimum and maximum values of I
• To obtain trees with minimal I, as many internal nodes as possible should be close to the root node
• One tree with minimal internal path length is the complete binary tree
• Binary trees with minimal weighted external path length are used in many applications, such as finding an optimal set of codes for messages M1, …, Mn+1
• Each code is a binary string used for transmission of the corresponding message
• At the receiving end it will be decoded using a decode tree
• A decode tree is a binary tree in which external nodes represent messages
• The binary bits in the code word for a message determine the branching
needed at each level of the decode tree to reach the correct external node
Huffman Codes-
M1=000
M2=001
M3=01
M4=1
• The cost of decoding a code word is proportional to the number of bits in the code
• This number is equal to the distance of the corresponding external node from the root node
• The expected decode time is minimized by choosing code words resulting in a decode tree
with minimal weighted external path length.
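A decode tree for the codes M1 = 000, M2 = 001, M3 = 01, M4 = 1 can be sketched as nested dictionaries: each internal node maps the bits '0' and '1' to a child, and each external node is simply the message name. The representation and function names are assumptions for illustration, and the code set is assumed to be prefix-free.

```python
def build_decode_tree(codes):
    """Build a decode tree from {message: codeword}.
    Internal nodes are dicts keyed by '0'/'1'; external nodes are message names."""
    root = {}
    for message, word in codes.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})  # walk to (or create) internal nodes
        node[word[-1]] = message             # the last bit leads to the external node
    return root


def decode(bits, root):
    messages, node = [], root
    for bit in bits:
        node = node[bit]                     # one branch per bit
        if not isinstance(node, dict):       # reached an external node
            messages.append(node)
            node = root                      # restart at the root
    return messages


codes = {"M1": "000", "M2": "001", "M3": "01", "M4": "1"}
tree = build_decode_tree(codes)
print(decode("000101001", tree))  # ['M1', 'M4', 'M3', 'M2']
```

Each message costs one branch per bit, i.e. its distance from the root, which is why the expected decode time equals the weighted external path length.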
Huffman Algorithm-
• Huffman Coding is a technique of compressing data to reduce its size without
losing any of the details. It was first developed by David Huffman in 1951.
• It follows a greedy approach: at each step it merges the two least frequent subtrees, and this yields minimum-length prefix-free binary codes
• Huffman Coding is generally useful for compressing data in which some characters occur frequently.
• For example, suppose a string of 15 characters is to be sent. Each character occupies 8 bits, so a total of 8 * 15 = 120 bits are required to send this string.
• Using the Huffman Coding technique, we can compress the string to a smaller size.
• Huffman coding first creates a tree using the frequencies of the characters and then generates a code for each character.
Steps of Huffman encoding algorithm
1. Calculate the frequency of each character in the string.
2. Sort the characters in increasing order of the frequency. These are stored in
a priority queue Q.
3. Make each unique character a leaf node.
4. Create a new node z. Assign the node with the minimum frequency as the left child of z and the node with the second minimum frequency as the right child of z. Set the value of z to the sum of these two minimum frequencies, remove the two nodes from Q and insert z into Q.
5. Repeat step 4 until only one node, the root of the tree, remains in Q.
6. For each non-leaf node, assign 0 to the left edge and 1 to the right edge.
• Without encoding, the total size of the string was 120 bits. After encoding, the size is reduced to 32 + 15 + 28 = 75 bits.
Decoding –
• To decode, we take the code and traverse the tree from the root to find the character.
• Suppose 101 is to be decoded: we start at the root and follow one branch per bit, as shown in the figure.
Huffman Encoding Algorithm
• create a priority queue Q containing a leaf node for each unique character
• keep Q in ascending order of frequency
• repeat until Q holds a single node (n - 1 times for n unique characters):
  - create a newNode
  - extract the minimum value from Q and assign it to leftChild of newNode
  - extract the minimum value from Q and assign it to rightChild of newNode
  - set the value of newNode to the sum of these two minimum values
  - insert newNode into Q
• return the node remaining in Q as rootNode
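The algorithm above can be sketched in Python with the standard `heapq` module as the priority queue Q. The 15-character input string is a hypothetical example chosen to match the 120-bit figure; the function name `huffman_codes` is also an assumption. Ties in frequency could change individual codes, but not the total encoded length.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Return {character: code} built with a min-priority queue (heapq)."""
    freq = Counter(text)
    if not freq:
        return {}
    order = count()  # tie-breaker so heapq never has to compare tree nodes
    # Heap entries are (frequency, tie_breaker, node); a node is either a
    # character (leaf) or a (left, right) tuple (internal node).
    heap = [(f, next(order), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                        # only one distinct character
        return {heap[0][2]: "0"}
    while len(heap) > 1:                      # runs n - 1 times
        f1, _, left = heapq.heappop(heap)     # minimum -> left child
        f2, _, right = heapq.heappop(heap)    # second minimum -> right child
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):           # internal node
            assign(node[0], prefix + "0")     # left edge is 0
            assign(node[1], prefix + "1")     # right edge is 1
        else:
            codes[node] = prefix

    assign(heap[0][2], "")
    return codes


text = "BCAADDDCCACACAC"  # hypothetical 15-character string
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)                        # {'C': '0', 'B': '100', 'D': '101', 'A': '11'}
print(8 * len(text), len(encoded))  # 120 28
```

Unlike the pseudocode's "insert into the tree", the merged node goes back into Q, so it can be merged again in a later iteration.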
Time Complexity –
• Building the codes for n unique characters from their frequencies takes O(n log n) time.
• The minimum frequency is extracted from the priority queue 2(n - 1) times, and each extraction costs O(log n). Thus the overall complexity is O(n log n).
Advantages of Huffman Encoding-
• This encoding scheme saves a lot of storage space, since the binary codes it generates are variable in length
• It generates shorter binary codes for encoding symbols/characters that
appear more frequently in the input string
• The binary codes generated are prefix-free
Disadvantages of Huffman Encoding-
• Lossless data encoding schemes, like Huffman encoding, achieve a lower compression ratio than lossy encoding techniques. Thus, on its own, Huffman encoding is suited mainly to text and program files rather than digital images.
• Huffman encoding is relatively slow because it uses two passes: one to build the statistical model and another to encode. Thus, lossless techniques that use Huffman encoding can be slower than single-pass techniques.
• Since the binary codes have different lengths, it is difficult for decoding software to detect whether the encoded data is corrupt. This can result in incorrect decoding and, subsequently, wrong output.
Real-life applications of Huffman Encoding-
• Huffman encoding is widely used in compression formats like GZIP, PKZIP (WinZip) and BZIP2.
• Multimedia codecs like JPEG, PNG and MP3 use Huffman encoding (more precisely, prefix codes).
Dynamic Tree tables-
• Dynamic tables may also be maintained as binary search trees
• Nodes can be inserted, deleted and searched
• If the tree had to remain a complete binary tree after every insertion or deletion, the whole tree might have to be restructured to accommodate each change
• Search, insertion and deletion take O(h) time in the worst case, where h is the height of the tree (h can be as large as n for a skewed tree)
• To reduce this time, the tree should be kept self-balanced (height-balanced) using the balance factor
• A method of growing self-balanced (height-balanced) trees is therefore followed
• For any BST, the worst-case time complexity is O(h)
• For a height-balanced BST, h = O(log n), so the worst-case time complexity becomes O(log n)
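To see why h matters, the sketch below inserts the same seven keys into a plain (unbalanced) BST in two different orders and measures the resulting height. The helper names `bst_insert`, `height` and `build` are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def bst_insert(root, key):
    """Insert key into an unbalanced BST; duplicate keys are ignored."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root


def height(node):
    # number of nodes on the longest root-to-leaf path
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def build(keys):
    root = None
    for k in keys:
        root = bst_insert(root, k)
    return root


print(height(build([1, 2, 3, 4, 5, 6, 7])))  # 7  (skewed: h = n)
print(height(build([4, 2, 6, 1, 3, 5, 7])))  # 3  (complete: h = log2(n + 1))
```

Sorted input degrades the tree into a chain, which is exactly the case a height-balanced tree avoids.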
AVL Tree-
• Adelson-Velskii and Landis introduced in 1962 a binary search tree that is balanced with respect to the heights of its subtrees
• In such a balanced BST with n nodes, dynamic searching can be performed in O(log n) time
• Insertion and deletion in the same tree can also be done in O(log n) time
• The resulting tree remains balanced
Balance factor = height of left subtree - height of right subtree
A tree in which any node has a balance factor greater than 1 or less than -1 is not balanced, so it is not an AVL tree
• If the tree is not an AVL tree, it can be converted into one by performing these rotations:
• LL
• RR
• LR
• RL
• Left rotation: if a tree becomes unbalanced when a node is inserted into the right subtree of the right subtree (RR case), we perform a single left rotation
• Right rotation: an AVL tree may become unbalanced if a node is inserted into the left subtree of the left subtree (LL case). The tree then needs a single right rotation
Right-Left Rotation
• The second type of double rotation is Right-Left Rotation. It is a combination of
right rotation followed by left rotation
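The four cases (LL, RR, LR, RL) fit into one recursive insertion routine. This is a minimal sketch, not the slides' own code: node heights are stored in each node, the balance factor is computed as height(left) - height(right) as defined above, and the names `AVLNode`, `avl_insert`, `rotate_left` and `rotate_right` are hypothetical.

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # a single node has height 1


def height(node):
    return node.height if node else 0


def balance_factor(node):
    # height of left subtree - height of right subtree
    return height(node.left) - height(node.right)


def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    """Single right rotation (fixes the LL case)."""
    x = y.left
    y.left = x.right
    x.right = y
    update_height(y)
    update_height(x)
    return x


def rotate_left(x):
    """Single left rotation (fixes the RR case)."""
    y = x.right
    x.right = y.left
    y.left = x
    update_height(x)
    update_height(y)
    return y


def avl_insert(node, key):
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = avl_insert(node.left, key)
    elif key > node.key:
        node.right = avl_insert(node.right, key)
    else:
        return node  # duplicate keys are ignored
    update_height(node)
    bf = balance_factor(node)
    if bf > 1 and key < node.left.key:    # LL: single right rotation
        return rotate_right(node)
    if bf < -1 and key > node.right.key:  # RR: single left rotation
        return rotate_left(node)
    if bf > 1:                            # LR: left rotation, then right rotation
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if bf < -1:                           # RL: right rotation, then left rotation
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def inorder(node):
    return inorder(node.left) + [node.key] + inorder(node.right) if node else []


root = None
for k in range(1, 8):  # ascending keys would turn a plain BST into a chain
    root = avl_insert(root, k)
print(root.key, height(root), inorder(root))  # 4 3 [1, 2, 3, 4, 5, 6, 7]
```

Inserting 3, 1, 2 triggers the LR case and inserting 1, 3, 2 the RL case; both end with 2 at the root.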
Hashing-
• Hashing is an important technique designed to solve the problem of efficiently finding and storing data in an array.
• Hashing is a method for storing and retrieving records from a database.
• It lets us insert, delete and search for records based on a search-key value in (expected) constant time
• A hash system stores records in an array called a hash table (HT)
• Each hash table stores its values or records in an array of slots.
• Hashing works by performing a computation on a search key K in a way that is intended to identify the position in HT that contains the record with key K.
• The hash table is partitioned into b buckets, HT(0), …, HT(b-1)
• Each bucket is capable of holding s records in s slots, each slot being large enough to hold exactly 1 record
• Hash tables use a technique to generate an index number for each value stored in the array. This technique is called hashing
• Hashing searches for an identifier or record by computing the address (location) of the record from its key.
• A hash function returns a small integer value, known as the hash value (also called a hash code or hash sum).
• Hashing in a data structure is a two-step process:
The hash function converts the item into a small integer, the hash value: hash = hashfunc(key)
This integer is used as an index to store the original data in the hash table: index = hash % array_size
• The hash key can then be used to locate the data quickly.
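The two steps can be sketched for string keys as follows. The polynomial hash `hashfunc` (base 31, kept to 32 bits) is a hypothetical choice for illustration; any function mapping a key to an integer would do. Python's built-in `hash()` is avoided because string hashes are randomized per process, so the indices would not be reproducible.

```python
def hashfunc(key: str) -> int:
    """Step 1 (hypothetical hash function): polynomial rolling hash, base 31,
    kept to 32 bits so the hash value stays a small non-negative integer."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def index_for(key: str, array_size: int) -> int:
    """Step 2: reduce the hash value to a valid array index."""
    return hashfunc(key) % array_size


print(hashfunc("ab"), index_for("ab", 11))  # 3105 3
```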
• Overflow occurs when a new identifier is mapped or hashed into a full bucket
• A collision occurs when two non-identical identifiers are hashed into the same bucket, i.e. when the hash function returns the same index for more than one element
• When the bucket size is 1 (s = 1), collisions and overflows occur simultaneously
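The difference between a collision and an overflow shows up clearly in a table of b buckets with s slots each. In this sketch the division hash key mod b and the class name `BucketHashTable` are assumptions for illustration; keys are assumed to be distinct.

```python
class BucketHashTable:
    """Hash table HT(0) .. HT(b-1): b buckets, each with s slots."""

    def __init__(self, b, s):
        self.b, self.s = b, s
        self.buckets = [[] for _ in range(b)]

    def h(self, key):
        return key % self.b  # division hash, assumed for illustration

    def insert(self, key):
        """Store key; return True if this insertion caused a collision.
        Raise OverflowError if the home bucket is already full."""
        bucket = self.buckets[self.h(key)]
        if len(bucket) == self.s:
            raise OverflowError(f"bucket {self.h(key)} is full")
        collision = len(bucket) > 0  # another identifier already hashed here
        bucket.append(key)
        return collision


ht = BucketHashTable(b=7, s=2)
print(ht.insert(50), ht.insert(85))  # False True  (both hash to bucket 1)
try:
    ht.insert(92)                    # 92 % 7 == 1 as well, and bucket 1 is full
except OverflowError as e:
    print(e)                         # bucket 1 is full
```

With s = 1 the second key hashing to a bucket raises the overflow immediately, i.e. the collision and the overflow happen together.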
• Commonly used hashing functions, which aim to keep collisions and overflows rare, are
Mid square
Division
Folding
Digit analysis
Mid-square(middle of square) :
• Mid-square (f_m) hashing is a hashing technique in which new keys are generated from a seed.
• A seed value is taken and squared.
• Then some digits from the middle of the square are extracted. These extracted digits form a number, which is taken as the new seed.
• This technique can generate keys with high randomness if a big enough seed value is taken.
• The process is repeated as many times as keys are required.
Example-
Suppose a 4-digit seed is taken. seed = 4765
Hence, square of seed is = 4765 * 4765 = 22705225
Now, from this 8-digit number, four digits are extracted (say, the middle four).
So, the new seed value becomes seed = 7052
Now, the square of this new seed is 7052 * 7052 = 49730704
Again, the middle four digits are extracted.
So, the new seed value becomes seed = 7307
...and so on.
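The seed sequence in the example can be reproduced with a few lines. The function names are hypothetical; the square is zero-padded to 2 × digits characters so that "the middle" is well defined even for small seeds. The same squaring step can serve as a hash function by using the middle digits of key² as the table address.

```python
def mid_square(seed, digits=4):
    """Square the seed and return its middle `digits` digits as the new seed."""
    square = str(seed * seed).zfill(2 * digits)  # pad so the middle is well defined
    start = (len(square) - digits) // 2
    return int(square[start:start + digits])


def mid_square_sequence(seed, count, digits=4):
    values = []
    for _ in range(count):
        seed = mid_square(seed, digits)
        values.append(seed)
    return values


print(mid_square_sequence(4765, 3))  # [7052, 7307, 3922]
```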
Division-
• The hash function uses the modulo (mod) operator
• The key x is divided by some number M (the size of the hash table) and the remainder is used as the hash address of x, i.e. h(x) = x mod M
• Example
Size of Hash Table (m) = 1000 (0 - 999)
Suppose we want to calculate the index of element x, where x = 123789456
index =123789456 mod 1000
= 456
The element x is stored at position 456 in the hash table.
Folding –
• The key k is partitioned into a number of parts k1, k2, …, kn, where each part, except possibly the last, has the same number of digits as the required address.
• Then the parts are added together, ignoring the final carry.
• There are two types of folding:
Shift folding: all parts are aligned on their least significant digits and added as they are
Boundary folding: alternate parts are reversed (folded at the partition boundaries) before adding
Writing the parts as $p_1, \dots, p_n$, a part reversed in boundary folding is written $p_i^r$
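Both kinds of folding can be sketched as below. The 14-digit key and the 3-digit address size are illustrative assumptions. Which parts get reversed in boundary folding, and how a short final part is treated, vary between texts; this sketch reverses the 2nd, 4th, … parts, and the final `% 10 ** part_digits` drops the final carry.

```python
def fold(key, part_digits=3, boundary=False):
    """Folding hash: split the key's digits into parts of `part_digits`
    digits, add them, and drop the final carry (keep part_digits digits).
    With boundary=True every second part is reversed before adding."""
    digits = str(key)
    parts = [digits[i:i + part_digits] for i in range(0, len(digits), part_digits)]
    total = 0
    for i, part in enumerate(parts):
        if boundary and i % 2 == 1:
            part = part[::-1]  # p_i^r: the reversed part
        total += int(part)
    return total % 10 ** part_digits


key = 12320324111220  # illustrative key: parts 123, 203, 241, 112, 20
print(fold(key))                 # 699  (shift: 123+203+241+112+20)
print(fold(key, boundary=True))  # 897  (boundary: 123+302+241+211+20)
```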
Digit analysis-
• Digit analysis is used with static files.
• A static file is one in which all the identifiers are known in advance. Using
this method, we first transform the identifiers into numbers using some
radix, r.
• Then examine the digits of each identifier, deleting those digits that have the
most skewed distributions. Continue deleting digits until the number of
remaining digits is small enough to give an address in the range of the hash
table.
• The digits used to calculate the hash address must be the same for all
identifiers and must not have abnormally high peaks or valleys (the standard
deviation must be small).
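A rough sketch of digit analysis for identifiers that are already decimal strings (radix 10). As an assumption, the skew of a digit position is measured by the standard deviation of how often each digit 0-9 occurs there; the least skewed positions are kept as the address. The keys and function names are hypothetical.

```python
from statistics import pstdev


def choose_digit_positions(keys, keep):
    """Keep the `keep` digit positions whose digit distribution is least
    skewed, measured by the standard deviation of the counts of 0-9."""
    width = len(keys[0])
    spread = []
    for pos in range(width):
        counts = [0] * 10
        for k in keys:
            counts[int(k[pos])] += 1
        spread.append(pstdev(counts))
    best = sorted(range(width), key=lambda p: (spread[p], p))[:keep]
    return sorted(best)  # keep the original left-to-right order


def digit_address(key, positions):
    return int("".join(key[p] for p in positions))


keys = ["1234", "1357", "1479", "1524"]  # hypothetical static identifiers
positions = choose_digit_positions(keys, keep=2)
print(positions)                                    # [1, 2]
print([digit_address(k, positions) for k in keys])  # [23, 35, 47, 52]
```

Position 0 is always 1 and position 3 repeats a 4, so both are deleted as too skewed.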
Overflow handling –
• Open addressing techniques are used to detect and handle overflows and collisions
• Different ways are
Linear probing
Quadratic probing
Double hashing
Linear probing –
In linear probing, the hash table is searched sequentially, starting from the original hash location. If that location is already occupied, the next location is checked, and so on.
It is also called rehashing
For example, consider the simple hash function “key mod 7” and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
As a second example, consider the hash function “key mod 5” and the keys 50, 70, 76, 93 to be inserted.
Let hash(x) be the slot index computed using a hash function and S be the table size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
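50 % 7 = 1 (slot 1)
700 % 7 = 0 (slot 0)
76 % 7 = 6 (slot 6)
85 % 7 = 1 (slot 1 occupied, so slot 2)
92 % 7 = 1 (slots 1 and 2 occupied, so slot 3)
73 % 7 = 3 (slot 3 occupied, so slot 4)
101 % 7 = 3 (slots 3 and 4 occupied, so slot 5)
• Keys 50, 70, 76, 93 with key mod 5: 50 % 5 = 0 (slot 0), 70 % 5 = 0 (slot 0 occupied, so slot 1), 76 % 5 = 1 (slot 1 occupied, so slot 2)
• 93 % 5 = 3 (slot 3)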
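Both linear probing examples can be replayed with a short insertion routine. It is a minimal sketch: the table is a Python list with `None` marking an empty slot, and the function name `linear_probe_insert` is hypothetical.

```python
def linear_probe_insert(table, key):
    """Insert key with hash(x) = key mod S and linear probing; return the slot."""
    size = len(table)
    home = key % size
    for i in range(size):                 # try home, home+1, home+2, ... (mod S)
        slot = (home + i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


t7 = [None] * 7
for k in [50, 700, 76, 85, 92, 73, 101]:
    linear_probe_insert(t7, k)
print(t7)  # [700, 50, 85, 92, 73, 101, 76]

t5 = [None] * 5
for k in [50, 70, 76, 93]:
    linear_probe_insert(t5, k)
print(t5)  # [50, 70, 76, 93, None]
```

Keys that collide pile up in a contiguous run (slots 1 to 5 in the first table), which is the clustering that quadratic probing and double hashing try to reduce.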
Quadratic probing-
• In this method, we look for the i²-th slot (counting from the original hash location) in the i-th iteration.
• We always start from the original hash location; only if that location is occupied do we check the other slots.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
Let us consider table Size = 7, hash function as Hash(x) = x % 7
Insert = 22, 30, 50.
• Insert 22 and 30: Hash(22) = 22 % 7 = 1. Since the cell at index 1 is empty, we can easily insert 22 at slot 1.
• Hash(30) = 30 % 7 = 2. Since the cell at index 2 is empty, we can easily insert 30 at slot 2.
• Insert 50: Hash(50) = 50 % 7 = 1
• In our hash table slot 1 is already occupied, so we search slot 1 + 1², i.e. 1 + 1 = 2.
• Slot 2 is also occupied, so we search slot 1 + 2², i.e. 1 + 4 = 5.
• Slot 5 is not occupied, so we place 50 in slot 5.
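The quadratic probing example can be reproduced by changing the probe offset from i to i * i. This is a minimal sketch with a hypothetical function name; note that quadratic probing does not necessarily visit every slot, so it can fail even when the table still has free cells.

```python
def quadratic_probe_insert(table, key):
    """Insert key with hash(x) = key mod S, trying (hash(x) + i*i) % S."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    # quadratic probing may skip some slots, so this can fail
    # even when the table still has free cells
    raise OverflowError(f"no free slot found for {key}")


table = [None] * 7
print([quadratic_probe_insert(table, k) for k in [22, 30, 50]])  # [1, 2, 5]
```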
Double hashing-
• In this technique, the increments for the probing sequence are computed by using
another hash function.
• Use another hash function hash2(x) and look for the i*hash2(x)-th slot (counting from the original hash location) in the i-th iteration.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
• Insert the keys 27, 43, 692, 72 into a hash table of size 7, where the first hash function is h1(k) = k mod 7 and the second hash function is h2(k) = 1 + (k mod 5)
• Insert 27: 27 % 7 = 6. Location 6 is empty, so insert 27 into slot 6.
• Insert 43: 43 % 7 = 1. Location 1 is empty, so insert 43 into slot 1.
• Insert 692
• 692 % 7 = 6, but location 6 is already occupied, so this is a collision
• So we need to resolve this collision using double hashing:
• h1(k) = k mod 7
• h2(k) = 1 + (k mod 5)
hnew = [h1(692) + i * h2(692)] % 7
= [6 + 1 * (1 + 692 % 5)] % 7
= 9 % 7
= 2
Now, as slot 2 is empty,
we can insert 692 into slot 2.
• Insert 72
• 72 % 7 = 2, but location 2 is already occupied, so this is a collision.
• So we need to resolve this collision using double hashing.
hnew = [h1(72) + i * h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
= 5 % 7
= 5
Now, as slot 5 is empty,
we can insert 72 into slot 5.
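Double hashing uses the same probing loop with a step of h2(k). The sketch below uses the example's table size 7, h1(k) = k mod 7 and h2(k) = 1 + (k mod 5); the function name is hypothetical. Note that 692 % 7 = 6, whereas 92 % 7 = 1. Because h2 never returns 0 and 7 is prime, every probe sequence visits all slots.

```python
def double_hash_insert(table, key, h2):
    """Insert key using (h1(k) + i * h2(k)) % S with h1(k) = k mod S."""
    size = len(table)
    h1 = key % size
    step = h2(key)
    for i in range(size):
        slot = (h1 + i * step) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError(f"no free slot found for {key}")


def h2(k):
    return 1 + k % 5  # second hash function from the example (never 0)


table = [None] * 7
print([double_hash_insert(table, k, h2) for k in [27, 43, 692, 72]])  # [6, 1, 2, 5]
```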
Unit-5
Internal Sorting
• Sorting is categorized into
• Internal sorting
• External sorting
• Internal sorting methods are
• Insertion sort
• Quick sort
• 2-way Merge sort
• Heap sort
• Shell sort
Insertion sort-
• The basic step is to insert a record r into a sequence of ordered records.
• It starts with an ordered sequence containing only the first record and then successively inserts the remaining records into this sequence
• This algorithm is not suitable for large data sets, as its average and worst-case complexity are O(n²), where n is the number of items.
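The insertion step described above can be sketched as follows: each record r is taken in turn and the larger records in the ordered prefix are shifted one position right until r's place is found. The function name is hypothetical; the sketch returns a new sorted list.

```python
def insertion_sort(items):
    """Return a sorted copy of items using insertion sort."""
    a = list(items)
    for i in range(1, len(a)):
        r = a[i]                 # record to insert into the ordered prefix a[:i]
        j = i - 1
        while j >= 0 and a[j] > r:
            a[j + 1] = a[j]      # shift larger records one place right
            j -= 1
        a[j + 1] = r
    return a


print(insertion_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The inner loop can run i times for the i-th record, which is where the O(n²) bound comes from.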
Quick sort-
• It was developed by C. A. R. Hoare
• It has good average-case behaviour
• Quick sort is a highly efficient sorting algorithm based on partitioning an array of data into smaller arrays
• A large array is partitioned into two arrays: one holds values smaller than a specified value, called the pivot, on which the partition is based, and the other holds values greater than the pivot.
• Quicksort partitions an array and then calls itself recursively twice to sort the two resulting subarrays. This algorithm is quite efficient for large data sets, as its average-case complexity is O(n log n); its worst-case complexity, however, is O(n²).
• This algorithm follows the divide and conquer approach.
• Divide and conquer is a technique of breaking a problem down into subproblems, solving the subproblems, and combining the results to solve the original problem.
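A compact in-place quick sort is sketched below. It uses the Lomuto partition scheme with the last element as pivot, which is a choice made for this sketch (Hoare's original partitioning scheme differs); the sample data is hypothetical.

```python
def partition(a, lo, hi):
    """Lomuto partition: a[hi] is the pivot. Returns the pivot's final index."""
    pivot = a[hi]
    i = lo                        # a[lo:i] holds values smaller than the pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]     # place the pivot between the two parts
    return i


def quick_sort(a, lo=0, hi=None):
    """Sort list a in place."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quick_sort(a, lo, p - 1)  # values smaller than the pivot
        quick_sort(a, p + 1, hi)  # values greater than or equal to the pivot


data = [26, 5, 37, 1, 61, 11, 59, 15, 48, 19]
quick_sort(data)
print(data)  # [1, 5, 11, 15, 19, 26, 37, 48, 59, 61]
```

With a last-element pivot, already sorted input produces maximally unbalanced partitions, which is the O(n²) worst case mentioned above.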
2-way merge sort-
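A 2-way merge sort repeatedly merges pairs of sorted runs. The sketch below is a bottom-up (iterative) version, an assumption about which variant is intended: it starts from runs of length 1 and merges adjacent pairs until one sorted run remains, which takes about log₂ n passes of O(n) work each, i.e. O(n log n). The function names and data are hypothetical.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list (stable)."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Bottom-up 2-way merge sort: start with runs of length 1 and merge
    adjacent pairs of runs until a single sorted run remains."""
    runs = [[x] for x in items]
    while len(runs) > 1:
        next_runs = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                next_runs.append(merge(runs[k], runs[k + 1]))
            else:
                next_runs.append(runs[k])   # odd run out is carried over
        runs = next_runs
    return runs[0] if runs else []


print(merge_sort([26, 5, 77, 1, 61, 11, 59, 15, 48, 19]))
# [1, 5, 11, 15, 19, 26, 48, 59, 61, 77]
```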
Heap Sort-
• A heap is a tree-based data structure whose nodes are kept in a particular order, so that the tree satisfies the heap property
• Heap sort may be regarded as a two-stage method:
First, the tree is converted into a heap, in which the value of each node is at least as large as the values of its children; the root then holds the largest key in the tree
Then, the output sequence is generated in decreasing order by successively outputting the root and restructuring the remaining tree into a heap
• Follow these steps:
1. Build a max heap from the input data.
2. At this point, the maximum element is stored at the root of the heap. Replace it with the last item of the heap, then reduce the size of the heap by 1. Finally, heapify the root of the tree.
3. Repeat step 2 while the size of the heap is greater than 1.
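The three steps can be sketched with the heap stored in a Python list, where the children of index i are at 2i + 1 and 2i + 2. Each "output" root is swapped to the end of the shrinking heap, so the maxima are produced in decreasing order but the list ends up sorted in ascending order. Function names and data are hypothetical.

```python
def sift_down(a, n, i):
    """Restore the max-heap property for the subtree rooted at index i
    (children of i are at 2i+1 and 2i+2; only a[:n] is the heap)."""
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heap_sort(a):
    """Sort list a in place in ascending order."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # step 1: build a max heap
        sift_down(a, n, i)
    for end in range(n - 1, 0, -1):       # steps 2 and 3
        a[0], a[end] = a[end], a[0]       # move the current maximum to the end
        sift_down(a, end, 0)              # heap shrinks by 1; heapify the root


data = [26, 5, 77, 1, 61, 11, 59, 15, 48, 19]
heap_sort(data)
print(data)  # [1, 5, 11, 15, 19, 26, 48, 59, 61, 77]
```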
Shell sort-
• Shell sort is a generalization of insertion sort, which overcomes the drawbacks of insertion sort by comparing elements separated by a gap of several positions.
• It is an extended version of insertion sort. Shell sort improves the average time complexity of insertion sort. Like insertion sort, it is a comparison-based, in-place sorting algorithm.
• Shell sort is efficient for medium-sized data sets.
• In insertion sort, an element can be moved ahead by only one position at a time. To move an element to a far-away position, many movements are required, which increases the algorithm's execution time. Shell sort overcomes this drawback of insertion sort: it allows far-away elements to be moved and swapped as well.
• This algorithm first sorts elements that are far away from each other, then successively reduces the gap between them. This gap is called the interval. The interval can be calculated using Knuth's formula given below:
• h = h * 3 + 1
• where h is the interval, with initial value 1 (giving the intervals 1, 4, 13, 40, …)
In the first loop, the interval is 4 (n/2 = 4, where n = 8): the element at position 0 is compared with the element at position 4. If the element at position 0 is greater, it is swapped with the element at position 4; otherwise, it remains where it is. This process continues for the remaining elements.
In the second loop, the elements lie at an interval of 2 (n/4 = 2), where n = 8. We now use the interval of 2 to sort the rest of the array. With an interval of 2, two sublists are generated: {12, 25, 33, 40} and {17, 8, 31, 42}.
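Shell sort is a gapped insertion sort run once per interval, ending with interval 1 (plain insertion sort on an almost sorted array). In this sketch the default intervals come from Knuth's formula h = 3h + 1, while the example above halves the interval (4, 2, 1), so the gap list can also be passed in explicitly. Function names and the input array are hypothetical.

```python
def knuth_gaps(n):
    """Intervals 1, 4, 13, 40, ... below n, largest first."""
    gaps, h = [], 1
    while h < n:
        gaps.append(h)
        h = 3 * h + 1
    return gaps[::-1]


def shell_sort(items, gaps=None):
    """Return a sorted copy of items; gaps must end with 1."""
    a = list(items)
    if gaps is None:
        gaps = knuth_gaps(len(a))
    for gap in gaps:
        # insertion sort on elements that are `gap` positions apart
        for i in range(gap, len(a)):
            temp = a[i]
            j = i
            while j >= gap and a[j - gap] > temp:
                a[j] = a[j - gap]
                j -= gap
            a[j] = temp
    return a


data = [33, 31, 40, 8, 12, 17, 25, 42]
print(shell_sort(data, gaps=[4]))        # [12, 17, 25, 8, 33, 31, 40, 42]
print(shell_sort(data, gaps=[4, 2, 1]))  # [8, 12, 17, 25, 31, 33, 40, 42]
```

For this hypothetical input, one pass with interval 4 leaves exactly the interval-2 sublists {12, 25, 33, 40} and {17, 8, 31, 42}.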
Files, Queries and Sequential Organizations
Files-
• A file is a collection of records where each record consists of one or more fields.
• The primary objective of file organization is to provide a means for record retrieval and update
• Update includes deletion, changes in fields or insertion of an entirely new record
• Certain fields in the record are designated as key fields
• Records may be retrieved by specifying values for some or all of these keys.
• A combination of key values specified for retrieval is called a query
• If location is not one of the key fields, then location = Los Angeles would be an invalid query to the file
• Choosing an efficient representation of a file on external storage depends on several factors:
Kind of external storage device available
Type of queries allowed
Number of keys
Mode of retrieval/update
Storage device types
• We are concerned with files stored on disks and tapes
Query types
Number of keys –
• We distinguish between files having only one key and files with more than one key
Mode of retrieval-
• May be either real time or batched
• In real time, the response time for any query should be minimal
• In batched mode, the response time is not significant. Retrieval requests are batched together on a transaction file until either enough requests have been received or a suitable amount of time has passed. Then all requests on the transaction file are processed
Mode of update
• May be either real time or batched
• Real-time update is needed, for example, in a flight reservation system, where the reservation file must be changed immediately to show the new status
• Batched update would be suitable in a bank account system: for example, all withdrawals and deposits made on a particular day are collected on a transaction file and the updates are made at the end of the day
• Batched update involves two files: a master file and a transaction file
• Master file: represents the file status after the previous update
• Transaction file: holds all the update requests that have not yet been reflected in the master file, so the master file is always “out of date”
• In a sequential organization, records are placed sequentially on the storage media (adjacent to each other)
• The physical sequence of records is ordered on some key, called the primary key
• For batched retrieval and update, ordered sequential files are preferred over unordered sequential files since they are easier to process
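When both the master file and the transaction file are ordered on the primary key, a batched update is a single merge-like pass over the two files that writes a new master file. The sketch below assumes files are lists of tuples, at most one transaction per key, and three hypothetical operation names ("insert", "update", "delete"); the account data is made up.

```python
def batched_update(master, transactions):
    """master: list of (key, value) sorted by key.
    transactions: list of (key, op, value) sorted by key, at most one per key.
    Returns the new master file as a list of (key, value)."""
    new_master, i = [], 0
    for key, op, value in transactions:
        # copy master records with smaller keys unchanged
        while i < len(master) and master[i][0] < key:
            new_master.append(master[i])
            i += 1
        found = i < len(master) and master[i][0] == key
        if op == "insert" and not found:
            new_master.append((key, value))
        elif op == "update" and found:
            new_master.append((key, value))
            i += 1
        elif op == "delete" and found:
            i += 1  # skip the record, so it is dropped from the new master
        else:
            raise ValueError(f"invalid transaction {op!r} for key {key!r}")
    new_master.extend(master[i:])  # copy the remaining master records
    return new_master


master = [(101, 500), (105, 200), (110, 750)]  # hypothetical accounts: (number, balance)
transactions = [(101, "update", 450), (103, "insert", 100), (110, "delete", None)]
print(batched_update(master, transactions))  # [(101, 450), (103, 100), (105, 200)]
```

Each file is read once, in order, which is why ordered sequential files suit batched processing.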
• File organization can be broken down into two aspects:
The directory
The physical organization of the records (sequential)
• Processing a query or update request proceeds in two steps:
The indexes are used to determine the parts of the physical file that are to be searched
These parts of the file are searched and the records satisfying the query are accessed
File Organizations-
• Sequential organization
• Random Organization
• Linked organization
• Inverted files
• Cellular partitions
Sequential Organization-
• A cylinder-surface index is maintained for the primary key
• In order to retrieve records efficiently, indexes can be used
• The structure of the indexes depends on the indexing technique used
Random organization-
• Records are stored at random locations on the disk
• Several techniques are used for randomization. They are:
Direct addressing
Directory lookup
Hashing
Direct addressing-
• The available disk space is divided into nodes large enough to store records of equal size
• The numeric value of the primary key is used to determine the node into which a particular record is to be stored
• Searching for or deleting a record by primary key value requires one disk access
• Updating a record requires two accesses (one to read the record and one to write back the modified record)
• If variable-size records are used, an index can be set up with pointers to the actual records on the disk
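Direct addressing with fixed-size records can be sketched with the standard `struct` and `io` modules: the numeric primary key is used directly as the node number, so a record's byte offset is key × record size. An in-memory `io.BytesIO` buffer stands in for the disk, and the record layout (an "in use" flag plus a 20-byte name) and the class name `DirectFile` are hypothetical.

```python
import io
import struct

# Hypothetical fixed-size record layout: an "in use" flag and a 20-byte name.
RECORD = struct.Struct("<?20s")


class DirectFile:
    """Direct addressing: the numeric primary key is the node number, so a
    record's byte offset is key * RECORD.size. BytesIO stands in for the disk."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.f = io.BytesIO(bytes(RECORD.size * num_nodes))  # all nodes empty

    def _seek(self, key):
        if not 0 <= key < self.num_nodes:
            raise KeyError(key)
        self.f.seek(key * RECORD.size)

    def write(self, key, name):           # one access
        self._seek(key)
        self.f.write(RECORD.pack(True, name.encode()))

    def read(self, key):                  # one access
        self._seek(key)
        used, raw = RECORD.unpack(self.f.read(RECORD.size))
        return raw.rstrip(b"\0").decode() if used else None

    def delete(self, key):                # one access: mark the node empty
        self._seek(key)
        self.f.write(RECORD.pack(False, b""))

    def update(self, key, name):          # two accesses: read, then write back
        if self.read(key) is None:
            raise KeyError(key)
        self.write(key, name)


df = DirectFile(10)
df.write(3, "Alice")
print(df.read(3), df.read(4))  # Alice None
```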
Directory lookup-
• Retrieving a record involves searching the index for the record address and then accessing the record itself
• The records can be of fixed or variable size
• Searching for a record via the index requires more than one access
• Every record has a unique primary key
• Two or more records with the same primary key can cause collisions
Hashing-
• The available space is divided into buckets and slots
• Every record has a hashed index
• Some space is set aside to handle overflow
Hierarchy of management that covers different levels of management
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 

II B.Sc IT DATA STRUCTURES.pptx

Unit -1
• Introduction to Algorithms
• Analysing Algorithms
• Arrays: Sparse Matrices
• Representation of Arrays
• Stacks and Queues
• Fundamentals: Evaluation of Expressions, Infix to Postfix Conversion
• Multiple Stacks and Queues
Introduction
• An algorithm is a step-by-step procedure that defines a set of instructions to be executed in a certain order to produce the desired output.
• An algorithm can be implemented in more than one programming language (for example C, C++, Python, Ruby).
• From the data structures point of view, the important categories of algorithms are:
  Search − Algorithm to search for an item in a data structure.
  Sort − Algorithm to sort items in a certain order.
  Insert − Algorithm to insert an item into a data structure.
  Update − Algorithm to update an existing item in a data structure.
  Delete − Algorithm to delete an existing item from a data structure.
Characteristics
• Unambiguous − An algorithm should be clear and unambiguous. Each of its steps (or phases), and their inputs and outputs, should be clear and lead to only one meaning.
• Input − An algorithm should have 0 or more well-defined inputs.
• Output − An algorithm should have 1 or more well-defined outputs, which should match the desired output.
• Finiteness − An algorithm must terminate after a finite number of steps.
• Feasibility − An algorithm should be feasible with the available resources.
• Independent − An algorithm should have step-by-step directions that are independent of any programming code.
How to Write an Algorithm?
• An algorithm is written as a step-by-step procedure.
• Algorithm writing is a process that is carried out after the problem domain is well defined.
• Example
Advantages of Algorithms:
• It is easy to understand.
• An algorithm is a step-wise representation of a solution to a given problem.
• In an algorithm the problem is broken down into smaller pieces or steps; hence, it is easier for the programmer to convert it into an actual program.
Disadvantages of Algorithms:
• Writing an algorithm takes a long time, so it is time-consuming.
• Understanding complex logic through algorithms can be very difficult.
• Branching and looping statements are difficult to show in algorithms.
Analysis of Algorithms
• Provides a theoretical estimate of the resources an algorithm requires to solve a specific computational problem.
• Analysis of algorithms is the determination of the amount of time and space resources required to execute an algorithm.
• Efficiency is measured in terms of resources such as CPU time, memory, disk and network usage. The two main measures are:
  Time complexity
  Space complexity
• Different ways of analysis:
  Asymptotic Analysis
  Worst, Average and Best Cases
  Asymptotic Notations
  Analysis of Loops
  Solving Recurrences
  Amortized Analysis
Asymptotic Analysis
• Describes the performance of an algorithm in terms of the input size.
• Expresses the relation between the running time (or space) and the input size.
Worst, Average and Best Cases
• The analysis is divided into three cases:
  Best Case − minimum time taken to execute the program.
  Average Case − average time taken to execute the program.
  Worst Case − maximum time taken to execute the program.
• Ω, θ and O are often paired with the best, average and worst cases respectively. Strictly, they denote lower, tight and upper bounds, and any of them can be applied to any of the three cases.
Asymptotic Notations
• Asymptotic notations are mathematical tools used to represent the time complexity of algorithms in asymptotic analysis:
  Ο (Big O) Notation
  Ω (Omega) Notation
  θ (Theta) Notation
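The three notations have standard textbook definitions, stated below for reference; c, c_1, c_2 and n_0 are the usual positive constants, followed by a small worked example with f(n) = 3n + 2:

```latex
\begin{aligned}
f(n) = O(g(n))      &\iff \exists\, c > 0,\ n_0 \text{ such that } f(n) \le c\,g(n) \text{ for all } n \ge n_0\\
f(n) = \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 \text{ such that } f(n) \ge c\,g(n) \text{ for all } n \ge n_0\\
f(n) = \Theta(g(n)) &\iff \exists\, c_1, c_2 > 0,\ n_0 \text{ such that } c_1\,g(n) \le f(n) \le c_2\,g(n) \text{ for all } n \ge n_0\\[4pt]
\text{Example: } & 3n \le 3n + 2 \le 4n \text{ for all } n \ge 2
  \;\Rightarrow\; 3n + 2 = \Theta(n) \quad (c_1 = 3,\ c_2 = 4,\ n_0 = 2)
\end{aligned}
```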
Analysis of Loops
• Analysis of iterative programs:
  O(1): The time complexity of a function (or set of statements) is considered O(1) if it contains no loop, no recursion and no call to any other non-constant-time function, for example:
    c = a + b;
    print c;
  O(n): The time complexity of a loop is considered O(n) if the loop variable is incremented/decremented by a constant amount.
  O(n^c): The time complexity of nested loops equals the number of times the innermost statement is executed; c nested loops, each running n times, give O(n^c).
O(Log n): The time complexity of a loop is considered O(Log n) if the loop variable is divided/multiplied by a constant amount. The same holds for a recursive function whose argument is divided by a constant at each call (for example, recursive binary search).
O(LogLog n): The time complexity of a loop is considered O(LogLog n) if the loop variable is reduced/increased exponentially (for example, i = i * i or i = sqrt(i)).
Time Complexity of Loops
  O(1)          − set of statements with no loop
  O(n)          − loop variable incremented/decremented by a constant amount
  O(n^c)        − nested loops: count how many times the innermost statements execute
  O(Log n)      − loop variable divided/multiplied by a constant amount
  O(LogLog n)   − loop variable reduced/increased exponentially
Solving Recurrences
• A recurrence describes the running time of a recursive algorithm in terms of its running time on smaller inputs.
• There are mainly three ways of solving recurrences:
  Substitution Method − Make a guess for the solution and then use mathematical induction to prove the guess correct or incorrect.
  Recurrence Tree Method − Draw a recurrence tree and calculate the time taken by every level of the tree; finally, sum the work done at all levels. This suits, for example, divide and conquer algorithms.
  Master Method − A direct way to get the solution for recurrences of the form T(n) = aT(n/b) + f(n).
Amortized Analysis
• Used for algorithms where an occasional operation is very slow but most of the other operations are fast; the total cost is averaged over a sequence of operations.
Basics of Data structure
• A data structure is a way of structuring/organizing data in a computer so that it can be used effectively.
• Data must be atomic, traceable, accurate, clear and concise.
• Data type
  Built-in Data Type: Integers; Boolean (true, false); Floating (decimal numbers); Characters and Strings
  Derived Data Type: List; Array; Stack; Queue
• Basic Operations: Traverse, Search, Insert, Delete, Sort, Merge, Create, Retrieve, Store
Arrays
• An array is a fixed-size, sequenced collection of variables of the same data type, stored in contiguous memory.
• It can be viewed as a set of pairs <index, value>.
• The array uses adjacent memory locations to store its values.
• It is a convenient structure for representing data.
• Two terms are needed to understand the concept of an array, Element and Index:
  Element − Each item stored in an array is called an element.
  Index − Each location of an element in an array has a numerical index, which is used to identify the element.
• Declaration syntax in C:
    data_type array_name[array_size];
• The index starts at 0.
• An array of length 10 can store 10 elements.
• Each element can be accessed via its index (mapping); for example, in the example array the element at index 6 is 9.
• The array as an abstract data type:
    structure ARRAY(value, index)
      declare CREATE( ) → array
              RETRIEVE(array, index) → value
              STORE(array, index, value) → array;
Need for Arrays
• Without arrays, the number of variables used would increase with every additional data item.
Ordered Lists
• An ordered list is a list in which the elements are always arranged in a particular order.
• It is also called a linear list. E.g. (SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY)
Representation of arrays
One dimensional array
• A one-dimensional (single-dimensional) array is one whose elements are accessed in sequential order using a single subscript, either a row or a column index. E.g. a[n] (mathematically, a_n).
Two dimensional array
• When more than one dimension is specified, the array is called a multi-dimensional array. E.g. a[3][3] (rows × columns).
E.g. a[3][4]
• A two-dimensional array is accessed using a row subscript and a column subscript, e.g. a[1][1].
Three dimensional array
• A three-dimensional array has three dimensions, e.g. a[2][3][4].
    #include <stdio.h>

    int main(void)
    {
        int one_dim[10];      /* declaration of a 1D array */
        int two_dim[2][2];    /* declaration of a 2D array */
        /* 3D array: 2 blocks, each of 3 rows and 4 columns */
        int three_dim[2][3][4] = {
            { {3, 4, 2, 3}, {0, -3, 9, 11}, {23, 12, 23, 2} },
            { {13, 4, 56, 3}, {5, 9, 3, 5}, {3, 1, 4, 9} }
        };
        (void)one_dim;        /* silence unused-variable warnings */
        (void)two_dim;
        printf("%d\n", three_dim[1][2][3]);   /* prints 9 */
        return 0;
    }
Sparse Matrices
• Triplet/Array representation
• Linked List representation
• Transpose
Sparse Matrices
• A matrix is called a sparse matrix if most of its elements are 0; a common rule of thumb is that no more than about 1/3 (roughly 30%) of its elements are non-zero.
• Storing all the zero elements takes a large amount of memory for no purpose.
• To avoid this waste of space, only the non-zero elements of a sparse matrix are stored, in a table of triplets.
Triplet Representation of Sparse matrix (6 × 6 matrix with 7 non-zero elements)
  Row  Column  Value
  1    4       12
  1    6       -14
  2    2       7
  2    3       3
  3    4       -8
  5    1       91
  6    3       25
• Full matrix: 6 × 6 = 36 memory locations. Triplets: 7 × 3 = 21 memory locations. So 36 − 21 = 15 memory locations are saved.
Linked list representation
• Inserting or deleting a node in a linked list is cheaper than inserting or deleting an element in an array, because no elements have to be shifted.
• Each node of the linked list has four fields:
  Row − the index of the row where the non-zero element is located.
  Column − the index of the column where the non-zero element is located.
  Value − the value of the non-zero element located at index (row, column).
  Next node − the address of the next node.
• For example: the linked list representation of the above matrix.
Transpose
• The transpose is obtained by interchanging rows and columns: each triplet (row, column, value) becomes (column, row, value).
  Triplet T              Transpose of T
  Row Column Value       Row Column Value
  0   2      1           2   0      1
  1   0      3           0   1      3
  2   1      4           1   2      4
  3   1      6           1   3      6
Benefits of using the sparse matrix representation
• Storage: only the non-zero elements are stored.
• Computing time: operations can skip the zero elements.
Stacks and Queues
Stacks
• A stack is an Abstract Data Type (ADT).
• A stack allows operations (insertion or deletion) at one end only, called the TOP.
• Insertion and deletion of an element are done by two operations:
  PUSH (store)
  POP (access and remove)
• At any given time, only the top element of a stack can be accessed.
• The element placed (inserted or added) last is accessed first, so a stack is also called LIFO (Last In First Out).
• The stack is called empty or null when it contains 0 elements.
• S = (a1, a2, a3, ..., an), where a1 is the bottommost element and an is the topmost element.
• The status of a stack can be checked with the following operations:
  peek() − get the top data element of the stack without removing it.
  isEmpty() − check if the stack is empty.
  isFull() − check if the stack is full.
Push Operation
• The process of putting a new data element onto the stack is known as a Push operation. It involves a series of steps:
  I. Step 1 − Check if the stack is full.
  II. Step 2 − If the stack is full, produce an error and exit.
  III. Step 3 − If the stack is not full, increment top to point to the next empty space.
  IV. Step 4 − Add the data element to the stack location where top is pointing.
  V. Step 5 − Return success.
Pop Operation
• Pop accesses the element at the top while removing it from the stack.
• In an array implementation, the data element is not actually removed; instead, top is decremented to point to the next value lower in the stack. In a linked implementation, pop also deallocates the memory of the removed node.
• A Pop operation may involve the following steps:
  I. Step 1 − Check if the stack is empty.
  II. Step 2 − If the stack is empty, produce an error and exit.
  III. Step 3 − If the stack is not empty, access the data element at which top is pointing.
  IV. Step 4 − Decrease the value of top by 1.
  V. Step 5 − Return success.
    structure STACK (item)
      declare CREATE ( ) → stack
              ADD (item, stack) → stack
              DELETE (stack) → stack
              TOP (stack) → item
              ISEMTS (stack) → boolean;
• ISEMTS tests whether the stack is empty.
Queues
• A queue is similar to a stack, but it is open at both of its ends.
• Insertions (enqueue) are made at one end, the rear, and deletions (dequeue) are made at the other end, the front.
• For example, in Q = (a1, a2, ..., an), a1 is at the front and an is at the rear.
• A queue follows the First-In-First-Out (FIFO) methodology, i.e., the data item stored first is accessed first.
• Queues are used, for example, for scheduling jobs in computer applications.
• The basic operations associated with queues are:
  enqueue() − add (store) an item to the queue.
  dequeue() − remove (access) an item from the queue.
Enqueue Operation (Insertion at the Rear)
• Queues maintain two data pointers, front and rear, so their operations are comparatively more difficult to implement than those of stacks.
• The following steps are taken to enqueue (insert) data into a queue:
  I. Step 1 − Check if the queue is full.
  II. Step 2 − If the queue is full, produce an overflow error and exit.
  III. Step 3 − If the queue is not full, increment the rear pointer to point to the next empty space.
  IV. Step 4 − Add the data element to the queue location where rear is pointing.
  V. Step 5 − Return success.
Dequeue Operation (Deletion at the Front)
• Accessing data from the queue involves two tasks: access the data where front is pointing, and remove the data after access.
• The following steps are taken to perform a dequeue operation:
  I. Step 1 − Check if the queue is empty.
  II. Step 2 − If the queue is empty, produce an underflow error and exit.
  III. Step 3 − If the queue is not empty, access the data where front is pointing.
  IV. Step 4 − Increment the front pointer to point to the next available data element.
  V. Step 5 − Return success.
• A few more functions are:
  peek() − Gets the element at the front of the queue without removing it.
  isfull() − Checks if the queue is full.
  isempty() − Checks if the queue is empty.
• isfull() − Checks whether the rear pointer has reached MAXSIZE, which means the queue is full.
• isempty() − If the value of front is less than MIN or 0, the queue is not yet initialized, hence empty.
Multiple stacks and Queues
• A single stack is sometimes not sufficient to store a large amount of data.
• Multiple stacks solve this problem: a single array holds more than one stack, and the array is divided among the stacks.
• A memory of m locations is divided into n stacks, each getting an equal share (about m/n locations).
• If the sizes of the stacks are known in advance, the m locations can instead be divided according to those known sizes.
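• B[i] marks the base (one position below the first slot) and T[i] the top of the ith stack, which occupies locations B[i]+1 to B[i+1]:
  T[i] = B[i]   → the ith stack is empty (underflow on pop)
  T[i] = B[i+1] → the ith stack is full (overflow on push)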
Evaluation of Expressions
• Expression − An expression is a collection of operators and operands that represents a specific value. For example, in a + b, + is the operator and a and b are the operands.
• An operator is a symbol that performs a particular task, such as an arithmetic, logical or conditional operation.
• Operands are the values on which the operators perform the task. An operand can be a direct value, a variable, or the address of a memory location.
• There are three types of expressions, based on the position of the operator:
  Infix Expression − the operator is placed between the operands, e.g. a+b
  Postfix Expression − the operator is placed after the operands, e.g. ab+
  Prefix Expression − the operator is placed before the operands, e.g. +ab
• An expression can be converted from one form to another, e.g. Infix to Postfix, Infix to Prefix, Prefix to Postfix and vice versa.
• To convert an Infix expression into a Postfix or Prefix expression:
  Find all the operators in the given Infix Expression.
  Find the order in which the operators are evaluated, according to operator precedence.
  Convert each operator into the required type of expression (Postfix or Prefix) in that order.
Steps to convert an Infix Expression to a Postfix Expression
Example: D = A + B * C
  Step 1 − The operators in the given Infix Expression: = , + , *
  Step 2 − The order of the operators according to their precedence: * , + , =
  Step 3 − Convert the first operator * ----- D = A + B C *
  Step 4 − Convert the next operator + ----- D = A B C * +
  Step 5 − Convert the next operator = ----- D A B C * + =
Operator priority
  Operator                     Priority
  **, unary -, unary +, ¬      7
  ^ (exponentiation)           6
  *, /                         5
  +, -                         4
  <, >, =, ≠, ≤, ≥             3
  and                          2
  or                           1
Unit -2 Linked List
• Linked List: Singly Linked List
• Linked Stacks and Queues
• Polynomial Addition
• More on Linked Lists
• Sparse Matrices
• Doubly Linked List and Dynamic Storage Management
• Garbage Collection and Compaction
Linked Lists
• A linked list is a linear data structure in which the elements are not stored at contiguous memory locations.
• The elements in a linked list are linked using pointers.
• A linked list consists of nodes, where each node contains a data field and a reference (link) to the next node in the list.
• The address of the first node is held in a pointer called head, and the link field of the last node is NULL.
• A linked list can grow and shrink in size as required.
• It does not waste memory reserved in advance, since nodes are allocated only when they are needed.
• The different types of linked lists are:
  Singly linked list − Item navigation is forward only.
  Doubly linked list − Items can be navigated forward and backward.
  Circular linked list − The last item contains a link to the first element as next; in a circular doubly linked list, the first element also has a link to the last element as previous.
• The basic operations of a linked list are:
  Insert − Adds a node to the list.
  Display − Displays the complete list.
  Search − Searches for an element using the given key.
  Delete − Deletes an element using the given key.
• Insert − Adding a new node to a linked list between LeftNode and RightNode:
    NewNode.next → RightNode;
    LeftNode.next → NewNode;
• Example: inserting GAT between FAT and HAT
  1. Get a node that is currently unused and address it as X.
  2. Set the DATA field of this node to GAT.
  3. Set the LINK field of X to point to the node after FAT, which contains HAT.
  4. Set the LINK field of the node containing FAT to X.
Deletion
• Locate the target node to be removed, using a search algorithm.
• Make the node before the target point to the node after it (LeftNode.next → TargetNode.next), then detach the target:
    TargetNode.next → NULL;
• The removed node can then either be kept for later use, or its memory can be deallocated to wipe it off completely.
• For example, to delete the node GAT from the list, the LINK field of the node containing FAT is set to point to the node containing HAT.
• Storage management with linked lists requires:
  Dividing memory into nodes, each having at least one link field.
  A mechanism to determine which nodes are free and which are in use.
  A mechanism to transfer nodes from the reserved pool to the free pool and vice versa.
Storage pool
• Contains all nodes that are not currently being used.
• Two procedures manage it: GETNODE (take a node from the pool) and RET (return a node to the pool).
• When a node is no longer needed, it is returned to the pool with RET.
• Initially, all of the available nodes are linked together in a single list called AV.
• AV is a singly linked list in which the available nodes are linked.
Example 1. Assume that each node has two fields, DATA and LINK. The following algorithm creates a linked list with two nodes whose DATA fields are set to the values 'MAT' and 'PAT' respectively. T is a pointer to the first node in this list.
Example 2. Let T be a pointer to a linked list, with T = 0 if the list has no nodes. Let X be a pointer to some arbitrary node in the list T. The following algorithm inserts a node with DATA field 'OAT' following the node pointed at by X.
Example 3. Let X be a pointer to some node in a linked list T, and let Y be the node preceding X, with Y = 0 if X is the first node in T (i.e., if X = T). The following algorithm deletes node X from T.
Array vs Linked list
• Definition − Array: a collection of elements of the same data type. Linked list: a collection of objects called nodes, where each node consists of two parts, data and address.
• Storage − Array: elements are stored in contiguous memory locations. Linked list: elements can be stored anywhere in memory.
• Memory − Array: works with static memory, i.e., the memory size is fixed and cannot be changed at run time. Linked list: works with dynamic memory, i.e., the memory size can be changed at run time as required.
• Dependence − Array: elements are independent of each other. Linked list: elements depend on each other; since each node contains the address of the next node, a node can be reached only through its previous node.
• Insertion and deletion − Array: slower, because the elements after the insertion or deletion point must be shifted. Linked list: faster once the position is known, because only links change.
• Access − Array: faster, since any element can be accessed directly through its index. Linked list: slower, since access starts by traversing from the first element.
• Allocation − Array: memory is allocated at compile time (for statically declared arrays). Linked list: memory is allocated at run time.
• Utilization − Array: can be inefficient; for example, if the array size is 6 and it holds only 3 elements, the rest of the space is unused. Linked list: efficient, as memory can be allocated or deallocated at run time as required.
Polynomial addition
• A polynomial is an expression consisting of terms, each with a non-zero coefficient and a non-negative exponent.
• General representation of a polynomial: $A(x) = a_m x^{e_m} + \cdots + a_1 x^{e_1}$, where the $a_i$ are non-zero coefficients and $e_m > e_{m-1} > \cdots > e_1 \ge 0$.
• In the linked representation of a polynomial, each term is a node with three fields:
  Coefficient Field − holds the coefficient of the term.
  Exponent Field − holds the exponent of the term.
  Link Field − holds the address of the next term in the polynomial.
• Let A and B be two polynomials with three terms each:
  $A(x) = 3x^{14} + 2x^{8} + 1$
  $B(x) = 8x^{14} - 3x^{10} + 10x^{6}$
• These two polynomials are represented as linked lists with one node per term, in decreasing order of exponent.
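• The computing time of the addition algorithm is analysed by counting the following operations:
  Coefficient additions
  Exponent comparisons
  Additions/deletions on the available space list
  Creation of new nodes for C
• If A and B have m and n terms, each pass of the main loop consumes at least one term, so each of these counts is bounded by a constant times m + n, and the addition takes O(m + n) time.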
• The ATTACH procedure creates a new node with coefficient C and exponent E and attaches it after node d, the current last node of the result; d is then updated to point to this new last node.
• In this way, whenever a new node is generated with C and E, it is appended to the end of the result list C.
• Linked lists are well suited to all polynomial operations, such as addition, subtraction and multiplication, by writing procedures for reading the input, computing, and displaying the output.
• For example, D(x) = A(x) * B(x) + C(x) can be computed by reading A, B and C, forming the product T(x) = A(x) * B(x), then computing D(x) = T(x) + C(x), and finally printing D.
• Once D(x) has been computed, the nodes of the temporary polynomial T(x) are no longer needed; they are reclaimed (returned to the available space list) so they can hold other polynomials in future.
• Instead of calling RET once for every node, an ERASE procedure returns all the nodes of T(x) to the available space list.
• The time taken to erase T(x) this way is proportional to the number of nodes in T.
• A more efficient way is to modify the list structure so that the link field of the last node points back to the first node (a circular list).
• A circular list can be erased in a fixed amount of time, independent of the number of nodes in the list.
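• The zero polynomial would otherwise have to be handled as a special case.
• To avoid this, one special head node is added to each polynomial; the zero polynomial then consists of the head node alone.
• Example: $A(x) = 3x^{14} + 2x^{8} + 1$ represented with a head node.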
• Invert (reverse) a linked list:
  https://www.youtube.com/watch?v=sYcOK51hl-A
  https://www.youtube.com/watch?v=D7y_hoT_YZI
CONCATENATE Procedure
• Concatenation means joining two linked lists, i.e., appending one linked list to another to produce a combined linked list.
• The CONCATENATE procedure joins two chains X and Y: it walks to the last node of X and sets that node's link to Y.
• Its time complexity is O(n), where n is the number of nodes in X; it is linear.
INSERT_FRONT procedure
• If a circular list is accessed through a pointer to its last node, a node can be inserted at the front or at the rear in a fixed amount of time.
LENGTH Procedure
• Finds the length (number of nodes) of a circular list by traversing it once.
SPARSE MATRIX linked list representation
• Each column of a sparse matrix is represented by a circularly linked list with a head node.
• Each row is also represented by a circularly linked list with a head node.
• Each node in the structure other than a head node represents a non-zero term of the matrix A.
• In this representation each node has 5 fields: ROW, COL, VALUE, DOWN and RIGHT.
  DOWN − links to the next non-zero element in the same column.
  RIGHT − links to the next non-zero element in the same row.
• Element a_ij is linked into the circular linked list for row i and into the circular linked list for column j.
• So a_ij is a member of two lists at the same time.
• Every row and every column has a head node, whose data fields are set to zero.
• For every non-zero term of the matrix A, one 5-field node is allocated.
• The MREAD and MERASE procedures are used to read (build) and to erase a sparse matrix stored in this linked list representation.
Doubly linked list
• A node in a doubly linked list has 3 fields: DATA, LLINK (left link) and RLINK (right link).
• The list may or may not be circular.
• If a head node is used, its DATA field contains no information.
• In a circular doubly linked list with a head node, if P points to any node, then P = RLINK(LLINK(P)) = LLINK(RLINK(P)).
Dynamic storage Management
• In a multiprogramming environment, several programs reside in memory at the same time.
• Different programs have different memory requirements.
• When programs request memory in a dynamic environment, the amount needed is not known ahead of time.
• After a program finishes execution, its memory is freed, often in an order different from the order of allocation.
• When the computer system starts, the whole memory is free and available for allocation.
• Jobs are then submitted to the computer and request memory allocation.
• For example, start with 1,00,000 words of memory and 5 programs:
  Program  Memory (words)
  P1       10,000
  P2       15,000
  P3       6,000
  P4       8,000
  P5       20,000
  Free     41,000
• The unshaded area in the figure indicates memory that is not currently in use.
• Now assume P2 and P4 complete execution, freeing the memory they used.
• The OS has to maintain a list of all blocks of storage currently not in use, and allocate storage from this unused pool as required.
• A chain structure is used to maintain this available space list: all the free blocks are linked together, and each block records its own size.
• Each node on the free list has 2 fields in its first word: SIZE and LINK.
  • 96. • During requisition for the memory of storing N words in the list of free blocks finding or searching the necessary free block is done by allocation strategy. • Allocation strategy is of two types First fit Best fit • If the memory block size ≥ N and allocating N words out this block-First fit • If the memory whose size is as close to N as possible and not less than N-Best Fit 19-06-2023 Data Structures
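The two strategies can be sketched in Python as follows. This is an illustration under stated assumptions, not the textbook's procedure: the available space list AV is modelled as a Python list of `[start, size]` pairs instead of linked nodes, and the hypothetical helper `_take` carves the N words from the high-address end (the "bottom") of the chosen block, so that only the block's SIZE changes.

```python
def _take(av, i, n):
    """Allocate n words from the bottom (high end) of free block av[i]."""
    start, size = av[i]
    p = start + size - n        # address of the allocated n words
    if size == n:
        del av[i]               # block used up exactly: remove it from AV
    else:
        av[i][1] = size - n     # only SIZE changes; the block stays on AV
    return p


def first_fit(av, n):
    """Use the first free block whose size is >= n; None if there is none."""
    for i, (_, size) in enumerate(av):
        if size >= n:
            return _take(av, i, n)
    return None


def best_fit(av, n):
    """Use the free block whose size is closest to n but not less than n."""
    best = None
    for i, (_, size) in enumerate(av):
        if size >= n and (best is None or size < av[best][1]):
            best = i
    return None if best is None else _take(av, best, n)


av = [[0, 300], [1000, 120], [2000, 500]]
print(first_fit(av, 100), av)   # first block big enough is [0, 300]
av = [[0, 300], [1000, 120], [2000, 500]]
print(best_fit(av, 100), av)    # closest fit is [1000, 120]
```

First fit stops at the first adequate block, while best fit must scan the whole list, but leaves the smallest leftover fragment.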
  • 97. 19-06-2023 Data Structures n- memory size needed p- address where n can be allocated AV-available space list
  • 98. • A portion of a free block is allocated from the bottom of the block, so that the links in the available space list do not have to change. • The available space list is maintained as a circular linked list with a head node whose SIZE is 0. • Both allocation and freeing operate on this list. • When a block is freed (returned to AV), the system should recognize whether its neighbouring blocks are also free, so that they can be coalesced into a single block.
  • 99. If P3 is the next program to terminate, then rather than simply adding its block to the free list, it is better to combine it with the adjacent free blocks of P2 and P4 into one larger free block.
  • 100. • If adjacent free blocks are not combined, the available block sizes get smaller and smaller (fragmentation). • To determine whether adjacent memory blocks are free without searching the available list, a special node structure (the boundary tag method) is adopted for both allocated and free nodes.
  • 101. • Assume a memory of size 5000 from which the following allocations are made:
      Request   Size
      R1         300
      R2         600
      R3         900
      R4         700
      R5        1500
      R6        1000
    • (Figure: memory configuration showing the different blocks of storage and the available space list.)
  • 102. • When a portion of a free block is allocated, the allocation is made from the bottom of the block. • (Figure: configuration after R1 is freed.)
  • 103. • (Figures: configuration after R4 is freed, and then after R3 is freed.)
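The following sketch shows coalescing on the R1 to R6 example. It is a simplification: instead of the boundary-tag node structure, the free list is kept as a Python list of `(start, size)` pairs sorted by address, so a freed block's neighbours are found by binary search. The addresses are an assumption: allocating each request from the bottom of the 5000-word memory gives R1 = 4700..4999, R2 = 4100..4699, R3 = 3200..4099, R4 = 2500..3199, R5 = 1000..2499 and R6 = 0..999. The function name `free_block` is hypothetical.

```python
import bisect


def free_block(av, start, size):
    """Return the block [start, start + size) to av, a list of (start, size)
    free blocks sorted by address, merging it with adjacent free blocks."""
    i = bisect.bisect_left(av, (start, size))
    # right neighbour begins exactly where this block ends: absorb it
    if i < len(av) and start + size == av[i][0]:
        size += av[i][1]
        del av[i]
    # left neighbour ends exactly where this block begins: merge into it
    if i > 0 and av[i - 1][0] + av[i - 1][1] == start:
        i -= 1
        start, size = av[i][0], av[i][1] + size
        del av[i]
    av.insert(i, (start, size))


av = []
free_block(av, 4700, 300)   # R1 freed: no free neighbours
free_block(av, 2500, 700)   # R4 freed: its neighbours R3 and R5 are in use
free_block(av, 3200, 900)   # R3 freed: coalesces with the free R4 block
print(av)                   # [(2500, 1600), (4700, 300)]
```

The boundary-tag method avoids even this search by storing, at both ends of every block, a flag saying whether the block is free, so each neighbour can be tested in constant time.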
  • 105. Garbage Collection and Compaction • Garbage collection is the process of collecting all unused nodes and returning them to the available space list. • It is carried out in two phases. • First phase (marking phase): all nodes in use are marked. • Second phase (collection phase): all unmarked nodes are returned to the available space list. This phase is trivial when all nodes are of a fixed size: every node is examined to check whether it is marked or unmarked, which takes O(n) steps for n nodes. When nodes are of varying sizes, it is desirable that the free nodes form one contiguous block of memory; this is called memory compaction. • Each node contains a MARK bit, which is set by the marking algorithm. • The marking algorithm marks all directly and indirectly accessible nodes. • Initially the MARK bit of every node is set to zero.
  • 106. • Each node has MARK, TAG, DLINK and RLINK fields. • A node whose TAG bit is 0 contains atomic information and is called an atomic node. • A node whose TAG bit is 1 is called a list node; it uses both DLINK and RLINK. • The marking algorithm is used to mark the nodes. • Initially all nodes are unmarked: MARK(i) = 0 for all nodes i. • A driver procedure calls the marking algorithm to mark the nodes accessible from each pointer variable in the program.
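A small Python sketch of the two phases. It assumes memory is a list of nodes addressed by index, every node has an RLINK (None at the end of a list), and only list nodes (TAG = 1) have a DLINK to follow. The marking phase uses an explicit stack rather than recursion; `GCNode`, `mark_from` and `collect` are hypothetical names.

```python
class GCNode:
    def __init__(self, tag, dlink=None, rlink=None, atom=None):
        self.mark = 0       # MARK bit
        self.tag = tag      # 0 = atomic node, 1 = list node
        self.dlink = dlink  # list nodes: index of the sublist (or None)
        self.rlink = rlink  # index of the next node (or None)
        self.atom = atom    # atomic nodes: the information


def mark_from(memory, root):
    """Marking phase: mark every node reachable from index root."""
    stack = [root]
    while stack:
        i = stack.pop()
        if i is None or memory[i].mark:
            continue                    # null link, or already marked
        node = memory[i]
        node.mark = 1
        if node.tag == 1:
            stack.append(node.dlink)    # list node: also follow DLINK
        stack.append(node.rlink)


def collect(memory, pointer_vars):
    """Driver plus collection phase: returns indices of the free nodes."""
    for node in memory:
        node.mark = 0                   # initially MARK(i) = 0 for all i
    for p in pointer_vars:
        mark_from(memory, p)
    return [i for i, node in enumerate(memory) if not node.mark]


memory = [
    GCNode(1, dlink=1, rlink=2),   # 0: list node reachable from a pointer
    GCNode(0, atom="a"),           # 1
    GCNode(0, atom="b"),           # 2
    GCNode(0, atom="x"),           # 3: unreachable
    GCNode(1, dlink=3),            # 4: unreachable list node
]
print(collect(memory, [0]))        # [3, 4]
```

With fixed-size nodes, the list returned by the collection phase can simply be linked into the available space list.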
  • 109. Storage Compaction • When storage requests may be for blocks of varying sizes, storage is compacted so that the free storage forms one contiguous block. • Nodes in use have MARK bit = 1 and free nodes have MARK bit = 0. • In the example, the nodes are labelled 1 to 8. • Instead of merely linking the free nodes together to obtain the available space, compaction moves the nodes currently in use to one end of memory, so that the free space collects at the other end.
  • 110. • Relocating the nodes produces two contiguous blocks: one holding the nodes in use and one of free storage. • Storage compaction must also update every link so that it points to the relocated address of the node it refers to.
  • 111. • Storage compaction involves three tasks: • Determine new addresses for the nodes in use • Update all links in the nodes in use • Relocate the nodes to their new addresses
  • 112. • Each node has the fields SIZE, NEW_ADDR, LINK1 and LINK2.
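The three compaction tasks map onto three passes over memory in this Python sketch. It assumes memory is a dictionary from address to block, the blocks lie contiguously from address 0, and links in nodes in use point only to other nodes in use; `Block` and `compact` are hypothetical names. Building a new dictionary stands in for physically moving the nodes; moving them in place in increasing address order would also be safe, because a node's new address is never greater than its old one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    size: int
    mark: int                       # 1 = in use, 0 = free
    link1: Optional[int] = None     # address of another block, or None
    link2: Optional[int] = None
    new_addr: Optional[int] = None


def compact(memory):
    """Returns (compacted memory, address where the free block starts)."""
    addrs = sorted(memory)
    # Task 1: determine new addresses for the nodes in use
    nxt = 0
    for a in addrs:
        b = memory[a]
        if b.mark:
            b.new_addr = nxt
            nxt += b.size
    # Task 2: update all links in the nodes in use
    for a in addrs:
        b = memory[a]
        if b.mark:
            if b.link1 is not None:
                b.link1 = memory[b.link1].new_addr
            if b.link2 is not None:
                b.link2 = memory[b.link2].new_addr
    # Task 3: relocate the nodes to their new addresses
    new_memory = {memory[a].new_addr: memory[a] for a in addrs if memory[a].mark}
    return new_memory, nxt


mem = {0: Block(100, 0), 100: Block(50, 1, link1=300),
       150: Block(150, 0), 300: Block(80, 1, link1=100)}
new_mem, free_start = compact(mem)
print(sorted(new_mem), free_start)   # [0, 50] 130
```

Links must be updated (task 2) while the old addresses can still be used to find each target's NEW_ADDR, which is why relocation comes last.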
  • 114. Trees • Basic Terminology • Binary Trees • Binary Tree Representations • Binary Tree Traversal • More on Binary Trees • Threaded Binary Trees • Binary Tree Representation of Trees • Counting Binary Trees
  • 115. Trees • A tree is a nonlinear data structure: the data is organized so that items of information are related by branches. • Data stored in a tree is easy and quick to access. • The data is organized as a tree with a root node, branches and leaf nodes. • Genealogies are a familiar example of trees. There are two types of genealogical charts: • Pedigree chart (traces ancestors, e.g. a tree of organisms or genes) • Lineal chart (traces descendants, e.g. a tree of languages)
  • 117. Recursive definition of a tree: a tree consists of a root and zero or more subtrees T1, T2, …, Tk such that there is an edge from the root of the tree to the root of each subtree.
  • 118. • A node stands for an item of information plus the branches to other items. • The number of subtrees of a node is called its degree. • Nodes that have degree zero are called leaf or terminal nodes. • The other nodes (degree greater than zero) are called nonterminal nodes. • The roots of the subtrees of a node X are the children of X, and X is their parent. • Children of the same parent are called siblings. • The degree of a tree is the maximum degree of the nodes in the tree. • The ancestors of a node are all the nodes along the path from the root to that node. • The level of a node is defined by letting the root be at level one; if a node is at level l, its children are at level l + 1. • The height or depth of a tree is the maximum level of any node in the tree.
  • 119. • A forest is a set of n ≥ 0 disjoint trees. • Removing the root of a tree yields a forest. • In the example, removing node A leaves a forest of three trees.
  • 120. • Another useful way to draw a tree is as a list. • The example tree can be written in list form as: (figure) • The node structure of a tree represented as a linked list: (figure)
  • 123. General tree vs. binary tree
    • Children: in a general tree each node can have any number of children; in a binary tree each node can have at most two children.
    • Order: the subtrees of a general tree are not ordered; the subtrees of a binary tree are ordered.
    • Emptiness: a general tree cannot be empty; a binary tree can be empty.
    • Degree: in a general tree there is no limit on the degree of a node; in a binary tree the degree of a node is at most 2.
    • Subtrees: a node of a general tree has zero or more subtrees; a node of a binary tree has two subtrees, the left subtree and the right subtree.
  • 125. • (Figures: a skewed binary tree and a complete binary tree.) • The terms degree, level, height, leaf, parent and child also apply to binary trees. • https://www.javatpoint.com/discrete-mathematics-binary-trees • https://www.geeksforgeeks.org/introduction-to-binary-tree-data-structure-and-algorithm-tutorials/
  • 126. Binary Tree Representation • A full binary tree of depth k has 2^k − 1 nodes. • The sequential representation numbers the nodes of the full binary tree sequentially, starting with node 1 at level 1. • Nodes on each level are numbered from left to right. • A binary tree with n nodes and depth k is complete if and only if its nodes correspond to the nodes numbered 1 to n in the full binary tree of depth k.
  • 128. • The sequential (array) representation wastes no space for a complete binary tree, but it can waste a lot of space for other trees: a skewed tree of depth k needs 2^k − 1 locations but uses only k of them. • Inserting or deleting a node in the middle of the tree requires moving many nodes to reflect the change in their level numbers. • These problems are easily overcome by using a linked representation.
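With the nodes numbered as above, the parent and children of node i are computed rather than stored: parent(i) = ⌊i/2⌋ for i ≠ 1, lchild(i) = 2i if 2i ≤ n, and rchild(i) = 2i + 1 if 2i + 1 ≤ n. A Python sketch, assuming 1-based positions in a list whose index 0 is unused and where None marks a missing node; the example trees are made up.

```python
def parent(i):
    """Position of the parent of node i (the root, i = 1, has none)."""
    return i // 2 if i > 1 else None


def lchild(i, n):
    """Position of the left child of node i in a tree of n positions."""
    return 2 * i if 2 * i <= n else None


def rchild(i, n):
    """Position of the right child of node i in a tree of n positions."""
    return 2 * i + 1 if 2 * i + 1 <= n else None


# complete tree of depth 3: every one of the 7 positions is used
tree = [None, "A", "B", "C", "D", "E", "F", "G"]
n = len(tree) - 1
print(tree[parent(5)], tree[lchild(2, n)], rchild(4, n))   # B D None

# right-skewed tree A -> B -> C of depth 3: 7 positions, only 3 used
skewed = [None, "A", None, "B", None, None, None, "C"]
print(sum(x is not None for x in skewed[1:]), len(skewed) - 1)   # 3 7
```

For a tree that is not complete, the caller must also check that the computed position does not hold None.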
  • 130. • In the linked representation it is difficult to determine the parent of a node, so a fourth field, PARENT, may be included. Binary Tree Traversal • Many operations can be performed on trees. • Traversing a tree means visiting each node in the tree exactly once. • A full traversal produces a linear order for the information in a tree. • During a traversal, every node is treated in the same manner.
  • 131. • With L = move left, D = visit the data, R = move right, there are six possible combinations of traversal: LDR, LRD, DLR, DRL, RDL and RLD. • If we adopt the convention of traversing left before right, only three remain: LDR, LRD and DLR. • These are called inorder, postorder and preorder traversal, respectively. https://www.youtube.com/watch?v=WLvU5EQVZqY
  • 132. Inorder: move down the tree towards the left until you can go no further, then visit the node, move one node to the right, and continue.
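The three traversals can be sketched recursively in Python as follows. The node class and the example expression tree for A/B*C*D+E are assumptions chosen for illustration; each function appends DATA fields to the list `out` in the order given by its name (LDR, DLR, LRD).

```python
class Node:
    def __init__(self, data, lchild=None, rchild=None):
        self.data = data
        self.lchild = lchild
        self.rchild = rchild


def inorder(t, out):
    """LDR: left subtree, node, right subtree."""
    if t is not None:
        inorder(t.lchild, out)
        out.append(t.data)
        inorder(t.rchild, out)
    return out


def preorder(t, out):
    """DLR: node, left subtree, right subtree."""
    if t is not None:
        out.append(t.data)
        preorder(t.lchild, out)
        preorder(t.rchild, out)
    return out


def postorder(t, out):
    """LRD: left subtree, right subtree, node."""
    if t is not None:
        postorder(t.lchild, out)
        postorder(t.rchild, out)
        out.append(t.data)
    return out


# expression tree for A/B*C*D+E
expr = Node("+",
            Node("*",
                 Node("*", Node("/", Node("A"), Node("B")), Node("C")),
                 Node("D")),
            Node("E"))
print("".join(inorder(expr, [])))     # A/B*C*D+E
print("".join(preorder(expr, [])))    # +**/ABCDE
print("".join(postorder(expr, [])))   # AB/C*D*E+
```

On an expression tree, the three orders give the infix, prefix and postfix forms of the expression.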
  • 135. COPY of a Binary Tree • COPY produces an exact copy (clone, duplicate) of a given binary tree. • A modification of postorder traversal produces the copy.
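A recursive Python sketch of COPY as a modified postorder traversal: both subtrees are copied first, and the new node is created last. The `Node` class and field names are assumptions for illustration.

```python
class Node:
    def __init__(self, data, lchild=None, rchild=None):
        self.data = data
        self.lchild = lchild
        self.rchild = rchild


def copy(t):
    """Return an exact copy of binary tree t (None for an empty tree)."""
    if t is None:
        return None
    left = copy(t.lchild)               # L: copy the left subtree
    right = copy(t.rchild)              # R: copy the right subtree
    return Node(t.data, left, right)    # D: then create the copied node


original = Node(1, Node(2, Node(4)), Node(3))
clone = copy(original)
print(clone is original, clone.lchild.lchild.data)   # False 4
```

Every node of the copy is newly allocated, so later changes to one tree do not affect the other.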
  • 136. EQUAL of Binary Trees (identical, same) • Two binary trees are equivalent if they have the same topology and the information in corresponding nodes is identical. • By the same topology we mean that every branch in one tree corresponds to a branch in the second tree, in the same order. • EQUAL traverses the binary trees in preorder.
  • 137. Algorithm to check whether two binary trees are identical • 1. Compare the current nodes of tree1 and tree2. • 2. If both are null, this part of the traversal has completed successfully: return true. • 3. If only one of them is null, the trees are not identical: return false. • 4. Compare the data of the two nodes. • If the data is the same, check the left child of tree1 against the left child of tree2, and the right child of tree1 against the right child of tree2; return true only if both checks succeed. • If the data is not the same, the trees are not identical: return false. • After this traversal we know whether the binary trees are identical (equal, the same). • Time complexity: if tree1 has p nodes and tree2 has q nodes, at most min(p, q) pairs of nodes are compared, so the time is O(min(p, q)).
  • 138. Example 1: Identical binary trees • The structure of both trees is the same. • The data in corresponding nodes of the two trees is the same.
  • 139. Example 2: Non-identical binary trees • The structure of both trees is the same. • The data in corresponding nodes is NOT the same: • Nodes C and R have different values. • Nodes D and S have different values.
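The identical-trees check can be sketched in Python as a preorder comparison: data first, then the left subtrees, then the right subtrees. The `Node` class and example trees are assumptions; Python's `and` stops at the first mismatch, so no more than min(p, q) node pairs are compared.

```python
class Node:
    def __init__(self, data, lchild=None, rchild=None):
        self.data = data
        self.lchild = lchild
        self.rchild = rchild


def equal(s, t):
    """True if binary trees s and t have the same topology and data."""
    if s is None and t is None:
        return True                     # both subtrees empty
    if s is None or t is None:
        return False                    # only one subtree empty
    return (s.data == t.data
            and equal(s.lchild, t.lchild)
            and equal(s.rchild, t.rchild))


t1 = Node("A", Node("B"), Node("C"))
t2 = Node("A", Node("B"), Node("C"))
t3 = Node("A", Node("B"), Node("R"))    # same topology, different data
t4 = Node("A", Node("B"))               # different topology
print(equal(t1, t2), equal(t1, t3), equal(t1, t4))   # True False False
```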
  • 141. Propositional Calculus Expressions as Binary Trees • A propositional formula contains variables x1, x2, x3, … • and the operators ∧ (and), ∨ (or) and ¬ (not). • Each variable can take only two possible values, true or false, and so can every expression built from the variables and these operators. • Expressions built this way form the propositional calculus. • For example: (figure) • It can be read as: (figure)
  • 142. • If x1 and x3 are false and x2 is true, the value of the above expression is: (figure) • Another example: (figure)
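Evaluating a formula stored in a binary tree is a postorder traversal: evaluate both subtrees, then apply the operator at the node. In this Python sketch the operator names, the convention that a `not` node uses only its right child, and the formula (x1 ∧ ¬x2) ∨ (¬x1 ∧ x3) ∨ ¬x3 are all assumptions made for illustration; the formula is not necessarily the one on the slide. The helpers `var` and `neg` are hypothetical.

```python
class Node:
    """op is 'and', 'or', 'not' or 'var'; a 'not' node uses only rchild."""

    def __init__(self, op, lchild=None, rchild=None, var=None):
        self.op = op
        self.lchild = lchild
        self.rchild = rchild
        self.var = var


def evaluate(t, env):
    """Postorder evaluation; env maps a variable number to True/False."""
    if t.op == "var":
        return env[t.var]
    if t.op == "not":
        return not evaluate(t.rchild, env)
    left = evaluate(t.lchild, env)      # L
    right = evaluate(t.rchild, env)     # R
    if t.op == "and":                   # D: apply the operator last
        return left and right
    if t.op == "or":
        return left or right
    raise ValueError(f"unknown operator {t.op!r}")


def var(k):
    return Node("var", var=k)


def neg(t):
    return Node("not", rchild=t)


# (x1 and not x2) or (not x1 and x3) or not x3
f = Node("or",
         Node("or",
              Node("and", var(1), neg(var(2))),
              Node("and", neg(var(1)), var(3))),
         neg(var(3)))
print(evaluate(f, {1: False, 2: True, 3: False}))   # True
```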
  • 143. Threaded Binary Tree (TBT) • The linked representation of a binary tree has more null links than actual pointers: n + 1 null links out of 2n total links. • The TBT, devised by A. J. Perlis and C. Thornton, makes clever use of these null links. • Their idea was to replace the null links by pointers, called threads, to other nodes. • Rules for a threaded binary tree: • The left pointer of the leftmost node and the right pointer of the rightmost node remain NULL. • All other null pointers are changed as follows: • a null left pointer points to the inorder predecessor • a null right pointer points to the inorder successor
  • 144. • Inorder traversal of the example tree: H, D, I, B, E, A, F, C, G. • The tree has 9 nodes and 10 NULL links (n + 1 = 10). • These NULL links are replaced by threads: a null left pointer by a thread to the inorder predecessor, and a null right pointer by a thread to the inorder successor.
  • 146. • In the memory representation, normal pointers must be distinguished from threads. • This is done by recording whether each pointer points to a child (1) or is a thread to an ancestor (0), using two extra one-bit fields, LBIT and RBIT. • Node structure of a threaded binary tree: | Left pointer | LBIT | Data | RBIT | Right pointer | • LBIT = 1 if the left pointer points to a child node, and 0 if it is a thread to an ancestor. • RBIT = 1 if the right pointer points to a child node, and 0 if it is a thread to an ancestor. • https://www.youtube.com/watch?v=ffgg_zmbaxw
  • 147. (Figure: memory representation of the threaded tree using the node structure | Left pointer | LBIT | Data | RBIT | Right pointer |, with two NULL pointers remaining.)
  • 148. • To keep the threaded tree consistent, a new dummy (head) node is introduced. • The NULL left pointer of node H (the leftmost node) is made to point to the dummy node, and the right pointer of the dummy node points to the dummy node itself.
  • 149. • Inorder traversal of a threaded tree with n nodes takes O(n) time. • The same idea can be applied to preorder and postorder traversal. • Insertion is possible in a threaded binary tree. • Procedure to grow a threaded tree by inserting a new node T as the right child of node S: if S has an empty right subtree, the insertion is simple; otherwise the right subtree of S is made the right subtree of T, and T becomes the right child of S.
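A Python sketch of threading and threaded inorder traversal for the example tree whose inorder is H, D, I, B, E, A, F, C, G. The tree shape A(B(D(H, I), E), C(F, G)) is an assumption consistent with that listing and with the 9-node count, and the helper names `thread`, `insuc` and `tinorder` are hypothetical. As on the slide, the leftmost node H gets a left thread to the dummy head node; this sketch additionally assumes the rightmost node G gets a right thread to the head node, which is how the traversal detects the end.

```python
class TNode:
    """Threaded node: LCHILD, LBIT, DATA, RBIT, RCHILD (1 = child, 0 = thread)."""

    def __init__(self, data, lchild=None, rchild=None):
        self.data = data
        self.lchild, self.lbit = lchild, int(lchild is not None)
        self.rchild, self.rbit = rchild, int(rchild is not None)


def thread(root):
    """Replace the NULL links by inorder threads; return the head node."""
    head = TNode(None)
    head.rchild, head.rbit = head, 1        # head's right pointer points to itself
    if root is not None:
        head.lchild, head.lbit = root, 1
    else:
        head.lchild, head.lbit = head, 0
    order = []

    def walk(t):                            # ordinary recursive inorder, done once
        if t is not None:
            walk(t.lchild)
            order.append(t)
            walk(t.rchild)

    walk(root)
    seq = [head] + order + [head]
    for i in range(1, len(seq) - 1):
        t = seq[i]
        if not t.lbit:
            t.lchild = seq[i - 1]           # thread to inorder predecessor
        if not t.rbit:
            t.rchild = seq[i + 1]           # thread to inorder successor
    return head


def insuc(x):
    """Inorder successor of x, found by following links and threads."""
    s = x.rchild
    if x.rbit:                              # real right child: go to its leftmost node
        while s.lbit:
            s = s.lchild
    return s


def tinorder(head):
    """Inorder traversal without a stack or recursion."""
    out, t = [], insuc(head)
    while t is not head:
        out.append(t.data)
        t = insuc(t)
    return out


N = TNode
root = N("A", N("B", N("D", N("H"), N("I")), N("E")), N("C", N("F"), N("G")))
head = thread(root)
print(tinorder(head))   # ['H', 'D', 'I', 'B', 'E', 'A', 'F', 'C', 'G']
```

Once the tree is threaded, each call to `insuc` needs no stack, and the whole traversal is O(n).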
  • 151. Binary Tree Representation of Trees • Every tree can be represented as a binary tree. • Array representation • Linked list representation • Relationship representation
  • 154. Relationship representation • The relationship between the nodes is characterized by two quantities: the leftmost child and the next right sibling (the leftmost-child-next-right-sibling relationship). • Every node has at most one leftmost child and at most one next right sibling. • For example, the leftmost child of B is E and the next right sibling of B is C.
  • 155. To convert a tree into a binary tree: • Connect together all the siblings of a node. • Delete all links from a node to its children except the link to its leftmost child.
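The two steps above amount to the following recursive conversion, sketched in Python. A general tree is assumed to be given as a `(data, [subtrees])` pair, and the example tree A(B(E, F), C(G), D(H, I, J)) is hypothetical, chosen so that, as on the slide, the leftmost child of B is E and the next right sibling of B is C. In the resulting binary tree the left link holds the leftmost child and the right link holds the next right sibling.

```python
class BNode:
    def __init__(self, data):
        self.data = data
        self.lchild = None   # leftmost child
        self.rchild = None   # next right sibling


def to_binary(tree):
    """Convert a general tree (data, [subtrees]) to its binary form."""
    data, children = tree
    node = BNode(data)
    prev = None
    for child in children:
        b = to_binary(child)
        if prev is None:
            node.lchild = b   # keep only the link to the leftmost child
        else:
            prev.rchild = b   # connect siblings together
        prev = b
    return node


def preorder(t, out):
    if t is not None:
        out.append(t.data)
        preorder(t.lchild, out)
        preorder(t.rchild, out)
    return out


tree = ("A", [("B", [("E", []), ("F", [])]),
              ("C", [("G", [])]),
              ("D", [("H", []), ("I", []), ("J", [])])])
b = to_binary(tree)
print(b.lchild.data, b.lchild.lchild.data, b.lchild.rchild.data)   # B E C
print("".join(preorder(b, [])))                                    # ABEFCGDHIJ
```

Preorder of the binary form visits the nodes in the same order as preorder of the original tree.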
  • 157. The tree can be represented formally as: (figure) • Preorder, inorder and postorder traversal of binary trees can also be applied here. • Preorder: (figure)
  • 158. • Inorder traversal of T: (figure) • Postorder traversal of T: (figure) • https://prod-edxapp.edx-cdn.org/assets/courseware/v1/0f0865e1fe974ec8b2244cdcd7f5d68a/c4x/PekingX/04830050x/asset/chapter6_001_en.pdf
  • 159. Counting Binary Trees • The problem is to determine the number of distinct binary trees with n nodes. • When n = 0 or n = 1, there is only one binary tree. • When n = 2, there are two distinct binary trees. • When n = 3, there are five distinct binary trees.
  • 162. • Preorder • Inorder
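These counts are the Catalan numbers. A binary tree with m ≥ 1 nodes consists of a root plus i nodes in its left subtree and m − 1 − i nodes in its right subtree, which gives the recurrence b(0) = 1, b(m) = Σ b(i)·b(m − 1 − i) over i = 0, …, m − 1, with closed form C(2n, n)/(n + 1). A short Python check (the function name is hypothetical):

```python
from math import comb


def count_binary_trees(n):
    """Number of distinct binary trees with n nodes, by the recurrence."""
    b = [1] + [0] * n                     # b[0] = 1: the empty tree
    for m in range(1, n + 1):
        # i nodes go to the left subtree, m - 1 - i to the right subtree
        b[m] = sum(b[i] * b[m - 1 - i] for i in range(m))
    return b[n]


print([count_binary_trees(n) for n in range(6)])   # [1, 1, 2, 5, 14, 42]
print([comb(2 * n, n) // (n + 1) for n in range(6)])   # closed form agrees
```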
  • 164. GRAPHS • A graph consists of two sets, V and E. • Vertices (V): the units of a graph. • Edges (E): the connections between units. • A graph is a nonlinear data structure consisting of vertices and edges. • V is a finite, nonempty set of vertices (units) of the graph. • E is a set of pairs of vertices, called edges. • V(G) and E(G) denote the sets of vertices and edges of graph G. • A graph is written as G = (V, E).
  • 165. • There are two types of graphs: directed and undirected. • In an undirected graph, the pair of vertices representing an edge is unordered, so (v1, v2) and (v2, v1) are the same edge. • In a directed graph (digraph), each edge is an ordered pair <v1, v2>.
  • 166. • Multigraph: a graph is a multigraph if it has no self-loops but contains parallel edges. If there is more than one edge between two vertices, that pair of vertices is said to have parallel edges.
  • 167. • Complete graph: a graph is complete if there is an edge between every pair of its vertices.
  • 168. • Adjacent: two vertices are adjacent if they are connected by an edge. In the example graph, the vertices adjacent to vertex 2 are 4, 5 and 1.
  • 169. • Subgraph: a graph G′ is a subgraph of a graph G if it is a part of G, that is, V(G′) ⊆ V(G) and E(G′) ⊆ E(G).
  • 171. • Length: the length of a path is the number of edges on it. • Simple path: a path that does not repeat vertices. • Cycle: a simple path in which the first and last vertices are the same.
  • 172. • In-degree: the in-degree of a vertex is the number of edges coming into the vertex. • Out-degree: the out-degree of a vertex is the number of edges going out of the vertex.
  • 173. Graph Representation • Three representations of graphs are covered: adjacency matrix, adjacency lists and adjacency multilists. Adjacency Matrix • Let G = (V, E) be a graph with n ≥ 1 vertices. • The adjacency matrix of G is a two-dimensional n × n array A with A(i, j) = 1 if the edge (vi, vj) is in E(G), and A(i, j) = 0 if there is no such edge in G.
  • 174. • The adjacency matrices for graphs G1, G3 and G4 are shown in the figure.
  • 175. With an adjacency matrix, answering a question such as "how many edges does G have?" requires at least O(n²) time, because all n² − n entries off the diagonal (the diagonal entries are zero) must be examined.
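A Python sketch of the adjacency matrix, with vertices numbered from 0 for convenience; the edge list is a made-up example and the function names are hypothetical. `count_edges` illustrates the point above: it inspects every off-diagonal entry, so it takes O(n²) time however few edges the graph has.

```python
def adjacency_matrix(n, edges, directed=False):
    """n x n 0/1 matrix with A[i][j] = 1 when (i, j) is an edge."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        if not directed:
            a[j][i] = 1          # undirected graph: the matrix is symmetric
    return a


def count_edges(a, directed=False):
    """Examines all n*n - n off-diagonal entries."""
    n = len(a)
    total = sum(a[i][j] for i in range(n) for j in range(n) if i != j)
    return total if directed else total // 2   # each undirected edge counted twice


a = adjacency_matrix(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
print(count_edges(a), sum(a[2]))   # 4 edges; degree of vertex 2 is 3
```

The degree of a vertex in an undirected graph is simply its row sum.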
  • 176. Adjacency Lists • The n rows of the adjacency matrix are represented as n linked lists. • There is one list for each vertex in G. • Each node has at least two fields: VERTEX and LINK. The VERTEX fields of the nodes in list i contain the indices of the vertices adjacent to vertex i. • Each list has a head node. • The head nodes are stored sequentially, providing easy random access to the adjacency list of any vertex.
  • 179. • For an undirected graph with n vertices and e edges, the adjacency lists require n head nodes and 2e list nodes. • In terms of the number of bits of storage needed, this count should be multiplied by log n for the head nodes and by log n + log e for the list nodes, since it takes O(log m) bits to represent a number of value m. • The sparse matrix representation of a graph has 4 fields.
  • 181. Adjacency Multilists • Adjacency multilists are an edge-based, rather than vertex-based, graph representation. • The multilist representation of a graph consists of two parts: a directory of node (vertex) information and a set of linked lists of edge information. • For each edge there is exactly one node, but this node is in two lists (the lists of both end vertices). • Fields of an edge node: • M: a one-bit mark field indicating whether or not the edge has been examined • V1: the first vertex of the edge (V1, V2) • V2: the second vertex of the edge (V1, V2) • List1: pointer to the next edge node on the list of V1 • List2: pointer to the next edge node on the list of V2
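The adjacency lists can be sketched in Python with explicit VERTEX/LINK nodes and a sequential array of head pointers standing in for the head nodes. The example edge list and the function names are made up. Each new node is inserted at the front of its list, so each list comes out in reverse order of insertion, and an undirected graph with e edges yields 2e list nodes.

```python
class ListNode:
    def __init__(self, vertex, link=None):
        self.vertex = vertex    # index of an adjacent vertex
        self.link = link        # next node on the same list


def adjacency_lists(n, edges):
    """Sequential heads give direct access to the list of any vertex."""
    head = [None] * n
    for i, j in edges:          # undirected: each edge creates two list nodes
        head[i] = ListNode(j, head[i])
        head[j] = ListNode(i, head[j])
    return head


def neighbours(head, v):
    out, p = [], head[v]
    while p is not None:
        out.append(p.vertex)
        p = p.link
    return out


head = adjacency_lists(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
print([neighbours(head, v) for v in range(4)])   # [[2, 1], [2, 0], [3, 1, 0], [2]]
```

Counting the edges now takes O(n + e) time instead of O(n²), which matters for sparse graphs.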
  • 183. Traversals, Connected Components and Spanning Trees • Given an undirected graph G = (V, E) and a vertex v in V(G), we wish to visit all the vertices in G that are reachable from v. • There are two ways to do this: • Depth First Search (DFS) • Breadth First Search (BFS)
  • 184. Depth First Search (DFS) • The start vertex v is visited. • Next, an unvisited vertex w adjacent to v is selected, and a depth first search from w is initiated. • When a vertex u is reached such that all its adjacent vertices have been visited, we back up to the last vertex visited that has an unvisited adjacent vertex w, and initiate a depth first search from w. • The search terminates when no unvisited vertex can be reached from any of the visited vertices. • DFS is a recursive algorithm based on backtracking. • https://www.youtube.com/watch?v=iaBEKo5sM7w
  • 185. • The recursive nature of DFS can be implemented using a stack. • The basic idea is as follows: • Pick a starting node and push all its adjacent nodes onto a stack. • Pop a node from the stack to select the next node to visit, and push all its adjacent nodes onto the stack. • Repeat this process until the stack is empty. • Make sure that visited nodes are marked. This prevents visiting the same node more than once; without marking, the search may end up in an infinite loop.
  • 187. DFS visiting order for the example graph: V1, V2, V4, V8, V5, V6, V3, V7
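The stack-based description above can be sketched in Python. The adjacency lists are an assumption about the example graph (its figure is not part of this text), chosen so that the sketch reproduces the DFS visiting order listed above. A vertex is marked when it is popped, and neighbours are pushed in reverse so that the lowest-numbered one is popped first.

```python
def dfs(adj, v):
    """Iterative DFS from v; returns the vertices in visiting order."""
    visited, order, stack = set(), [], [v]
    while stack:
        u = stack.pop()
        if u in visited:
            continue                     # already visited via another path
        visited.add(u)                   # mark the node
        order.append(u)
        for w in reversed(adj[u]):       # lowest-numbered neighbour ends on top
            if w not in visited:
                stack.append(w)
    return order


# assumed example graph with vertices 1..8
adj = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2, 8],
       5: [2, 8], 6: [3, 8], 7: [3, 8], 8: [4, 5, 6, 7]}
print(dfs(adj, 1))   # [1, 2, 4, 8, 5, 6, 3, 7]
```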
  • 193. Breadth First Search (BFS) • Start at vertex v (the root node) and mark it as visited. • Traverse the graph layer by layer: first visit the neighbour nodes (those directly connected to the root node). • Then move on to the next-level neighbour nodes, breadthwise. • In BFS, all nodes in layer 1 are visited before moving to layer 2. https://www.youtube.com/watch?v=QRq6p9s8NVg
  • 196. BFS visiting order for the example graph: V1, V2, V3, V4, V5, V6, V7, V8
  • 199. Connected Components • An undirected graph is connected if every vertex can reach every other vertex via some path. • If the graph is not connected, it can be broken down into connected components. • Strong connectivity applies only to directed graphs: a directed graph is strongly connected if there is a directed path from every vertex to every other vertex. This is the same idea as connectivity in an undirected graph, except that directed paths are required instead of just paths. Similarly, a directed graph can be broken down into strongly connected components. • All the connected components of an undirected graph can be obtained by repeatedly calling DFS(v) or BFS(v) on a vertex v that has not yet been visited.
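A queue-based Python sketch of BFS. The adjacency lists below are an assumption about the example graph (its figure is not part of this text), chosen to be consistent with the BFS visiting order above. A vertex is marked when it is enqueued, so it enters the queue only once.

```python
from collections import deque


def bfs(adj, v):
    """Breadth first search from v; returns the vertices in visiting order."""
    visited, order, queue = {v}, [], deque([v])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in visited:
                visited.add(w)           # mark on enqueue
                queue.append(w)
    return order


# assumed example graph with vertices 1..8
adj = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2, 8],
       5: [2, 8], 6: [3, 8], 7: [3, 8], 8: [4, 5, 6, 7]}
print(bfs(adj, 1))   # [1, 2, 3, 4, 5, 6, 7, 8]
```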
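Finding all connected components by repeated searches can be sketched in Python. This version runs a stack-based search from each vertex that has not yet been seen; the vertex numbering 1..n and the example graph are assumptions.

```python
def connected_components(n, adj):
    """adj maps each vertex 1..n to its neighbours (undirected graph)."""
    seen, components = set(), []
    for v in range(1, n + 1):
        if v in seen:
            continue                    # already in an earlier component
        comp, stack = [], [v]
        seen.add(v)
        while stack:                    # one search finds one whole component
            u = stack.pop()
            comp.append(u)
            for w in adj.get(u, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(comp))
    return components


adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
print(connected_components(6, adj))   # [[1, 2, 3], [4, 5], [6]]
```

Each vertex and edge is handled a constant number of times, so with adjacency lists the whole computation takes O(n + e) time.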
  • 203. Spanning Trees and Minimum Cost Spanning Trees • A spanning tree of a connected graph G is a tree that contains all the vertices of G and a minimum number of edges of G. • If any vertex is missing, it is not a spanning tree. • A spanning tree contains n − 1 edges, where n is the number of vertices. • The edges may or may not have weights assigned to them. • All the possible spanning trees of G have the same n vertices and exactly n − 1 edges. • For example, with n = 4 vertices, a spanning tree has e = n − 1 = 3 edges.
  • 204. • A spanning tree must not contain a cycle. • When BFS is used, the resulting tree is called a BFS spanning tree, and when DFS is used, it is called a DFS spanning tree.
  • 205. Applications of Spanning Trees • A spanning tree is mainly used to find a minimal set of edges that connects all nodes in a graph. Common applications of spanning trees are: • Civil network planning • Computer network routing protocols • Cluster analysis
  • 206. Minimum Cost Spanning Tree • The cost of a spanning tree is the sum of the costs of the edges in that tree. • One approach to finding a minimum cost spanning tree is due to Kruskal. • In this approach, the minimum cost spanning tree T is built edge by edge. • Edges are considered for inclusion in T in nondecreasing order of their costs. • Self-loops are discarded, and of any set of parallel edges only the cheapest can be used. • An edge is included in T if it does not form a cycle with the edges already in T. • Since G is connected and has n > 0 vertices, exactly n − 1 edges will be selected for inclusion in T. • The time complexity of Kruskal's method is O(e log e), where e is the number of edges in E.
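Kruskal's method can be sketched in Python using a union-find (disjoint-set) structure to test whether an edge would form a cycle. The union-find detail, the function name and the example graph are assumptions not given on the slide. Sorting the edges dominates the running time, giving the O(e log e) bound. Self-loops are skipped explicitly; of a pair of parallel edges the cheaper one is considered first, and the other is then rejected because it would form a cycle.

```python
def kruskal(n, edges):
    """n vertices 0..n-1; edges are (cost, u, v). Returns (cost, tree edges)."""
    parent = list(range(n))              # union-find forest, one set per vertex

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree, total = [], 0
    for cost, u, v in sorted(edges):     # nondecreasing order of cost
        if u == v:
            continue                     # self-loop
        ru, rv = find(u), find(v)
        if ru != rv:                     # u and v in different trees: no cycle
            parent[ru] = rv
            tree.append((u, v, cost))
            total += cost
            if len(tree) == n - 1:
                break                    # spanning tree complete
    return total, tree


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3), (7, 1, 1)]
print(kruskal(4, edges))   # (6, [(0, 1, 1), (2, 3, 2), (1, 2, 3)])
```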
  • 213. Shortest Path • The length of a path is defined to be the sum of the weights of the edges on that path, rather than the number of edges. • The starting vertex of the path is referred to as the source, and the last vertex is called the destination. • The graphs considered are digraphs, and the weights assigned to the edges are positive. Single Source, All Destinations • Given a directed graph G = (V, E), a weighting function w(e) for the edges of G, and a source vertex v0, • find the shortest paths from v0 to all the remaining vertices of G.
  • 215. • The algorithm to determine the shortest paths from v0 to all other vertices in G was given by Dijkstra. • The vertices are numbered 1 through n. • The set S is maintained as a bit array, with S(i) = 0 if vertex i is not in S and S(i) = 1 if it is. • The graph is represented by its cost adjacency matrix, with COST(i, j) being the weight of the edge <i, j> (+∞ if there is no such edge). • DIST(i) is the length of the shortest path from v0 to vertex i that passes only through vertices in S.
  • 216. Basics of Dijkstra's Algorithm • Dijkstra's algorithm starts at the node you choose (the source node) and analyzes the graph to find the shortest path between that node and all the other nodes in the graph. • The algorithm keeps track of the currently known shortest distance from each node to the source node, and updates these values whenever it finds a shorter path. • Once the algorithm has found the shortest path between the source node and another node, that node is marked as "visited" and added to the set of finished nodes. • The process continues until all the nodes in the graph have been visited. The result connects the source node to every other reachable node by the shortest possible path.
  • 219. Transitive Closure • The problem is to determine whether a path exists between every pair of vertices. • Given a directed graph, find out whether vertex j is reachable from vertex i for all vertex pairs (i, j) in the graph. • Reachable means that there is a path from vertex i to vertex j. The reachability matrix is called the transitive closure of the graph.
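An O(n²) Python sketch following the slide's setup: the graph is a cost adjacency matrix COST (INF where there is no edge), S is a boolean array, and DIST[i] holds the current shortest distance from v0 to i. Vertices are numbered from 0 here rather than from 1, and the 4-vertex example is made up.

```python
INF = float("inf")


def dijkstra(cost, v0):
    """Shortest distances from v0; cost[i][j] is INF when <i, j> is absent."""
    n = len(cost)
    s = [False] * n                       # S(i): shortest path to i is final
    dist = [cost[v0][i] for i in range(n)]
    s[v0], dist[v0] = True, 0
    for _ in range(n - 1):
        # choose u not in S with the minimum DIST(u)
        u = min((i for i in range(n) if not s[i]), key=lambda i: dist[i])
        if dist[u] == INF:
            break                         # remaining vertices are unreachable
        s[u] = True
        for w in range(n):                # a path through u may now be shorter
            if not s[w] and dist[u] + cost[u][w] < dist[w]:
                dist[w] = dist[u] + cost[u][w]
    return dist


cost = [[0, 4, 1, INF],
        [INF, 0, INF, 1],
        [INF, 2, 0, 5],
        [INF, INF, INF, 0]]
print(dijkstra(cost, 0))   # [0, 3, 1, 4]
```

Each of the n − 1 rounds scans all vertices twice, which gives the O(n²) running time of this matrix-based version.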
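The slide does not name an algorithm for computing the closure; one standard choice is Warshall's algorithm, sketched in Python below as a swapped-in technique. `reach[i][j]` becomes True when there is a path of length at least 1 from i to j; the example digraph is made up.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a 0/1 adjacency matrix; O(n^3) time."""
    n = len(adj)
    reach = [[bool(adj[i][j]) for j in range(n)] for i in range(n)]
    for k in range(n):                   # allow k as an intermediate vertex
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


adj = [[0, 1, 0],     # 0 -> 1
       [0, 0, 1],     # 1 -> 2
       [0, 0, 0]]
r = transitive_closure(adj)
print(r[0][2], r[2][0], r[0][0])   # True False False
```

Setting every diagonal entry to True afterwards gives the reflexive version, in which each vertex also counts as reachable from itself.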
  • 221. Unit 4: External Sorting • Storage Devices • Sorting with Disks • Sorting with Tapes • Symbol Tables • Static Tree Tables • Dynamic Tree Tables • Hash Tables
  • 222. • External sorting covers techniques for sorting large files. • The files are too large to fit in the internal memory of the computer. • The methods depend on the characteristics of external storage devices. • External storage devices are broadly categorized as: • sequential access (tapes) • direct access (drums and disks)
  • 224. Storage Devices: Magnetic Tapes • Used for computer input/output. • Data is recorded on magnetic tape approximately ½ inch wide. • The tape is wound around a spool. • A new reel of tape is normally 2400 ft long. • Tracks run along the length of the tape, with a tape typically having 7 to 9 tracks across its width. • Depending on the direction of magnetization, a spot on a track represents either a 0 or a 1. • A combination of bits across the tracks represents a character (A-Z, 0-9, etc.).
  • 225. • The number of bits written per inch of track is referred to as the tape density. • Reading from or writing onto a magnetic tape is done by a tape drive. • A tape drive consists of two spindles. • One spindle holds the source reel and the other holds the take-up reel. • During forward reading or writing, the tape is pulled from the source reel across the read/write heads and onto the take-up reel. • Some tape drives also permit backward reading and writing of tapes.
  • 226. • If characters are packed onto a tape at a density of 800 bpi, a 2400 ft tape would hold a little over 23 × 10^6 characters. • Information on a tape is grouped into several blocks. • These blocks may be of variable size or fixed size. • Between blocks of data is an interblock gap, normally about ¾ inch long. • The interblock gap is long enough to permit the tape to accelerate from rest to the correct read/write speed before the beginning of the next block reaches the read/write heads. • To read a block from a tape, one specifies the length of the block and the address A in memory where it is to be placed.
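The capacity figure follows if 800 bpi is taken as 800 characters per inch of tape (one character per frame across the tracks) and the interblock gaps are ignored:

```latex
2400\ \text{ft} \times 12\ \frac{\text{in}}{\text{ft}} \times 800\ \frac{\text{characters}}{\text{in}}
  = 23{,}040{,}000\ \text{characters} \approx 23 \times 10^{6}\ \text{characters}
```

In practice each block also costs about ¾ inch of gap, so the usable capacity is lower when blocks are short.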
  • 227. • To write a block of data onto a tape, one specifies the starting address in memory and the number of consecutive words to be written. • The block size usually corresponds to the size of the input/output buffers set up in memory. • A computer tape is an example of a sequential access device. • If the read head is positioned at the front of the tape and one wishes to read the information in a block 2000 ft down the tape, it is necessary to forward space the tape the correct number of blocks. • If one then wishes to read the first block, the tape has to be rewound 2000 ft to the front before the first block can be read. • A typical rewind over 2400 ft of tape takes around one minute.
  • 228. • Some assumptions about the tape drive: • Tapes can be written and read in the forward direction only. • The I/O channel of the computer permits three tasks to be carried out in parallel: writing onto one tape, reading from another tape, and CPU operation.
  • 229. Disk Storage • A disk is a direct access storage device. • A disk system has two distinct components: • the disk module (simply the disk on which information is stored) • the disk drive (the counterpart of the tape drive, which reads information from and writes information onto the disks) • Disks can be removed from or mounted onto a disk drive. • A disk pack consists of several platters that are similar to phonograph records. The number of platters per pack varies and is typically about 6. • Each platter has two surfaces on which information can be recorded.
  • 230. • The outer surfaces of the top and bottom platters are not used. • This leaves a total of 10 surfaces on which information may be recorded. • The disk drive consists of a spindle on which the disk may be mounted and a set of read/write heads. • There is one read/write head for each surface. • During a read/write, the heads are held stationary over the position of the platter where the read/write is to be performed, • while the disk itself rotates at high speed (2000 to 3000 rpm).
  • 231. • Information is read and written in concentric circles on each surface. • The area that can be read from or written onto by a single stationary head is referred to as a track. • Tracks are thus concentric circles, and each time the disk completes a revolution, an entire track passes a read/write head. • There may be 100 to 1000 tracks on each surface of a platter. • The collection of tracks simultaneously under the read/write heads on the surfaces of all the platters is called a cylinder.
  • 232. • Tracks are divided into sectors. • A sector is the smallest addressable segment of a track. • Information is stored along the tracks of a surface in blocks. • To use a disk, the sector number has to be specified. • The read/write head assembly is first positioned over the right cylinder. • Before reading or writing can start, the drive has to wait for the right sector to come beneath the read/write head. • Then transmission can take place. • Three factors contribute to the I/O time for disks: • Seek time: the time taken to position the read/write heads over the correct cylinder; it depends on the number of cylinders across which the heads have to move. • Latency time: the time until the right sector of the track is under the read/write head. • Transmission time: the time taken to transmit the block of data to or from the disk.
  • 233. Sorting with disks- • The most popular method of sorting in external device is merge sort • This method have two distinct phases 1. First, divide the file into runs such that the size of a run is small enough to fit into the main memory. Next, sort each run in main memory using the standard merge sort sorting algorithm. 2. Finally, merge the resulting runs into successively bigger runs until the file is sorted. • Calculate the overall computing time • For eg 19-06-2023 Data Structures
  • 235. 1. Internally sort three blocks at a time(ie 750 records) to obtain six runs R- R6.A method such as heap sort or quick sort could be used .these six runs are written out on to the disk. 2. Set aside 3 blocks of internal memory each capable of holding 250 records. Two of these blocks will be used as input buffers and one as the output buffer. Merge R1 and R2.this is carried out by first reading one block of each of these runs into input buffers. 3. Blocks of runs are merged from the input buffers in to the output buffer 4. When the output buffer gets full it is written on to the disk. 5. If an input buffer gets empty it is refilled with another block from the same run 6. Then R3,R4 and finally R5 ,R6 are merged 19-06-2023 Data Structures
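The run-and-merge structure of the example can be simulated in Python with the records held in an ordinary list. This is only a sketch under that assumption: it ignores blocks, buffers and disk I/O, and merges runs pairwise (R1 with R2, R3 with R4, R5 with R6, and so on) pass after pass; the function names are hypothetical. For 4500 records and room for 750 in memory, it forms six runs and needs three merge passes.

```python
def make_runs(records, memory_size):
    """Phase 1: sort memory_size records at a time in memory to form runs."""
    return [sorted(records[i:i + memory_size])
            for i in range(0, len(records), memory_size)]


def merge2(a, b):
    """Merge two sorted runs (the job of the two input buffers)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def external_sort(records, memory_size):
    """Phase 2: merge runs pairwise until one run remains."""
    runs = make_runs(records, memory_size)
    passes = 0
    while len(runs) > 1:
        runs = [merge2(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]   # an odd run is carried over
        passes += 1
    return (runs[0] if runs else []), passes


data = list(range(4500, 0, -1))
result, passes = external_sort(data, 750)
print(len(make_runs(data, 750)), passes, result[:5])   # 6 3 [1, 2, 3, 4, 5]
```

Since every pass reads and writes the whole file, reducing the number of passes (for example with a k-way merge) is the main way to cut the I/O time.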
  • 237. • Analysing the time required to sort these 4500 records .the analysis will have the following notation • Seek time can be reduced by writing the blocks in the same cylinder or adjacent cylinders • Should have a close look of the computing time indicates on the number of passes made over the data. 19-06-2023 Data Structures
  • 238. • This method does not make efficient use of the computer's ability to carry out I/O and CPU operations in parallel and so overlap some of the time. • Parallelism is an important consideration when sorting is done in a non-multiprogramming environment: if I/O and CPU processing cannot go on in parallel, the CPU is idle during I/O. • Sometimes parallelism cannot be achieved because of the structure of the OS.
  • 239. • K-way merging: to merge k sorted arrays into a single sorted array, a min-heap is used. The k-way merge pattern works like this: • Push the smallest (first) element of each sorted array into a min-heap to get the overall minimum. • Take out the smallest (top) element from the heap and add it to the merged list. • After removing the smallest element from the heap, insert the next element of the same array into the heap. • Repeat steps 2 and 3 to populate the merged list in sorted order. • Time complexity = O(N log k), where N is the total number of elements in all k input arrays. • Space complexity = O(k) for the heap.
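The pattern above maps directly onto Python's `heapq` module. A minimal sketch, assuming the inputs are in-memory sorted lists; each heap entry stores `(value, array index, position)` so the next element of the same array can be pushed after a pop.

```python
import heapq
from typing import List


def k_way_merge(arrays: List[List[int]]) -> List[int]:
    # step 1: push the first element of every non-empty array
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, i, j = heapq.heappop(heap)     # step 2: overall minimum
        merged.append(value)
        if j + 1 < len(arrays[i]):            # step 3: next element of the same array
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    return merged
```

The heap never holds more than k entries, which gives the O(k) space and O(log k) cost per element stated above.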
  • 243. • The number of comparisons needed to find the next smallest key can be reduced significantly by using a selection tree. • A selection tree is a binary tree in which each node represents the smaller of its two children. • Thus the root node represents the smallest key in the tree.
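A selection (winner) tree can be stored in an array. The sketch below assumes a common layout that is not taken from the slides: the root is `tree[1]`, the leaves occupy `tree[k .. 2k-1]` with k rounded up to a power of two, missing leaves are padded with +infinity, and each node stores `(key, run index)` so the winning run is known.

```python
import math
from typing import List, Tuple

Entry = Tuple[float, int]   # (key, index of the run the key came from)


class WinnerTree:
    """Selection tree in an array: every internal node holds the smaller of its children."""

    def __init__(self, keys: List[float]):
        self.k = 1
        while self.k < len(keys):
            self.k *= 2
        self.tree: List[Entry] = [(math.inf, -1)] * (2 * self.k)
        for i in range(self.k):
            key = keys[i] if i < len(keys) else math.inf   # pad missing leaves with +inf
            self.tree[self.k + i] = (key, i)
        for node in range(self.k - 1, 0, -1):
            self.tree[node] = min(self.tree[2 * node], self.tree[2 * node + 1])

    def root(self) -> Entry:
        return self.tree[1]

    def replace(self, leaf: int, key: float) -> None:
        """Put a new key in one leaf and replay the matches on its path to the root."""
        node = self.k + leaf
        self.tree[node] = (key, leaf)
        node //= 2
        while node >= 1:
            self.tree[node] = min(self.tree[2 * node], self.tree[2 * node + 1])
            node //= 2


def merge_with_tree(runs: List[List[int]]) -> List[int]:
    """Merge sorted runs, taking the next smallest key from the root each time."""
    pos = [0] * len(runs)
    tree = WinnerTree([run[0] if run else math.inf for run in runs])
    out: List[int] = []
    while tree.root()[0] != math.inf:
        key, leaf = tree.root()
        out.append(key)
        pos[leaf] += 1
        nxt = runs[leaf][pos[leaf]] if pos[leaf] < len(runs[leaf]) else math.inf
        tree.replace(leaf, nxt)   # only about log2(k) comparisons per output key
    return out
```

After the root is output, only the matches on one leaf-to-root path are replayed, instead of comparing against all k runs again.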
  • 246. Sorting with tapes • Sorting on tapes is carried out using the same steps as sorting on disks. • The difference between sorting on tapes and on disks lies in the manner in which runs are maintained on the external storage media. • Tapes are sequential-access devices. • Seek time and latency time differ between tapes and disks; both are high on tapes. • The blocks on a tape are therefore read sequentially during a k-way merge of runs.
  • 252. • The computing time analysis assumes that no operations are carried out in parallel.
  • 253. Symbol tables • A symbol table is a set of name-value pairs. • Associated with each name in the table is an attribute, a collection of attributes, or some directions about further processing. • Symbol tables have a fixed number of entries. • Operations performed on a symbol table are: ask whether a particular name is already present; retrieve the attributes of that name; insert a new name and its value; delete a name and its value.
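The four operations can be expressed as a tiny interface. This sketch uses a Python `dict` (itself a hash table) as the backing store purely for illustration; the class and method names are hypothetical, and the tree-based implementations discussed next are alternatives.

```python
from typing import Any, Dict, Optional


class SymbolTable:
    """Name -> attributes mapping supporting the four symbol-table operations."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    def contains(self, name: str) -> bool:          # is the name already present?
        return name in self._entries

    def retrieve(self, name: str) -> Optional[Any]:  # attributes of the name, or None
        return self._entries.get(name)

    def insert(self, name: str, attributes: Any) -> None:
        self._entries[name] = attributes

    def delete(self, name: str) -> None:
        self._entries.pop(name, None)               # deleting a missing name is a no-op
```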
  • 255. • Symbol tables can be implemented as: • Static tree tables • Dynamic tree tables. Static tree table: • The identifiers are known in advance. • No insertions or deletions are allowed. • A symbol table with this property is called static. • The names are sorted and stored sequentially, and are then searched using either binary search or the Fibonacci search method. • Any name can be found in O(log2 n) operations.
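For a static table, sorting once and then binary searching is enough. A minimal sketch using the standard `bisect` module; the function names and the sample identifiers are hypothetical.

```python
import bisect
from typing import List, Optional


def build_static_table(names: List[str]) -> List[str]:
    """Names are known in advance, so sort them once and never modify the table."""
    return sorted(names)


def lookup(table: List[str], name: str) -> Optional[int]:
    """Binary search: O(log2 n) comparisons; returns the position or None."""
    i = bisect.bisect_left(table, name)
    if i < len(table) and table[i] == name:
        return i
    return None
```

For example, with the identifiers `do`, `for`, `if`, `while`, looking up `if` returns position 2 and looking up `else` returns `None`.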
  • 257. • When analysing a BST, add a special "square" node at every place where there is a null link.
  • 258. • After these nodes are added, every binary tree with null links has two kinds of nodes: • External nodes (or failure nodes): the added square nodes, which are not part of the original tree. • Internal nodes: the remaining nodes, i.e. the nodes of the original tree. • A binary search tree with the external nodes added is called an extended binary tree. • Each time the binary search tree is searched for an identifier that is not in the tree, the search terminates at an external node; this is an unsuccessful search.
  • 260. • Next we find the external path length and the internal path length of a binary tree. • The external path length E of a binary tree is the sum, over all external nodes, of the lengths of the paths from the root to those nodes. • The internal path length I is defined in the same way, as the sum over all internal nodes.
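Both path lengths can be computed in one traversal by treating every `None` link as an external (square) node. A minimal sketch with a hypothetical `Node` class; as a sanity check, any binary tree with n internal nodes satisfies E = I + 2n.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def path_lengths(node: Optional[Node], depth: int = 0) -> Tuple[int, int]:
    """Return (I, E): internal and external path lengths of the subtree."""
    if node is None:
        return 0, depth            # a null link is an external node at this depth
    li, le = path_lengths(node.left, depth + 1)
    ri, re_ = path_lengths(node.right, depth + 1)
    return depth + li + ri, le + re_
```

For the three-node tree with root 2 and children 1 and 3, I = 0 + 1 + 1 = 2 and the four external nodes all lie at depth 2, so E = 8 = 2 + 2 * 3.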
  • 262. The weighted external path length of such a binary tree is calculated as $\sum_{i=1}^{n+1} q_i k_i$, where $k_i$ is the distance from the root node to the external node with weight $q_i$. Suppose n = 3, q1 = 15, q2 = 2, q3 = 4 and q4 = 5.
  • 264. • Over all binary trees with n internal nodes, we want the minimum and maximum possible values of I. • To obtain trees with minimal I, as many internal nodes as possible should be as close to the root node as possible. • One tree with minimal internal path length is the complete binary tree. • Binary trees with minimal weighted external path length are used in many applications, such as finding an optimal set of codes for messages M1, …, Mn+1. • Each code is a binary string used to transmit the corresponding message. • At the receiving end, the code is decoded using a decode tree. • A decode tree is a binary tree in which the external nodes represent messages. • The bits of the code word for a message determine the branch taken at each level of the decode tree to reach the correct external node.
  • 265. Huffman codes: M1 = 000, M2 = 001, M3 = 01, M4 = 1 • The cost of decoding a code word is proportional to the number of bits in the code. • This number equals the distance of the corresponding external node from the root node. • The expected decode time is minimized by choosing code words that result in a decode tree with minimal weighted external path length.
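Decoding with a decode tree means following one branch per bit until an external node is reached. The sketch below represents internal nodes as nested dicts keyed by `'0'`/`'1'` and external nodes as message names; it assumes the code set is prefix-free, as the codes M1 = 000, M2 = 001, M3 = 01, M4 = 1 are.

```python
from typing import Dict, List


def build_decode_tree(codes: Dict[str, str]) -> dict:
    """Internal nodes are dicts keyed by '0'/'1'; external nodes are message names."""
    root: dict = {}
    for message, word in codes.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})   # branch for this bit (assumes prefix-free codes)
        node[word[-1]] = message              # last bit leads to the external node
    return root


def decode(bits: str, tree: dict) -> List[str]:
    out, node = [], tree
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):             # reached an external node
            out.append(node)
            node = tree                       # restart at the root for the next code word
    return out
```

The bit string `000011001` splits as `000 | 01 | 1 | 001` and decodes to M1, M3, M4, M2; each message costs as many branch steps as its code has bits.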
  • 266. Huffman algorithm • Huffman coding is a technique for compressing data to reduce its size without losing any of the details. It was first developed by David Huffman in 1951. • It follows a greedy approach: at each step it combines the two least frequent nodes, which yields minimum-length prefix-free binary codes. • Huffman coding is generally useful for compressing data in which some characters occur frequently. • In the example string, each character occupies 8 bits. There are a total of 15 characters, so a total of 8 * 15 = 120 bits are required to send the string. • Using the Huffman coding technique, we can compress the string to a smaller size. • Huffman coding first creates a tree using the frequencies of the characters and then generates a code for each character.
  • 267. Steps of the Huffman encoding algorithm 1. Calculate the frequency of each character in the string. 2. Sort the characters in increasing order of frequency. These are stored in a priority queue Q.
  • 268. 3. Make each unique character a leaf node. 4. Create a new node; assign the node with the minimum frequency as its left child and the node with the second-minimum frequency as its right child. Set the new node's value to the sum of these two minimum frequencies, remove the two children from Q and insert the new node into Q. 5. Repeat step 4 until Q holds only one node, the root of the tree.
  • 269. 6. For each non-leaf node, assign 0 to the left edge and 1 to the right edge. The code of a character is the sequence of edge labels on the path from the root to its leaf.
  • 270. • Without encoding, the total size of the string was 120 bits. After encoding, the size is reduced to 32 + 15 + 28 = 75 bits. Decoding: • To decode, take the code and traverse the tree from the root to find the character. • For example, to decode 101, traverse from the root as shown in the figure.
  • 272. • create a priority queue Q consisting of each unique character • sort them in ascending order of their frequencies • while Q contains more than one node: • create a newNode • extract the minimum value from Q and assign it to the leftChild of newNode • extract the next minimum value from Q and assign it to the rightChild of newNode • calculate the sum of these two minimum values and assign it to the value of newNode • insert newNode into Q • return the remaining node of Q as rootNode
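The pseudocode translates almost line for line into Python with `heapq` as the priority queue Q. This is a sketch, not the slides' own implementation: a running counter breaks ties so that `heapq` never compares two subtrees, internal nodes are `(left, right)` tuples, and the helper names are hypothetical.

```python
import heapq
from itertools import count
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def huffman_codes(freq: Dict[str, int]) -> Dict[str, str]:
    """Build a Huffman tree from character frequencies and return each character's code."""
    if not freq:
        return {}
    tie = count()   # tie-breaker so heapq never has to compare two subtrees
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:          # a single symbol still needs a 1-bit code
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # minimum frequency -> left child
        f2, _, right = heapq.heappop(heap)   # second minimum -> right child
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes: Dict[str, str] = {}

    def walk(node: Tree, prefix: str) -> None:
        if isinstance(node, tuple):          # internal node: 0 on the left edge, 1 on the right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encoded_bits(freq: Dict[str, int], codes: Dict[str, str]) -> int:
    """Size of the encoded data: sum of frequency * code length."""
    return sum(freq[ch] * len(codes[ch]) for ch in freq)
```

For hypothetical frequencies A: 5, B: 1, C: 6, D: 3 this produces C = 0, B = 100, D = 101, A = 11, so the encoded data takes 28 bits. Applied to the weights q1 = 15, q2 = 2, q3 = 4, q4 = 5, the same procedure gives a weighted external path length of 43.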
  • 273. Time complexity • The time complexity of encoding each unique character based on its frequency is O(n log n), where n is the number of unique characters. • Extracting the minimum frequency from the priority queue takes place 2*(n-1) times, and each extraction costs O(log n). Thus the overall complexity is O(n log n). Advantages of Huffman encoding • This encoding scheme saves a lot of storage space, since the binary codes generated are variable in length. • It generates shorter binary codes for symbols/characters that appear more frequently in the input string. • The binary codes generated are prefix-free.
  • 274. Disadvantages of Huffman encoding • Lossless data encoding schemes, like Huffman encoding, achieve a lower compression ratio than lossy encoding techniques. Thus, used on its own, a lossless technique like Huffman encoding is better suited to encoding text and program files than to encoding digital images. • Huffman encoding is a relatively slow process, since it uses two passes: one for building the statistical model and another for encoding. Thus, lossless techniques that use Huffman encoding can be considerably slower than others. • Since the binary codes have different lengths, it is difficult for the decoding software to detect whether the encoded data is corrupt. This can result in incorrect decoding and, subsequently, a wrong output.
  • 275. Real-life applications of Huffman encoding • Huffman encoding is widely used in compression formats like GZIP, PKZIP (WinZip) and BZIP2. • Multimedia codecs like JPEG, PNG and MP3 use Huffman encoding (more precisely, prefix codes).
  • 276. Dynamic tree tables • Dynamic tables may also be maintained as binary search trees. • Insertion, deletion and searching of a node can be done. • If the tree is to remain a complete binary tree after insertions and deletions, the whole tree may have to be restructured to accommodate the changes. • These operations take O(h) time in the worst case, where h is the height of the tree. • To reduce the time complexity, the tree should be self-balancing (height-balanced), using the balance factor. • A method of growing a self-balancing (height-balanced) tree is followed.
  • 277. • The worst-case time complexity is O(h). For a balanced tree h = O(log n); for a skewed tree h can be as large as n.
  • 278. AVL tree • Adelson-Velskii and Landis introduced, in 1962, a binary search tree that is balanced with respect to the heights of its subtrees. • Dynamic searching in this balanced BST can be performed in O(log n) time if the tree has n nodes. • Insertion and deletion in the same tree can also be done in O(log n) time. • The resulting tree remains balanced.
  • 279. Balance factor = height of left subtree - height of right subtree. A tree in which any node has a balance factor greater than 1 or less than -1 is not balanced and is not an AVL tree.
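The definition can be checked recursively: compute each subtree's height and require every node's balance factor to lie in {-1, 0, 1}. A minimal sketch with a hypothetical `Node` class, taking the height of an empty subtree as 0.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def check(node: Optional[Node]) -> Tuple[int, bool]:
    """Return (height, is_avl) for the subtree rooted at node."""
    if node is None:
        return 0, True
    hl, ok_left = check(node.left)
    hr, ok_right = check(node.right)
    balance = hl - hr        # balance factor = height(left) - height(right)
    return 1 + max(hl, hr), ok_left and ok_right and -1 <= balance <= 1
```

A right-leaning chain 1, 2, 3 fails at the root, whose balance factor is 0 - 2 = -2.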
  • 281. • If the tree is not an AVL tree, it can be converted into one by performing these rotations: • LL • RR • LR • RL
  • 282. • Left rotation: if a tree becomes unbalanced when a node is inserted into the right subtree of the right subtree (the RR case), we perform a single left rotation. • Right rotation: an AVL tree may become unbalanced if a node is inserted into the left subtree of the left subtree (the LL case); the tree then needs a single right rotation.
  • 284. Right-left rotation • The second type of double rotation is the right-left rotation. It is a right rotation (at the right child of the unbalanced node) followed by a left rotation (at the unbalanced node).
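The four cases (LL, RR, LR, RL) can be seen together in a recursive AVL insertion. This is a standard textbook-style sketch, not necessarily the exact procedure on the slides: each node caches its height, a leaf has height 1, duplicate keys are ignored, and the helper names are hypothetical.

```python
from typing import Iterable, List, Optional


class AVLNode:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["AVLNode"] = None
        self.right: Optional["AVLNode"] = None
        self.height = 1                      # a leaf has height 1


def height(n: Optional[AVLNode]) -> int:
    return n.height if n else 0


def update(n: AVLNode) -> None:
    n.height = 1 + max(height(n.left), height(n.right))


def balance(n: AVLNode) -> int:
    return height(n.left) - height(n.right)


def rotate_right(y: AVLNode) -> AVLNode:
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x: AVLNode) -> AVLNode:
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def insert(node: Optional[AVLNode], key: int) -> AVLNode:
    """Ordinary BST insertion followed by rebalancing on the way back up."""
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node                          # duplicates are ignored
    update(node)
    bf = balance(node)
    if bf > 1 and key < node.left.key:       # LL: single right rotation
        return rotate_right(node)
    if bf < -1 and key > node.right.key:     # RR: single left rotation
        return rotate_left(node)
    if bf > 1 and key > node.left.key:       # LR: left-rotate the child, then right-rotate
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if bf < -1 and key < node.right.key:     # RL: right-rotate the child, then left-rotate
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def build(keys: Iterable[int]) -> Optional[AVLNode]:
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def inorder(n: Optional[AVLNode]) -> List[int]:
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []
```

Inserting the keys 1 to 7 in increasing order, which would produce a chain in a plain BST, yields a tree of height 3 rooted at 4.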
  • 285. Hashing • Hashing is an important technique designed to solve the problem of efficiently finding and storing data in an array. • Hashing is a method for storing and retrieving records from a database. • It lets us insert, delete and search for records based on a search key value in (expected) constant time. • A hash system stores records in an array called a hash table (HT). • Every hash table stores its values or records in the slots of this array. • Hashing works by performing a computation on a search key K that is intended to identify the position in HT that contains the record with key K. • The hash table is partitioned into b buckets, HT(0), …, HT(b-1). • Each bucket is capable of holding s records in s slots, each slot being large enough to hold exactly one record.
  • 287. • Hash tables use a technique called hashing to generate an index number for each value stored in the array. • Hashing finds an identifier or record by computing the address (location) of the record.
  • 288. • A hash function returns a small integer value, also known as the hash value, hash code or hash sum. Hashing works as follows: • hash = hashfunc(key) • index = hash % array_size • Hashing in a data structure is thus a two-step process: the hash function converts the item into a small integer (the hash value), and this integer is used as an index to store the original data in the hash table. A hash key can then be used to locate the data quickly.
  • 289. • Overflow occurs when a new identifier is mapped or hashed into a full bucket. • A collision occurs when two non-identical identifiers are hashed into the same bucket, i.e., when the hash function returns the same index for more than one element, so that they compete for the same slot in the hash table. • When the bucket size is 1 (s = 1), collisions and overflows occur simultaneously. • Common hash functions are: Mid-square, Division, Folding, Digit analysis.
  • 290. Mid-square (middle of square) • Mid-square hashing (fm) is a hashing technique for generating keys. • A seed value is taken and squared. • Then some digits from the middle of the square are extracted. These extracted digits form a number, which is taken as the new seed. • This technique can generate keys with high randomness if a big enough seed value is taken. • The process is repeated as many times as a key is required.
  • 291. Example: suppose a 4-digit seed is taken, seed = 4765. The square of the seed is 4765 * 4765 = 22705225. From this 8-digit number, any four digits are extracted (say, the middle four), so the new seed value becomes 7052. The square of this new seed is 7052 * 7052 = 49730704. Again, the same set of four digits is extracted, so the new seed value becomes 7307, and so on.
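The example can be reproduced with a few lines of string slicing. A minimal sketch, assuming the "middle" digits are taken from the square zero-padded to twice the seed length; the function name is hypothetical.

```python
def mid_square(seed: int, digits: int = 4) -> int:
    """Square the seed and keep its middle `digits` digits."""
    square = str(seed * seed).zfill(2 * digits)   # pad so the middle is well defined
    start = (len(square) - digits) // 2
    return int(square[start:start + digits])
```

`mid_square(4765)` gives 7052 (from 22705225) and `mid_square(7052)` gives 7307 (from 49730704). To use it as a hash function for a table, one would typically reduce the result further, for example `mid_square(key) % table_size`.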
  • 292. Division • The hash function uses the modulo (mod) operator. • The key X is divided by some number M (the size of the hash table), and the remainder is used as the hash address for X. • Example: size of the hash table m = 1000 (slots 0 to 999). To calculate the index of element x = 123789456: index = 123789456 mod 1000 = 456, so the element x is stored at position 456 in the hash table.
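The division method also makes the bucket model concrete. The sketch below is hypothetical: a table of b buckets with s slots each, where the home bucket is `key % b` and an insertion into a full bucket is reported as an overflow rather than resolved.

```python
from typing import List


def division_hash(key: int, m: int) -> int:
    """Division method: the remainder of key / m is the hash address."""
    return key % m


class BucketTable:
    """Hash table of b buckets with s slots each; reports overflow when a bucket is full."""

    def __init__(self, b: int, s: int):
        self.b, self.s = b, s
        self.buckets: List[List[int]] = [[] for _ in range(b)]

    def insert(self, key: int) -> bool:
        bucket = self.buckets[division_hash(key, self.b)]
        if len(bucket) == self.s:
            return False       # overflow: the home bucket is already full
        bucket.append(key)     # collisions are fine while a slot is free
        return True
```

With b = 7 and s = 2, the keys 50 and 85 collide in bucket 1 but both fit; a third key such as 92 (also 92 % 7 = 1) overflows.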
  • 293. Folding • The key k is partitioned into a number of parts k1, k2, ..., kn, where each part, except possibly the last, has the same number of digits as the required address. • Then the parts are added together, ignoring the final carry. • There are two types of folding: Shift folding: all parts except the last are shifted so that their least significant digits line up with the last part, and then all parts are added. Boundary folding: alternate parts are reversed (folded at the boundaries) before adding; the reversed form of part $p_i$ is written $p_i^r$.
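Both kinds of folding can be sketched on a key given as a digit string. The key `12320324111220` and the 3-digit address width below are hypothetical example values; the final carry is dropped by keeping the sum modulo 10^width.

```python
def fold(key: str, width: int, boundary: bool = False) -> int:
    """Shift folding (default) or boundary folding of a numeric key string."""
    parts = [key[i:i + width] for i in range(0, len(key), width)]
    total = 0
    for i, part in enumerate(parts):
        if boundary and i % 2 == 1:
            part = part[::-1]          # reverse every second part (p_i^r)
        total += int(part)
    return total % (10 ** width)       # ignore the final carry
```

The parts are 123, 203, 241, 112 and 20. Shift folding gives 123 + 203 + 241 + 112 + 20 = 699; boundary folding reverses the second and fourth parts and gives 123 + 302 + 241 + 211 + 20 = 897.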
  • 294. Digit analysis • Digit analysis is used with static files. • A static file is one in which all the identifiers are known in advance. Using this method, we first transform the identifiers into numbers using some radix r. • Then we examine the digits of each identifier, deleting those digits that have the most skewed distributions. We continue deleting digits until the number of remaining digits is small enough to give an address in the range of the hash table. • The digits used to calculate the hash address must be the same for all identifiers and must not have abnormally high peaks or valleys (the standard deviation must be small).
  • 295. Overflow handling • Open addressing methods detect and handle overflows and collisions. • Common methods are: linear probing, quadratic probing, double hashing. Linear probing: the hash table is searched sequentially, starting from the original hash location. If that location is already occupied, the next location is checked, and so on. This is sometimes also called rehashing.
  • 296. Let hash(x) be the slot index computed using a hash function and S be the table size. If slot hash(x) % S is full, then we try (hash(x) + 1) % S; if (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S; if (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S, and so on. Example 1: consider the simple hash function "key mod 7" and the sequence of keys 50, 700, 76, 85, 92, 73, 101: 50 % 7 = 1, 700 % 7 = 0, 76 % 7 = 6, 85 % 7 = 1, 92 % 7 = 1, 73 % 7 = 3, 101 % 7 = 3. Example 2: consider the hash function "key mod 5" and the sequence of keys 50, 70, 76, 93 to be inserted.
  • 297. • Keys 50, 70, 76, 93: 50 % 5 = 0, 70 % 5 = 0, 76 % 5 = 1, 93 % 5 = 3.
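Linear probing can be sketched with a Python list in which `None` marks an empty slot. This assumes the division hash `key % size` used in both examples; the helper names are hypothetical.

```python
from typing import List, Optional


def linear_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key using h(key) = key % size and probe h, h+1, h+2, ... (mod size)."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


def build_table(keys: List[int], size: int) -> List[Optional[int]]:
    table: List[Optional[int]] = [None] * size
    for k in keys:
        linear_probe_insert(table, k)
    return table
```

For the "key mod 7" example the final table is `[700, 50, 85, 92, 73, 101, 76]`: 85 moves from slot 1 to 2, 92 to 3, 73 from 3 to 4, and 101 from 3 to 5. For the "key mod 5" example, 70 lands in slot 1, 76 in slot 2 and 93 in slot 3.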
  • 298. Quadratic probing • In this method, in the ith iteration we probe the slot i² positions after the original hash location. • Always start from the original hash location; only if that location is occupied, check the other slots. Let hash(x) be the slot index computed using the hash function. If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S. If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S. If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
  • 299. Let us consider table size = 7, hash function Hash(x) = x % 7, and insert 22, 30, 50.
  • 300. • Insert 22 and 30: Hash(22) = 22 % 7 = 1. Since the cell at index 1 is empty, we can insert 22 at slot 1. • Hash(30) = 30 % 7 = 2. Since the cell at index 2 is empty, we can insert 30 at slot 2.
  • 301. • Insert 50: Hash(50) = 50 % 7 = 1. • In our hash table slot 1 is already occupied, so we search slot 1 + 1², i.e. 1 + 1 = 2. • Slot 2 is also occupied, so we search slot 1 + 2², i.e. 1 + 4 = 5. • Slot 5 is not occupied, so we place 50 in slot 5.
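The same insertion routine with quadratic offsets reproduces this example. A sketch with hypothetical helper names, assuming the hash `key % size`; note that quadratic probing does not always visit every slot, so it can fail even when the table still has free cells.

```python
from typing import List, Optional


def quadratic_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Probe h, h + 1*1, h + 2*2, h + 3*3, ... (mod size), where h = key % size."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no free slot found on the quadratic probe sequence")


def build_quadratic(keys: List[int], size: int) -> List[Optional[int]]:
    table: List[Optional[int]] = [None] * size
    for k in keys:
        quadratic_probe_insert(table, k)
    return table
```

For the keys 22, 30, 50 and size 7, 50 is tried at slots 1 and 2 and finally placed in slot 5.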
  • 302. Double hashing • In this technique, the increments for the probing sequence are computed by using another hash function. • Use a second hash function hash2(x) and, in the ith iteration, look at the slot i*hash2(x) positions after the original hash location. Let hash(x) be the slot index computed using the first hash function. If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S. If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S. If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S.
  • 303. • Insert the keys 27, 43, 692, 72 into a hash table of size 7, where the first hash function is h1(k) = k mod 7 and the second hash function is h2(k) = 1 + (k mod 5). • Insert 27: 27 % 7 = 6; location 6 is empty, so insert 27 into slot 6.
  • 304. • Insert 43: 43 % 7 = 1; location 1 is empty, so insert 43 into slot 1.
  • 305. • Insert 692: 692 % 7 = 6, but location 6 is already occupied; this is a collision. • We resolve this collision using double hashing, with h1(k) = k mod 7 and h2(k) = 1 + (k mod 5): hnew = [h1(692) + i * h2(692)] % 7 = [6 + 1 * (1 + 692 % 5)] % 7 = 9 % 7 = 2. Slot 2 is empty, so we insert 692 into slot 2.
  • 306. • Insert 72: 72 % 7 = 2, but location 2 is already occupied; this is a collision. • We resolve it using double hashing: hnew = [h1(72) + i * h2(72)] % 7 = [2 + 1 * (1 + 72 % 5)] % 7 = 5 % 7 = 5. Slot 5 is empty, so we insert 72 into slot 5.
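Double hashing only changes the step size of the probe. In the sketch below (hypothetical helper names), the second hash function is passed in as a parameter; the table size should be prime and `h2` must never return a multiple of the size, otherwise the probe would keep revisiting the same slot.

```python
from typing import Callable, List, Optional


def double_hash_insert(table: List[Optional[int]], key: int, h2: Callable[[int], int]) -> int:
    """Probe h1, h1 + h2, h1 + 2*h2, ... (mod size), with h1(key) = key % size."""
    size = len(table)
    home = key % size
    step = h2(key)
    for i in range(size):
        slot = (home + i * step) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no free slot found on the probe sequence")


def build_double(keys: List[int], size: int, h2: Callable[[int], int]) -> List[Optional[int]]:
    table: List[Optional[int]] = [None] * size
    for k in keys:
        double_hash_insert(table, k, h2)
    return table
```

With h1(k) = k mod 7 and h2(k) = 1 + (k mod 5), the keys 27, 43, 692 and 72 (for which 692 % 7 = 6 and 692 % 5 = 2) end up in slots 6, 1, 2 and 5 respectively.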
  • 307. Unit 5: Internal sorting • Sorting is categorized into: • Internal sorting • External sorting • Internal sorting methods are: • Insertion sort • Quick sort • 2-way merge sort • Heap sort • Shell sort
  • 308. Insertion sort • The basic step is to insert a record r into a sequence of ordered records so that the resulting sequence is also ordered. • Sorting starts with an ordered sequence consisting of the first record, and the remaining records are then inserted into this sequence one at a time.
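The basic step can be sketched in Python as follows (function name hypothetical): `a[0:i]` is the ordered sequence, and each new record is inserted into it by shifting larger records one place to the right.

```python
from typing import List


def insertion_sort(a: List[int]) -> None:
    """Sort `a` in place by inserting each record into the ordered prefix a[0:i]."""
    for i in range(1, len(a)):
        record = a[i]
        j = i - 1
        while j >= 0 and a[j] > record:    # shift larger records one place right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = record
```

In the worst case (reverse-ordered input) the inner loop shifts every earlier record, which is where the O(n²) bound comes from.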
  • 309. • Insertion sort is not suitable for large data sets, as its average and worst-case complexity are O(n²), where n is the number of items. Quick sort • It was developed by C. A. R. Hoare. • It is a sorting method with good average behaviour. • Quick sort is a highly efficient sorting algorithm based on partitioning an array of data into smaller arrays. • A large array is partitioned into two arrays: one holds values smaller than a specified value, called the pivot, on which the partition is based, and the other holds values greater than the pivot. • Quick sort partitions an array and then calls itself recursively twice to sort the two resulting subarrays. Its average-case complexity is O(n log n), which makes it efficient for large data sets; its worst-case complexity is O(n²).
  • 310. • This algorithm follows the divide-and-conquer approach. • Divide and conquer is a technique of breaking a problem down into subproblems, solving the subproblems, and combining the results to solve the original problem.
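A compact in-place quick sort shows the partition-then-recurse structure. This sketch uses the Lomuto partition scheme with the last element as pivot, which is one common choice and not necessarily the partitioning shown on the slides.

```python
from typing import List, Optional


def partition(a: List[int], lo: int, hi: int) -> int:
    """Lomuto partition: place the pivot a[hi] at its final position and return it."""
    pivot = a[hi]
    i = lo                               # a[lo:i] holds values smaller than the pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quick_sort(a: List[int], lo: int = 0, hi: Optional[int] = None) -> None:
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quick_sort(a, lo, p - 1)         # values smaller than the pivot
        quick_sort(a, p + 1, hi)         # values greater than or equal to the pivot
```

With a fixed last-element pivot, already sorted input triggers the O(n²) worst case; choosing the pivot at random or by median-of-three is the usual remedy.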
  • 314. 2-way merge sort
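The slide shows 2-way merge sort only as a figure. As a sketch, here is a standard top-down version: split the list in two, sort each half recursively, and merge the two sorted halves. The slides may present an iterative (bottom-up) variant instead.

```python
from typing import List


def merge(left: List[int], right: List[int]) -> List[int]:
    """Merge two sorted lists into one sorted list."""
    result: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # <= keeps the sort stable
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])              # at most one of these is non-empty
    result.extend(right[j:])
    return result


def merge_sort(a: List[int]) -> List[int]:
    if len(a) <= 1:
        return a[:]
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```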
  • 316. Heap sort • A heap is a tree-based data structure in which the nodes are arranged in a particular order, such that the tree satisfies the heap property. • Heap sort may be regarded as a two-stage method: first, the tree is converted into a heap with the property that the value of each node is at least as large as the values of its children, so the root holds the largest key; second, the output sequence is generated in decreasing order by successively outputting the root and restructuring the remaining tree into a heap. • Follow these steps: 1. Build a max heap from the input data. 2. At this point, the maximum element is stored at the root of the heap. Replace it with the last item of the heap, reduce the size of the heap by 1, and finally heapify the root of the tree. 3. Repeat step 2 while the size of the heap is greater than 1.
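The three steps can be sketched with an array-based max heap, where the children of index i are 2i + 1 and 2i + 2. Because each extracted root is swapped to the end of the array, the in-place result comes out in ascending order, even though the roots are removed in decreasing order.

```python
from typing import List


def sift_down(a: List[int], root: int, size: int) -> None:
    """Restore the max-heap property for the subtree at `root` within a[0:size]."""
    while True:
        largest = root
        left, right = 2 * root + 1, 2 * root + 2
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == root:
            return
        a[root], a[largest] = a[largest], a[root]
        root = largest


def heap_sort(a: List[int]) -> None:
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):      # step 1: build a max heap
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):          # steps 2-3: move the root to the end
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)                 # heapify the root of the smaller heap
```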
  • 319. Shell sort • Shell sort is a generalization of insertion sort that overcomes its drawbacks by comparing elements separated by a gap of several positions. • It is an extended version of insertion sort that improves its average time complexity. Like insertion sort, it is a comparison-based, in-place sorting algorithm. • Shell sort is efficient for medium-sized data sets. • In insertion sort, elements can be moved ahead by only one position at a time, so moving an element to a far-away position requires many movements, which increases the execution time. Shell sort overcomes this drawback by allowing far-away elements to be moved and swapped. • The algorithm first sorts elements that are far apart and then successively reduces the gap between them. This gap is called the interval. The interval can be calculated using Knuth's formula: h = h * 3 + 1, where h is the interval, with initial value 1 (giving 1, 4, 13, 40, ...).
  • 320. In the first loop, the element at the 0th position is compared with the element at the 4th position. If the 0th element is greater, it is swapped with the element at the 4th position; otherwise it stays where it is. This process continues for the remaining elements.
  • 321. In the second loop, elements lie at an interval of 2 (n/4 = 2, where n = 8). We now use the interval of 2 to sort the rest of the array. With an interval of 2, two sublists are generated: {12, 25, 33, 40} and {17, 8, 31, 42}.
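The walkthrough uses intervals n/2, n/4, ..., 1, so this sketch halves the gap each round; Knuth's sequence (1, 4, 13, ...) could be substituted by generating the gaps differently. Each round is an insertion sort over elements that are `gap` apart. The sample array in the usage note is hypothetical, chosen so that the first round reproduces the sublists above.

```python
from typing import List


def gap_insertion_sort(a: List[int], gap: int) -> None:
    """Insertion sort on the elements that are `gap` positions apart."""
    for i in range(gap, len(a)):
        item = a[i]
        j = i
        while j >= gap and a[j - gap] > item:
            a[j] = a[j - gap]              # move the larger element `gap` places right
            j -= gap
        a[j] = item


def shell_sort(a: List[int]) -> None:
    gap = len(a) // 2                      # intervals n/2, n/4, ..., 1
    while gap > 0:
        gap_insertion_sort(a, gap)
        gap //= 2
```

Starting from `[33, 31, 40, 8, 12, 17, 25, 42]`, the interval-4 round gives `[12, 17, 25, 8, 33, 31, 40, 42]`, whose even and odd positions are exactly the sublists {12, 25, 33, 40} and {17, 8, 31, 42}; the final round with interval 1 is a plain insertion sort on an almost-sorted array.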
  • 322. Files, queries and sequential organizations Files • A file is a collection of records, where each record consists of one or more fields. • The primary objective of file organization is to provide a means for record retrieval and update. • Update includes deletion of records, changes to fields, and insertion of entirely new records.
  • 323. • Certain fields in the record are designated as key fields. • Records may be retrieved by specifying values for some or all of these keys. • A combination of key values specified for retrieval is called a query. • An invalid query to the file would be Location = Los Angeles.
  • 324. • Choosing data representations of files on external storage devices for efficient use depends on several factors: the kind of external storage device available, the type of queries allowed, the number of keys, and the mode of retrieval/update. Storage device types • We are concerned with files stored on disks and tapes. Query types
  • 325. Number of keys • We distinguish between files having only one key and files with more than one key. Mode of retrieval • Retrieval may be either real-time or batched. • In real-time mode, the response time for any query should be minimal. • In batched mode, the response time is not significant. Requests for retrieval are batched together on a transaction file until either enough requests have been received or a suitable amount of time has passed; then all the requests in the transaction file are processed. Mode of update • Update may be either real-time or batched. • Real-time update is needed, for example, for flight reservations: the file must be changed immediately to show the new status.
  • 326. • Batched update would be suitable in a bank account system: for example, all withdrawals and deposits made on a particular day are collected on a transaction file, and the updates are made at the end of the day. • Batched update involves two files: the master file and the transaction file. • Master file: represents the file status after the previous update. • Transaction file: holds all the update requests that have not yet been reflected in the master file, so the master file is always "out of date". • Records are placed sequentially onto the storage media (adjacent to each other). • The physical sequence of records is ordered on some key, called the primary key. • For batched retrieval and update, ordered sequential files are preferred over unordered sequential files, since they are easier to process.
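Why ordered files are easier to process shows up clearly in a batched update: when both the master file and the transaction file are sorted on the primary key, one sequential pass over each produces the new master file. The sketch below is hypothetical, assuming records are `(account, balance)` pairs, transactions are `(account, signed amount)` pairs, and a transaction for an unknown account creates a new record.

```python
from typing import List, Tuple

Record = Tuple[int, int]   # (primary key, value)


def batched_update(master: List[Record], transactions: List[Record]) -> List[Record]:
    """Merge a sorted master file with a sorted transaction file in one pass each."""
    new_master: List[Record] = []
    i = j = 0
    while i < len(master) or j < len(transactions):
        if j == len(transactions) or (i < len(master) and master[i][0] < transactions[j][0]):
            new_master.append(master[i])          # no pending update: copy the record
            i += 1
            continue
        key = transactions[j][0]
        balance = 0                                # new record if key is not in master
        if i < len(master) and master[i][0] == key:
            balance = master[i][1]
            i += 1
        while j < len(transactions) and transactions[j][0] == key:
            balance += transactions[j][1]          # apply every transaction for this key
            j += 1
        new_master.append((key, balance))
    return new_master
```

For a master file `[(101, 500), (205, 100), (310, 0)]` and the day's transactions `[(101, -200), (101, 50), (250, 75), (310, 30)]`, the new master file is `[(101, 350), (205, 100), (250, 75), (310, 30)]`.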
  • 327. • File organization breaks down into two or more aspects: the directory, and the physical organization of the records (e.g., sequential). • Processing a query or update request proceeds in two steps: first, the indexes are used to determine which parts of the physical file are to be searched; second, these parts of the file are searched and the records satisfying the query are accessed.
  • 328. File Organizations • Sequential organization • Random organization • Linked organization • Inverted files • Cellular partitions
  • 329. Sequential organization • A cylinder-surface index is maintained for the primary key. • To retrieve records efficiently, indexes can be used. • The structure of the indexes depends on the indexing technique. Random organization • Records are stored at random locations on the disk. • Several techniques are used for randomization: direct addressing, directory lookup, hashing.
  • 330. Direct addressing • The available disk space is divided into nodes large enough to hold records of equal size. • The numeric value of the primary key is used to determine the node in which a particular record is stored. • Searching for or deleting a record by primary key value requires one disk access. • Updating a record requires two accesses (one to read the record and one to write back the modified record). • When variable-size records are used, an index can be set up with pointers to the actual records on the disk.
  • 332. Directory lookup • Retrieving a record involves searching the index for the record address and then accessing the record itself. • The records can be of fixed or variable size. • Searching for a record through the index requires more than one access. • Every record has a unique primary key. • Two or more records with the same primary key would cause collisions. Hashing • The available space is divided into buckets and slots. • Every record has a hashed index. • Some space is set aside to handle overflow.
