P.O. Box 342-01000 Thika
Email: info@mku.ac.ke
Web: www.mku.ac.ke
COURSE CODE: BIT 4107
COURSE TITLE: ADVANCED BUSINESS DATA
STRUCTURES AND COMPUTER ALGORITHMS
Instructional Manual for BBIT – Distance Learning
Prepared by Paul M Kathale, mutindaz@yahoo.com
TABLE OF CONTENTS
FOUNDATIONS TO DATA STRUCTURES
    Basic Definitions
    Structural and Behavioral Definitions
    Abstract Data Types (ADT)
    Categories of data types
    Structural Relationships
    Why study Data structures
INTRODUCTION TO DESIGN AND ALGORITHM ANALYSIS
    The Classic Multiplication Algorithm
    Algorithm's Performance
    Θ-Notation (Same order)
    Ο-Notation (Upper Bound)
    Ω-Notation (Lower Bound)
    Algorithm Analysis
    Optimality
    Reduction
MATHEMATICS FOR ALGORITHMICS
    Sets
        Union of Sets
        Symmetric difference
        Sequences
    Linear Inequalities and Linear Equations
        Inequalities
        Fundamental Properties of Inequalities
        Solution of Inequality
        Geometric Interpretation of Inequalities
        One Unknown
        Two Unknowns
        n Equations in n Unknowns
        Solution of a Triangular System
        Back Substitution Method
        Gaussian Elimination
        Second Part
        Determinants and systems of linear equations
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Greedy Algorithm
    Greedy Approach
    Characteristics and Features of Problems solved by Greedy Algorithms
    Definitions of feasibility
    1. An Activity Selection Problem
        Problem Statement
        Greedy Algorithm for Selection Problem
    2. Minimum Spanning Tree
    3. Kruskal's Algorithm
    4. Prim's Algorithm
    5. Dijkstra's Algorithm
        Analysis
        Example: Step by Step operation of Dijkstra algorithm
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Divide & Conquer Algorithm
    Binary Search (simplest application of divide-and-conquer)
    Sequential Search
        Analysis
    Binary Search
        Analysis
    Iterative Version of Binary Search
        Analysis
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Dynamic Programming Algorithm
    The Principle of Optimality
    1. Matrix-chain Multiplication Problem
    2. 0-1 Knapsack Problem
    3. Knapsack Problem
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Amortized Analysis
    1. Aggregate Method
        Aggregate Method Characteristics
    2. Accounting Method
    3. Potential Method
GRAPH ALGORITHMS
    Graph Theory is an area of mathematics that deals with following types of problems
    Introduction to Graphs
    Definitions
        Graphs, vertices and edges
        Undirected and directed graphs
        Neighbours and adjacency
        An example
        Mathematical definition
    Digraph
        1. Transpose
        2. Square
        3. Incidence Matrix
    Types of Graph Algorithms
        1. Breadth-First Search Traversal Algorithm
        2. Depth-First Search
        3. Strongly Connected Components
        4. Euler Tour
        Running Time of Euler Tour
AUTOMATA THEORY
    What is Automata Theory?
    The Central Concepts of Automata Theory
    Languages
    Structural expressions
    Proofs
        Terminology
        Hints for Finding Proofs
    Proving techniques
        By contradiction
        By induction
        Proof by Induction: Example
        Proof by Construction
    "If-and-Only-If" statements
REFERENCES
FOUNDATIONS TO DATA STRUCTURES
Basic Definitions
Data structures
This is the study of methods of representing objects, the design of algorithms to manipulate the
representations, the proper encapsulation of objects in a reusable form, and the evaluation of the
cost of the implementation, including the measurement of the complexity of the time and space
requirements.
Algorithms
 A finite step-by-step procedure to solve a given problem.
 A sequence of computational steps that transform input into output.
Abstraction
This is the separation of what a data structure represents, and what an algorithm accomplishes,
from the implementation details of how things are actually carried out; i.e., hiding the
unnecessary details.
Data Abstraction
Hiding of the representational details
Data Types
A data type consists of: a domain (= a set of values) and a set of operations; the kind of data
variables may “hold”.
Example 1:
Boolean or logical data type provided by most programming languages.
 Two values: true, false.
 Many operations, including AND, OR, NOT, etc.
Example 2:
The data type fraction. How can we specify the domain and operations that define fractions? It
seems straightforward to name the operations; fractions are numbers so all the normal arithmetic
operations apply, such as addition, multiplication, and comparison. In addition there might be
some fraction-specific operations such as normalizing a fraction by removing common terms
from its numerator and denominator - for example, if we normalized 6/9 we'd get 2/3.
But how do we specify the domain for fractions, i.e. the set of possible values for a fraction?
Structural and Behavioral Definitions
There are two different approaches to specifying a domain: we can give a structural definition,
or we can give a behavioral definition. Let us see what these two are like.
Structural Definition of the domain for `Fraction'
The value of a fraction is made of three parts (or components):
 A sign, which is either + or -
 A numerator, which may be any non-negative integer
 A denominator, which may be any positive integer (not zero, not negative).
This is called a structural definition because it defines the values of type `fraction' by imposing
an internal structure on them (they have 3 parts...). The parts themselves have specific types, and
there may be further constraints. For example, we could have insisted that a fraction's numerator
and denominator have no common divisor (in that case we wouldn't need the normalize
operation - 6/9 would not be a fraction by this definition).
Behavioral Definition of the domain for `Fraction'
The alternative approach to defining the set of values for fractions does not impose any internal
structure on them. Instead it just adds an operation that creates fractions out of other things, such
as
CREATE_FRACTION(N,D)
Where N is any integer, D is any non-zero integer.
The values of type fraction are defined to be the values that are produced by this function for any
valid combination of inputs.
The parameter names were chosen to suggest its intended behavior:
CREATE_FRACTION(N,D) should return a value representing the fraction N/D (N for
numerator, D for denominator).
How do we guarantee that CREATE_FRACTION(N,D) actually returns the fraction N/D?
The answer is that we have to constrain the behavior of this function, by relating it to the other
operations on fractions. For example, one of the key properties of multiplication is that:
NORMALIZE ((N/D) * (D/N)) = 1/1
This turns into a constraint on CREATE_FRACTION:
NORMALIZE (CREATE_FRACTION(N,D) * CREATE_FRACTION(D,N)) =
CREATE_FRACTION(1,1)
So you see CREATE_FRACTION cannot be just any function; its behavior is highly constrained,
because we can write down lots and lots of constraints like this.
And that's the reason we call this sort of definition behavioral, because the definition is strictly in
terms of a set of operations and constraints or axioms relating the behavior of the operations to
one another.
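As an illustration only (Python is not part of this manual, and the names Fraction, create_fraction and normalize are ours), here is a minimal sketch of a fraction type whose behavior satisfies the constraint above:

from math import gcd

class Fraction:
    # A fraction produced only through create_fraction (behavioral style).
    def __init__(self, n, d):
        if d == 0:
            raise ValueError("denominator must be non-zero")
        self.n, self.d = n, d

    def __mul__(self, other):
        # (n1/d1) * (n2/d2) = (n1*n2)/(d1*d2)
        return Fraction(self.n * other.n, self.d * other.d)

    def normalize(self):
        # Remove common terms from numerator and denominator, e.g. 6/9 -> 2/3.
        g = gcd(abs(self.n), abs(self.d))
        sign = -1 if self.n * self.d < 0 else 1
        return Fraction(sign * abs(self.n) // g, abs(self.d) // g)

def create_fraction(n, d):
    return Fraction(n, d)

# The behavioral constraint from the text:
# NORMALIZE(CREATE_FRACTION(N, D) * CREATE_FRACTION(D, N)) = CREATE_FRACTION(1, 1)
f = (create_fraction(6, 9) * create_fraction(9, 6)).normalize()
assert (f.n, f.d) == (1, 1)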
Abstract Data Types (ADT)
 An Abstract Data Type (ADT) defines data together with the operations.
 ADT is specified independently of any particular implementation. ADT depicts the basic
nature or concept of the data structure rather than the implementation details of the data.
 Examples of ADTs: stack, queue, list, graphs, trees.
 ADTs can be implemented using an array or using a linked list.
Categories of data types
 Atomic/Basic data types
 Structured data types
 Abstract Data types
Atomic/Simple Data Types
 These are data types that are defined without imposing any structure on their values.
 Example
o Boolean
o Integer
o Character
o Double
 They are used to implement structured data types.
Structured Data Types
 The opposite of atomic is structured. A structured data type has a definition that imposes
structure upon its values. As we saw above, fractions normally are a structured data type.
 In structured data types, there is an internal structural relationship, or organization,
that holds between the components.
Example,
Think of an array as a structured type, with each position in the array being a component, then
there is a structural relationship of `followed by': we say that component N is followed by
component N+1.
Structural Relationships
 Many structured data types do have an internal structural relationship, and these can be
classified according to the properties of this relationship.
(Diagram: array components N -> N+1 -> N+2 -> N+3 -> ... -> N+i, each followed by the next.)
Linear Structure:
The most common organization for components is a linear structure. A structure is linear if it has
these 2 properties:
 Property P1: Each element is `followed by' at most one other element.
 Property P2: No two elements are `followed by' the same element.
An array is an example of a linearly structured data type. We generally write a linearly structured
data type like this: A->B->C->D (this is one value with 4 parts).
 Counter example 1 (violates P1): A points to B and C B<-A->C
 Counter example 2 (violates P2): A and B both point to C A->C<-B
Tree Structure
In a tree structure, an element can point to more than one other element, but no two elements
can point to the same element. That is:
Dropping Constraint P1: If we drop the first constraint and keep the second we get a tree
structure or hierarchy: no two elements are followed by the same element. This is a very
common structure too, and extremely useful.
Counter example 1 is a tree, but counter example 2 is not.
A is followed by B, C and D; B by E and F; C by G. We are not allowed to add any more arcs that
point to any of these nodes (except possibly A - see cyclic structures below).
Graph Structure
A graph is a nonlinear structure in which a component may have more than one predecessor and
more than one successor.
Dropping both P1 and P2:
If we drop both constraints, we get a graph. In a graph, there are no constraints on the relations
we can define.
Cyclic Structures:
All the examples we've seen are acyclic. This means that there is no sequence of arrows that
leads back to where it started. Linear structures are usually acyclic, but cyclic ones are not
uncommon.
Example of a cyclic linear structure: A -> B -> C -> D -> A
Trees are virtually always acyclic.
Graphs are often cyclic, although the special properties of acyclic graphs make them an
important topic of study.
Example: Add an edge from G to D, and from E to A.
Why study Data structures
 Helps to understand how data is organized and stored. This is essential for creating
efficient algorithms.
 Gives designers a clear notion of relative advantages and disadvantages of each type of
data structure.
 Gives ability to make correct decisions regarding which data structure to use based on the
following issues:
o Run time – Number of operations to perform a given task
o Memory & secondary storage utilization
o Developmental cost of the program - total person-hour invested
i.e., it helps to make trade-offs among the three issues, because no single data structure is best
in all cases.
 The study of data structures exposes designers/students to a vast collection of tried and proven
methods used for designing efficient programs.
INTRODUCTION TO DESIGN AND ALGORITHM ANALYSIS
An algorithm, named after the ninth century scholar Abu Ja'far Muhammad ibn Musa
al-Khwarizmi, is defined as follows:
 An algorithm is a set of rules for carrying out calculation either by hand or on a machine.
 An algorithm is a finite step-by-step procedure to achieve a required result.
 An algorithm is a sequence of computational steps that transform the input into the
output.
 An algorithm is a sequence of operations performed on data that have to be organized in
data structures.
 An algorithm is an abstraction of a program to be executed on a physical machine (model
of Computation).
The most famous algorithm in history dates back to the time of the ancient Greeks: this is
Euclid's algorithm for calculating the greatest common divisor of two integers. It appears as the
solution to Proposition 2 in Book VII of Euclid's "Elements." Euclid's "Elements" consists of
thirteen books, which contain a total of 465 propositions.
The Classic Multiplication Algorithm
1. Multiplication, the American way:
Multiply the multiplicand one after another by each digit of the multiplier taken from right to
left.
2. Multiplication, the English way:
Multiply the multiplicand one after another by each digit of the multiplier taken from left to
right.
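For illustration only (the function name multiply is ours), here is a rough Python sketch of the right-to-left "American" scheme, with the digit handling spelled out:

def multiply(multiplicand, multiplier):
    # Multiply by each digit of the multiplier taken from right to left,
    # shifting each partial product one more decimal place.
    total, shift = 0, 0
    while multiplier > 0:
        digit = multiplier % 10            # next digit, right to left
        total += digit * multiplicand * (10 ** shift)
        multiplier //= 10
        shift += 1
    return total

assert multiply(981, 1234) == 981 * 1234

The English way differs only in taking the digits from left to right; the partial products, and hence the result, are the same.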
Algorithmics is a branch of computer science that consists of designing and analyzing computer
algorithms.
1. The "design" pertains to:
i. The description of the algorithm at an abstract level by means of a pseudo-language, and
ii. A proof of correctness, that is, that the algorithm solves the given problem in all cases.
2. The "analysis" deals with performance evaluation (complexity analysis).
We start by defining the model of computation, which is usually the Random Access Machine
(RAM) model, but other models of computation can be used, such as the PRAM. Once the model of
computation has been defined, an algorithm can be described using a simple language (or pseudo-
language) whose syntax is close to a programming language such as C or Java.
Algorithm's Performance
Two important ways to characterize the effectiveness of an algorithm are its space complexity
and time complexity. Time complexity of an algorithm concerns determining an expression of
the number of steps needed as a function of the problem size. Since the step count measure is
somewhat coarse, one does not aim at obtaining an exact step count. Instead, one attempts only
to get asymptotic bounds on the step count. Asymptotic analysis makes use of the O (Big Oh)
notation. Two other notational constructs used by computer scientists in the analysis of
algorithms are Θ (Big Theta) notation and Ω (Big Omega) notation.
The performance evaluation of an algorithm is obtained by totaling the number of occurrences of
each operation when running the algorithm. The performance of an algorithm is evaluated as a
function of the input size n and is to be considered modulo a multiplicative constant.
The following notations are commonly used in performance analysis to characterize the
complexity of an algorithm.
Θ-Notation (Same order)
This notation bounds a function to within constant factors. We say f(n) = Θ(g(n)) if there exist
positive constants n0, c1 and c2 such that to the right of n0 the value of f(n) always lies between c1
g(n) and c2 g(n) inclusive.
In the set notation, we write as follows:
Θ(g(n)) = {f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n)
for all n ≥ n0}
We say that g(n) is an asymptotically tight bound for f(n).
Graphically, for all values of n to the right of n0, the value of f(n) lies at or above c1 g(n) and at or
below c2 g(n). In other words, for all n ≥ n0, the function f(n) is equal to g(n) to within a constant
factor. We say that g(n) is an asymptotically tight bound for f(n).
In the set terminology, f(n) is said to be a member of the set Θ(g(n)) of functions. In other words,
because Θ(g(n)) is a set, we could write
f(n) ∈ Θ(g(n))
to indicate that f(n) is a member of Θ(g(n)). Instead, we write
f(n) = Θ(g(n))
to express the same notation.
Historically, the notation is written "f(n) = Θ(g(n))", although the idea that f(n) is equal to
something called Θ(g(n)) can be misleading.
Example: n^2/2 - 2n = Θ(n^2), with c1 = 1/4, c2 = 1/2, and n0 = 8.
Ο-Notation (Upper Bound)
This notation gives an upper bound for a function to within a constant factor. We write f(n) =
O(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n)
always lies on or below c g(n).
In the set notation, we write as follows: For a given function g(n), the set of functions
Ο(g(n)) = {f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n ≥ n0}
We say that the function g(n) is an asymptotic upper bound for the function f(n). We use Ο-
notation to give an upper bound on a function, to within a constant factor.
Graphically, for all values of n to the right of n0, the value of the function f(n) is on or below
c g(n). We write f(n) = O(g(n)) to indicate that a function f(n) is a member of the set Ο(g(n)), i.e.
f(n) ∈ Ο(g(n))
Note that f(n) = Θ(g(n)) implies f(n) = Ο(g(n)), since Θ-notation is a stronger notation than Ο-
notation.
Example: 2n^2 = Ο(n^3), with c = 1 and n0 = 2.
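A quick numeric sanity check of this example (an illustrative Python sketch, not a proof - a finite loop can only test finitely many n):

def upper_bound_holds(f, g, c, n0, upto=1000):
    # Check 0 <= f(n) <= c*g(n) for all n0 <= n <= upto.
    return all(0 <= f(n) <= c * g(n) for n in range(n0, upto + 1))

# 2n^2 = O(n^3) with c = 1 and n0 = 2
assert upper_bound_holds(lambda n: 2 * n**2, lambda n: n**3, c=1, n0=2)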
Equivalently, we may also define "f is of order g" as follows:
If f(n) and g(n) are functions defined on the positive integers, then f(n) is Ο(g(n)) if and only if
there is a c > 0 and an n0 > 0 such that
| f(n) | ≤ c | g(n) | for all n ≥ n0
Historical Note: The notation was introduced in 1892 by the German mathematician Paul
Bachmann.
Ω-Notation (Lower Bound)
This notation gives a lower bound for a function to within a constant factor. We write f(n) =
Ω(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n)
always lies on or above c g(n).
In the set notation, we write as follows: For a given function g(n), the set of functions
Ω(g(n)) = {f(n) : there exist positive constants c and n0 such that 0 ≤ c g(n) ≤ f(n) for all n ≥ n0}
We say that the function g(n) is an asymptotic lower bound for the function f(n).
The intuition behind Ω-notation is symmetric to that behind Ο-notation: for all n ≥ n0, the value
of f(n) lies on or above c g(n).
Example: √n = Ω(lg n), with c = 1 and n0 = 16.
Algorithm Analysis
The complexity of an algorithm is a function g(n) that gives the upper bound of the number of
operation (or running time) performed by an algorithm when the input size is n.
There are two interpretations of upper bound.
Worst-case Complexity
The running time for any given size input will be lower than the upper bound except
possibly for some values of the input where the maximum is reached.
Average-case Complexity
The running time for any given size input will be the average number of operations over
all problem instances for a given size.
Because it is quite difficult to estimate the statistical behavior of the input, most of the time we
content ourselves with worst-case behavior. Most of the time, the complexity g(n) is
approximated by its family O(f(n)), where f(n) is one of the following functions: n (linear
complexity), log n (logarithmic complexity), n^a where a ≥ 2 (polynomial complexity), a^n
(exponential complexity).
Optimality
Once the complexity of an algorithm has been estimated, the question arises whether this
algorithm is optimal. An algorithm for a given problem is optimal if its complexity reaches the
lower bound over all the algorithms solving this problem. For example, any algorithm solving
"the intersection of n segments" problem will execute at least n^2 operations in the worst case,
even if it does nothing but print the output. This is abbreviated by saying that the problem has
Ω(n^2) complexity. If one finds an O(n^2) algorithm that solves this problem, it will be optimal
and of complexity Θ(n^2).
Reduction
Another technique for estimating the complexity of a problem is the transformation of problems,
also called problem reduction. As an example, suppose we know a lower bound for a problem A,
and that we would like to estimate a lower bound for a problem B. If we can transform A into B
by a transformation step whose cost is less than that for solving A, then B has the same bound as
A.
The Convex hull problem nicely illustrates the "reduction" technique. A lower bound for the
Convex hull problem is established by reducing the sorting problem (complexity: Θ(n log n)) to
the Convex hull problem.
MATHEMATICS FOR ALGORITHMICS
Sets
A set is a collection of different things (distinguishable objects or distinct objects) represented as
a unit. The objects in a set are called its elements or members. If an object x is a member of a set
S, we write x ∈ S. On the other hand, if x is not a member of S, we write x ∉ S. A set cannot
contain the same object more than once, and its elements are not ordered.
For example, consider the set S = {7, 21, 57}. Then 7 ∈ {7, 21, 57} and 8 ∉ {7, 21, 57} or,
equivalently, 7 ∈ S and 8 ∉ S.
We can also describe a set containing elements according to some rule. We write
{n : rule about n}
Thus, {n : n = m^2 for some m ∈ N} denotes the set of perfect squares.
Set Cardinality
The number of elements in a set is called the cardinality or size of the set, denoted |S| or
sometimes n(S). Two sets have the same cardinality if their elements can be put into a one-to-one
correspondence. It is easy to see that the cardinality of the empty set is zero, i.e., |∅| = 0.
Multiset
If we do want to take the number of occurrences of members into account, we call the group a
multiset.
For example, {7} and {7, 7} are identical as set but {7} and {7, 7} are different as multiset.
Infinite Set
An infinite set is a set containing infinitely many elements; for example, the set of negative integers, the set of integers, etc.
Empty Set
The empty set contains no members and is denoted ∅ or {}.
Subset
For two sets A and B, we say that A is a subset of B, written A ⊆ B, if every member of A is also
a member of B.
Formally, A ⊆ B if
x ∈ A implies x ∈ B,
written
x ∈ A => x ∈ B.
Proper Subset
Set A is a proper subset of B, written A ⊂ B, if A is a subset of B and not equal to B. That is, a
set A is a proper subset of B if A ⊆ B but A ≠ B.
Equal Sets
The sets A and B are equal, written A = B, if each is a subset of the other. Rephrasing the
definition: let A and B be sets. A = B if A ⊆ B and B ⊆ A.
Power Set
Let A be a set. The power set of A, written P(A) or 2^A, is the set of all subsets of A. That is,
P(A) = {B : B ⊆ A}.
For example, consider A = {0, 1}. The power set of A is {{}, {0}, {1}, {0, 1}}. By contrast, the
set of all pairs (2-tuples) whose elements are 0 and 1 is {(0, 0), (0, 1), (1, 0), (1, 1)}.
Disjoint Sets
Let A and B be sets. A and B are disjoint if A ∩ B = ∅.
Union of Sets
The union of A and B, written A ∪ B, is the set we get by combining all elements of A and B into
a single set. That is,
A ∪ B = {x : x ∈ A or x ∈ B}.
For two finite sets A and B, we have the identity
|A ∪ B| = |A| + |B| - |A ∩ B|
We can conclude
|A ∪ B| ≤ |A| + |B|
That is,
if |A ∩ B| = 0 then |A ∪ B| = |A| + |B|, and if A ⊆ B then |A| ≤ |B|.
Intersection Sets
The intersection of sets A and B, written A ∩ B, is the set of elements that are both in A and
in B. That is,
A ∩ B = {x : x ∈ A and x ∈ B}.
Partition of Set
A collection {Si} of nonempty sets forms a partition of a set S if
i. the sets are pairwise disjoint, that is, i ≠ j implies Si ∩ Sj = ∅, and
ii. their union is S, that is, S = ∪i Si.
In other words, {Si} forms a partition of S if each element of S appears in exactly one Si.
Difference of Sets
Let A and B be sets. The difference of A and B is
A - B = {x : x ∈ A and x ∉ B}.
For example, let A = {1, 2, 3} and B = {2, 4, 6, 8}. The set difference A - B = {1, 3}, while
B - A = {4, 6, 8}.
Complement of a Set
All sets under consideration are subsets of some large set U called the universal set. Given a
universal set U, the complement of A, written A', is the set of all elements under consideration
that are not in A.
Formally, let A be a subset of the universal set U. The complement of A in U is
A' = U - A
OR
A' = {x : x ∈ U and x ∉ A}.
For any set A ⊆ U, we have the following laws:
i. A'' = A
ii. A ∩ A' = ∅
iii. A ∪ A' = U
Symmetric difference
Let A and B be sets. The symmetric difference of A and B is
A ⊕ B = {x : x ∈ A or x ∈ B, but not both}
Therefore,
A ⊕ B = (A ∪ B) - (A ∩ B)
As an example, consider the following two sets A = {1, 2, 3} and B = {2, 4, 6, 8}. The
symmetric difference, A ⊕ B = {1, 3, 4, 6, 8}.
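All of these operations are mirrored directly by Python's built-in set type; using the same A and B as above:

A = {1, 2, 3}
B = {2, 4, 6, 8}

print(A | B)                        # union: {1, 2, 3, 4, 6, 8}
print(A & B)                        # intersection: {2}
print(A - B)                        # difference: {1, 3}
print(A ^ B)                        # symmetric difference: {1, 3, 4, 6, 8}
assert A ^ B == (A | B) - (A & B)   # the identity above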
Sequences
A sequence of objects is a list of objects in some order. For example, the sequence 7, 21, 57
would be written as (7, 21, 57). In a set the order does not matter but in a sequence it does.
Hence, (7, 21, 57) ≠ (57, 7, 21), but {7, 21, 57} = {57, 7, 21}.
Repetition is not permitted in a set but is permitted in a sequence. So the sequence (7, 7, 21, 57)
is valid, while as a set {7, 7, 21, 57} is the same as {7, 21, 57}.
Tuples
Finite sequence often are called tuples. For example,
(7, 21) 2-tuple or pair
(7, 21, 57) 3-tuple
(7, 21, ..., k ) k-tuple
An ordered pair of two elements a and b is denoted (a, b) and can be defined as (a, b) = {a, {a,
b}}.
Cartesian Product or Cross Product
If A and B are two sets, the cross product of A and B, written A×B, is the set of all pairs wherein
the first element is a member of the set A and the second element is a member of the set B.
Formally,
A×B = {(a, b) : a ∈ A, b ∈ B}.
For example, let A = {1, 2} and B = {x, y, z}. Then A×B = {(1, x), (1, y), (1, z), (2, x), (2, y), (2,
z)}.
When A and B are finite sets, the cardinality of their product is
|A×B| = |A| · |B|
n-tuples
The cartesian product of n sets A1, A2, ..., An is the set of n-tuples
A1 × A2 × ... × An = {(a1, ..., an) : ai ∈ Ai, i = 1, 2, ..., n}
whose cardinality is
|A1 × A2 × ... × An| = |A1| · |A2| ··· |An|
if all the sets are finite. We denote an n-fold cartesian product over a single set A by the set
A^n = A × A × ... × A
whose cardinality is
|A^n| = |A|^n
if A is finite.
Linear Inequalities and Linear Equations
Inequalities
The term inequality is applied to any statement involving one of the symbols <, >, ≤, ≥.
Examples of inequalities are:
i. x ≤ 1
ii. x + y + 2z > 16
iii. p^2 + q^2 ≤ 1/2
iv. a^2 + ab > 1
Fundamental Properties of Inequalities
1. If a ≤ b and c is any real number, then a + c ≤ b + c.
For example, -3 ≤ -1 implies -3 + 4 ≤ -1 + 4.
2. If a ≤ b and c is positive, then ac ≤ bc.
For example, 2 ≤ 3 implies 2(4) ≤ 3(4).
3. If a ≤ b and c is negative, then ac ≥ bc.
For example, 3 ≤ 9 implies 3(-2) ≥ 9(-2).
4. If a ≤ b and b ≤ c, then a ≤ c.
For example, -1/2 ≤ 2 and 2 ≤ 8/3 imply -1/2 ≤ 8/3.
Solution of Inequality
By a solution of the one-variable inequality 2x + 3 ≤ 7 we mean any number which, substituted
for x, yields a true statement.
For example, 1 is a solution of 2x + 3 ≤ 7 since 2(1) + 3 = 5 and 5 is less than or equal to 7.
By a solution of the two-variable inequality x - y ≤ 5 we mean any ordered pair of numbers
which, when substituted for x and y, respectively, yields a true statement.
For example, (2, 1) is a solution of x - y ≤ 5 because 2 - 1 = 1 and 1 ≤ 5.
By a solution of the three-variable inequality 2x - y + z ≥ 3 we mean an ordered triple of
numbers which, when substituted for x, y and z respectively, yields a true statement.
For example, (2, 0, 1) is a solution of 2x - y + z ≥ 3, since 2(2) - 0 + 1 = 5 and 5 ≥ 3.
A solution of an inequality is said to satisfy the inequality. For example, (2, 1) satisfies x - y ≤ 5.
Two or more inequalities, each with the same variables, considered as a unit, are said to form a
system of inequalities. For example,
x ≥ 0
y ≥ 0
2x + y ≤ 4
Note that the notion of a solution of a system of inequalities is analogous to that of a solution of
a system of equations.
Any solution common to all of the inequalities of a system of inequalities is said to be a solution
of that system of inequalities. A system of inequalities, each of whose members is linear, is said
to be a system of linear inequalities.
Geometric Interpretation of Inequalities
An inequality in two variables x and y describes a region in the xy-plane (called its graph),
namely, the set of all points whose coordinates satisfy the inequality.
The y-axis divides the xy-plane into two regions, called half-planes.
 Right half-plane
The region of points whose coordinates satisfy inequality x > 0.
 Left half-plane
The region of points whose coordinates satisfy inequality x < 0.
Similarly, the x-axis divides the xy-plane into two half-planes.
 Upper half-plane
In which inequality y > 0 is true.
 Lower half-plane
In which inequality y < 0 is true.
What are the x-axis and the y-axis? They are simply lines. So the above arguments can be
applied to any line.
Every line ax + by = c divides the xy-plane into two regions called its half-planes.
 On one half-plane ax + by > c is true.
 On the other half-plane ax + by < c is true.
Linear Equations
One Unknown
A linear equation in one unknown can always be stated into the standard form
ax = b
where x is an unknown and a and b are constants. If a is not equal to zero, this equation has a
unique solution
x = b/a
Two Unknowns
A linear equation in two unknowns, x and y, can be put into the form
ax + by = c
where x and y are the two unknowns and a, b, c are real numbers. Also, we assume that a and b
are not zero.
Solution of Linear Equation
A solution of the equation consists of a pair of numbers, u = (k1, k2), which satisfies the equation
ax + by = c. Mathematically speaking, a solution consists of u = (k1, k2) such that ak1 + bk2 = c.
Solutions of the equation can be found by assigning arbitrary values to x and solving for y, OR
by assigning arbitrary values to y and solving for x.
Geometrically, any solution u = (k1, k2) of the linear equation ax + by = c determines a point in
the cartesian plane. Since a and b are not zero, the solutions u correspond precisely to the points
on a straight line.
Two Equations in the Two Unknowns
A system of two linear equations in the two unknowns x and y is
a1x + b1y = c1
a2x + b2y = c2
where a1, b1, a2, b2 are not zero. A pair of numbers which satisfies both equations is called a
simultaneous solution of the given equations or a solution of the system of equations.
Geometrically, there are three cases of a simultaneous solution
1. If the system has exactly one solution, the graph of the linear equations intersect in one
point.
2. If the system has no solutions, the graphs of the linear equations are parallel.
3. If the system has an infinite number of solutions, the graphs of the linear equations
coincide.
The special cases (2) and (3) can only occur when the coefficients of x and y in the two linear
equations are proportional, that is, a1/a2 = b1/b2, or equivalently a1b2 - a2b1 = 0.
The system has no solution when a1/a2 = b1/b2 ≠ c1/c2; it has infinitely many solutions when
a1/a2 = b1/b2 = c1/c2.
The solution to the system
a1x + b1y = c1
a2x + b2y = c2
can be obtained by the elimination process, whereby we reduce the system to a single equation in
only one unknown. This is accomplished by the following algorithm.
ALGORITHM
Step 1 Multiply the two equations by two numbers such that the resulting coefficients of
one of the unknowns are negatives of each other.
Step 2 Add the equations obtained in Step 1.
The output of this algorithm is a linear equation in one unknown. This equation may be solved
for that unknown, and the solution may be substituted in one of the original equations yielding
the value of the other unknown.
As an example, consider the following system
3x + 2y = 8 ------------ (1)
2x - 5y = -1 ------------ (2)
Step 1: Multiply equation (1) by 2 and equation (2) by -3
6x + 4y = 16
-6x + 15y = 3
Step 2: Add equations, output of Step 1
19y = 19
Thus, we obtain an equation involving only the unknown y. We solve for y to obtain
y = 1
Next, we substitute y =1 in equation (1) to get
x = 2
Therefore, x = 2 and y = 1 is the unique solution to the system.
n Equations in n Unknowns
Now, consider a system of n linear equations in n unknowns
a11x1 + a12x2 + . . . + a1nxn = b1
a21x1 + a22x2 + . . . + a2nxn = b2
. . . . . . . . . . . . . . . . . . . . . . . . .
an1x1 + an2x2 + . . . + annxn = bn
Where the aij, bi are real numbers. The number aij is called the coefficient of xj in the ith
equation,
and the number bi is called the constant of the ith equation. A list of values for the unknowns,
x1 = k1, x2 = k2, . . . , xn = kn
or equivalently, a list of n numbers
u = (k1, k2, . . . , kn)
is called a solution of the system if, with kj substituted for xj, the left hand side of each equation
in fact equals the right hand side.
The above system is equivalent to the matrix equation
Ax = B
where A = (aij) is the matrix of coefficients, x = (xi) is the column of unknowns, and B = (bi) is
the column of constants. The matrix A is called the coefficient matrix of the system of n linear
equations in n unknowns, and the matrix [A | B], obtained by adjoining the column of constants
to A, is called the augmented matrix of the system.
Note for algorithmic nerds: we store a system in the computer as its augmented matrix.
Specifically, the system is stored in the computer as an N × (N+1) matrix array A, the augmented
matrix of the system. Therefore, the constants b1, b2, . . . , bn are respectively stored as A1,N+1,
A2,N+1, . . . , AN,N+1.
Solution of a Triangular System
If aij = 0 for i > j, then the system of n linear equations in n unknowns assumes the triangular
form
a11x1 + a12x2 + . . . + a1,n-1xn-1 + a1nxn = b1
a22x2 + . . . + a2,n-1xn-1 + a2nxn = b2
. . . . . . . . . . . . . . . . . . . . . .
an-1,n-1xn-1 + an-1,nxn = bn-1
annxn = bn
where |A| = a11 a22 . . . ann. If none of the diagonal entries a11, a22, . . ., ann is zero, the system
has a unique solution.
Back Substitution Method
We obtain the solution of a triangular system by the technique of back substitution. Consider the
above general triangular system.
1. First, we solve the last equation for the last unknown, xn;
xn = bn/ann
2. Second, we substitute the value of xn in the next-to-last equation and solve it for the next-to-
last unknown, xn-1:
xn-1 = (bn-1 - an-1,n xn) / an-1,n-1
3. Third, we substitute these values for xn and xn-1 in the third-from-last equation and solve it for
the third-from-last unknown, xn-2:
xn-2 = (bn-2 - an-2,n-1 xn-1 - an-2,n xn) / an-2,n-2
In general, we determine xk by substituting the previously obtained values of xn, xn-1, . . . , xk+1
in the kth equation:
xk = (bk - ak,k+1 xk+1 - ak,k+2 xk+2 - . . . - ak,n xn) / akk
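A small Python sketch of back substitution (illustrative only; A is assumed upper triangular with non-zero diagonal entries, and b holds the constants):

def back_substitute(A, b):
    # Solve an upper-triangular system Ax = b, from the last row upward.
    n = len(b)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):                     # k = n-1 down to 0
        s = sum(A[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (b[k] - s) / A[k][k]
    return x

# Triangular system from the Gaussian elimination example below:
# x - 3y - 2z = 6, 2y + 6z = 6, 6z = 12
print(back_substitute([[1, -3, -2], [0, 2, 6], [0, 0, 6]], [6, 6, 12]))
# -> [1.0, -3.0, 2.0]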
Gaussian Elimination
Gaussian elimination is a method for finding the solution of a system of linear equations. The
method consists of two parts:
1. The first part (forward elimination) reduces the system, step by step, to a triangular system.
2. The second part solves the triangular system by back substitution.
As an example, consider the system
x - 3y - 2z = 6 --- (1)
2x - 4y + 2z = 18 --- (2)
-3x + 8y + 9z = -9 --- (3)
First Part
Eliminate the first unknown, x, from equations (2) and (3):
(a) Multiply equation (1) by -2 and add it to equation (2). Equation (2) becomes
2y + 6z = 6
(b) Multiply equation (1) by 3 and add it to equation (3). Equation (3) becomes
-y + 3z = 9
And the original system is reduced to the system
x - 3y - 2z = 6
2y + 6z = 6
-y + 3z = 9
Now we have to eliminate the second unknown, y, from the new equation (3), using only the new
equations (2) and (3) above.
(a) Multiply equation (2) by 1/2 and add it to equation (3). Equation (3) becomes 6z = 12.
Therefore, our given system of three linear equation of 3 unknown is reduced to the triangular
system
x - 3y - 2z = 6
2y + 6z = 6
6z = 12
Second Part
In the second part, we solve the equation by back substitution and get
x = 1, y = -3, z = 2
In the first stage of the algorithm, the coefficient of x in the first equation is called the pivot, and
in the second stage of the algorithm, the coefficient of y in the second equation is the pivot.
Clearly, the algorithm cannot work if either pivot is zero. In such a case one must interchange
equations so that a pivot is not zero. In fact, if one would like to code this algorithm, the
greatest accuracy is attained when the pivot is as large in absolute value as possible. For
example, we would like to interchange equation (1) and equation (2) in the original system in the
above example before eliminating x from the second and third equations.
That is, the first step of the algorithm transforms the system into
2x - 4y + 2z = 18
x - 3y - 2z = 6
-3x + 8y + 9z = -9
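A compact Python sketch combining both parts, including the row interchange (partial pivoting) recommended above; it reuses back_substitute from the previous sketch:

def gaussian_eliminate(A, b):
    # Part 1: reduce Ax = b to triangular form, choosing the largest pivot.
    n = len(b)
    A = [row[:] for row in A]                 # work on copies
    b = b[:]
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))   # pivot row
        if A[p][k] == 0:
            raise ValueError("zero pivot: no unique solution")
        A[k], A[p] = A[p], A[k]               # interchange equations
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):             # eliminate x_k below row k
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    # Part 2: back substitution on the triangular system.
    return back_substitute(A, b)

print(gaussian_eliminate([[1, -3, -2], [2, -4, 2], [-3, 8, 9]], [6, 18, -9]))
# -> [1.0, -3.0, 2.0]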
Determinants and systems of linear equations
Consider a system of n linear equations in n unknowns. That is, for the following system
a11x1 + a12x2 + . . . + a1nxn = b1
a21x1 + a22x2 + . . . + a2nxn = b2
. . . . . . . . . . . . . . . . . . . . . . . . .
an1x1 + an2x2 + . . . + annxn = bn
Let D denote the determinant of the matrix A = (aij) of coefficients; that is, let D = |A|. Also, let
Ni denote the determinant of the matrix obtained by replacing the ith column of A by the column
of constants.
Theorem. If D ≠ 0, the above system of linear equations has the unique solution
xi = Ni / D, for i = 1, 2, . . . , n.
This theorem is widely known as Cramer's rule. It is important to note that Gaussian elimination
is usually much more efficient for solving systems of linear equations than is the use of
determinants.
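For completeness, a small illustrative sketch of Cramer's rule on the same system (the helper names det and cramer are ours; the cofactor expansion makes this far slower than elimination for large n):

def det(M):
    # Determinant by cofactor expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def cramer(A, b):
    # x_i = N_i / D, where N_i is A with its ith column replaced by b.
    D = det(A)
    if D == 0:
        raise ValueError("D = 0: no unique solution")
    return [det([row[:i] + [b[k]] + row[i + 1:]
                 for k, row in enumerate(A)]) / D
            for i in range(len(b))]

print(cramer([[1, -3, -2], [2, -4, 2], [-3, 8, 9]], [6, 18, -9]))
# -> [1.0, -3.0, 2.0]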
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Greedy Algorithm
Greedy algorithms are simple and straightforward. They are shortsighted in their approach in the
sense that they take decisions on the basis of information at hand without worrying about the
effect these decisions may have in the future. They are easy to invent, easy to implement and
most of the time quite efficient. Many problems cannot be solved correctly by the greedy
approach. Greedy algorithms are used to solve optimization problems.
Greedy Approach
A greedy algorithm works by making the decision that seems most promising at any moment; it
never reconsiders this decision, whatever situation may arise later.
As an example consider the problem of "Making Change".
Coins available are:
 dollars (100 cents)
 quarters (25 cents)
 dimes (10 cents)
 nickels (5 cents)
 pennies (1 cent)
Problem Make a change of a given amount using the smallest possible number of coins.
Informal Algorithm
 Start with nothing.
 At every stage, without passing the given amount,
o add the largest available coin to the coins already chosen.
Formal Algorithm
Make change for n units using the least possible number of coins.
MAKE-CHANGE (n)
C ← {100, 25, 10, 5, 1} // constant: the available denominations
S ← {} // multiset that will hold the solution
sum ← 0 // sum of the items in the solution set
WHILE sum ≠ n
x ← largest item in set C such that sum + x ≤ n
IF no such item THEN
RETURN "No Solution"
S ← S ∪ {x}
sum ← sum + x
RETURN S
Example Make change for 2.89 (289 cents); here n = 289 and the solution contains 2 dollars,
3 quarters, 1 dime and 4 pennies. The algorithm is greedy because at every stage it chooses the
largest coin without worrying about the consequences. Moreover, it never changes its mind in the
sense that once a coin has been included in the solution set, it remains there.
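The same algorithm as a Python sketch (amounts are in cents, with the denominations listed above):

def make_change(n, coins=(100, 25, 10, 5, 1)):
    # Repeatedly add the largest coin that does not pass the amount n.
    solution, total = [], 0
    for c in sorted(coins, reverse=True):
        while total + c <= n:
            solution.append(c)
            total += c
    return solution if total == n else "No Solution"

print(make_change(289))
# -> [100, 100, 25, 25, 25, 10, 1, 1, 1, 1]  (2 dollars, 3 quarters, 1 dime, 4 pennies)

With this particular coin set the greedy choice happens to be optimal; for other coin systems, greedy change-making can fail to minimize the number of coins.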
Characteristics and Features of Problems solved by Greedy Algorithms
To construct the solution in an optimal way, the algorithm maintains two sets: one contains the
chosen items and the other contains the rejected items.
The greedy algorithm consists of four (4) functions:
1. A function that checks whether a chosen set of items provides a solution.
2. A function that checks the feasibility of a set.
3. The selection function, which tells which of the candidates is the most promising.
4. An objective function, which does not appear explicitly, but gives the value of a solution.
Structure Greedy Algorithm
 Initially the set of chosen items is empty, i.e., the solution set is empty.
 At each step
o an item is added to the solution set using the selection function.
o IF the resulting set would no longer be feasible
 reject the item under consideration (it is never considered again).
o ELSE IF the set is still feasible THEN
 add the current item.
Definitions of feasibility
A feasible set (of candidates) is promising if it can be extended to produce not merely a solution,
but an optimal solution to the problem. In particular, the empty set is always promising. Why?
Because an optimal solution always exists.
Unlike dynamic programming, which solves the subproblems bottom-up, a greedy strategy
usually progresses in a top-down fashion, making one greedy choice after another, reducing each
problem instance to a smaller one.
Greedy-Choice Property
The "greedy-choice property" and "optimal substructure" are two ingredients in the problem that
lend to a greedy strategy.
Greedy-Choice Property
It says that a globally optimal solution can be arrived at by making a locally optimal choice.
The greedy Algorithms techniques include:
 Activity Selection Problem
 Minimum Spanning Tree
 Kruskal's Algorithm
 Prim's Algorithm
 Dijkstra's Algorithm
 Huffman's Codes
1. An Activity Selection Problem
An activity-selection is the problem of scheduling a resource among several competing
activities.
Problem Statement
Given a set S of n activities, where si is the start time and fi the finish time of the ith activity,
find the maximum-size set of mutually compatible activities.
Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj)
do not overlap; that is, i and j are compatible if si ≥ fj or sj ≥ fi.
Greedy Algorithm for Selection Problem
I. Sort the input activities by increasing finishing time.
f1 ≤ f2 ≤ . . . ≤ fn
II. Call GREEDY-ACTIVITY-SELECTOR (s, f)
1. n = length[s]
2. A = {1}
3. j = 1
4. for i = 2 to n
5. do if si ≥ fj
6. then A = A ∪ {i}
7. j = i
8. return set A
Operation of the algorithm
Let 11 activities be given, S = {p, q, r, s, t, u, v, w, x, y, z}; the start and finish times for the
proposed activities are (1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11), (8, 12), (2, 13)
and (12, 14).
A = {p} Initialization at line 2
A = {p, s} line 6 - 1st iteration of FOR loop
A = {p, s, w} line 6 - 2nd iteration of FOR loop
A = {p, s, w, z} line 6 - 3rd iteration of FOR loop
Out of the FOR loop, return A = {p, s, w, z}
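A Python sketch of GREEDY-ACTIVITY-SELECTOR, run on the eleven activities above (start and finish times already sorted by nondecreasing finish time; indices are 0-based here):

def greedy_activity_selector(s, f):
    # s, f: start/finish times, sorted by nondecreasing finish time.
    A = [0]                      # always pick the first activity
    j = 0                        # index of the last activity added
    for i in range(1, len(s)):
        if s[i] >= f[j]:         # compatible with the last chosen one
            A.append(i)
            j = i
    return A

s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(greedy_activity_selector(s, f))   # -> [0, 3, 7, 10], i.e., {p, s, w, z}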
Analysis
Part I requires O(n lg n) time (using merge sort or heapsort).
Part II requires θ(n) time assuming that activities were already sorted in part I by their finish
time.
Correctness
Note that greedy algorithms do not always produce optimal solutions, but GREEDY-ACTIVITY-
SELECTOR does.
Theorem Algorithm GREEDY-ACTIVITY-SELECTOR produces a solution of maximum size for
the activity-selection problem.
Proof Idea Show that the activity-selection problem satisfies
I. the greedy-choice property, and
II. the optimal-substructure property.
Proof
I. Let S = {1, 2, . . . , n} be the set of activities. Since the activities are in order by finish time, it
follows that activity 1 has the earliest finish time.
Suppose A ⊆ S is an optimal solution and let the activities in A be ordered by finish time.
Suppose the first activity in A is k.
If k = 1, then A begins with the greedy choice and we are done (or, to be very precise, there is
nothing to prove here).
If k ≠ 1, we want to show that there is another solution B that begins with the greedy choice,
activity 1.
Let B = (A - {k}) ∪ {1}. Because f1 ≤ fk, the activities in B are disjoint, and since B has the
same number of activities as A, i.e., |A| = |B|, B is also optimal.
II. Once the greedy choice is made, the problem reduces to finding an optimal solution for
the subproblem. If A is an optimal solution to the original problem S, then A' = A - {1} is an
optimal solution to the activity-selection problem S' = {i ∈ S : si ≥ f1}.
Why? Because if we could find a solution B' to S' with more activities than A', adding 1
to B' would yield a solution B to S with more activities than A, thereby contradicting the
optimality of A.
As an example, consider the problem of scheduling a set of activities among lecture halls:
schedule all the activities using as few lecture halls as possible.
In order to determine which activity should use which lecture hall, the algorithm uses the
GREEDY-ACTIVITY-SELECTOR to calculate the activities in the first lecture hall. If there are
some activities yet to be scheduled, a new lecture hall is selected and GREEDY-ACTIVITY-
SELECTOR is called again. This continues until all activities have been scheduled.
LECTURE-HALL-ASSIGNMENT (s, f)
n = length[s]
for i = 1 to n
do HALL[i] = NIL
k = 1
while (NOT empty(s))
do HALL[k] = GREEDY-ACTIVITY-SELECTOR(s, f, n)
k = k + 1
return HALL
The following changes can be made to GREEDY-ACTIVITY-SELECTOR (s, f) (see CLR), so
that activities already scheduled are marked "-" and skipped; a code sketch follows below.
GREEDY-ACTIVITY-SELECTOR (s, f, n)
j = first(s) // first activity not yet scheduled
A = {j}
for i = j + 1 to n
do if s[i] ≠ "-"
then if s[i] ≥ f[j]
then A = A ∪ {i}
s[i] = "-"
j = i
return A
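A Python sketch of the whole lecture-hall loop (function and variable names are ours): each pass runs the selector over the activities not yet scheduled and fills one more hall:

def assign_lecture_halls(activities):
    # activities: list of (start, finish) pairs.
    remaining = sorted(activities, key=lambda a: a[1])   # by finish time
    halls = []
    while remaining:
        hall, leftover, last_finish = [], [], float("-inf")
        for start, finish in remaining:
            if start >= last_finish:        # compatible: put it in this hall
                hall.append((start, finish))
                last_finish = finish
            else:                           # try again in a later hall
                leftover.append((start, finish))
        halls.append(hall)
        remaining = leftover
    return halls

print(assign_lecture_halls([(1, 4), (5, 7), (3, 5), (5, 9), (0, 6), (3, 8)]))
# -> [[(1, 4), (5, 7)], [(3, 5), (5, 9)], [(0, 6)], [(3, 8)]]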
Correctness
The algorithm can be shown to be correct and optimal. For a contradiction, assume the number
of lecture halls is not optimal, that is, the algorithm allocates more halls than necessary. Then
there exists a set of activities B which have been wrongly allocated: an activity b belonging to B
which has been allocated to hall H[i] should optimally have been allocated to H[k]. This implies
that the activities for lecture hall H[k] have not been allocated optimally, contradicting the fact
that GREEDY-ACTIVITY-SELECTOR produces the optimal set of activities for a particular
lecture hall.
Analysis
In the worst case, the number of lecture halls required is n. GREEDY-ACTIVITY-SELECTOR
runs in θ(n) time, so the overall running time of this algorithm is O(n^2).
Two important Observations
 Choosing the activity of least duration will not always produce an optimal solution. For
example, we have a set of activities {(3, 5), (6, 8), (1, 4), (4, 7), (7, 10)}. Here, either (3,
5) or (6, 8) will be picked first, which will prevent the optimal solution of {(1, 4), (4, 7),
(7, 10)} from being found.
 Choosing the activity with the least overlap will not always produce an optimal solution.
For example, we have a set of activities {(0, 4), (4, 6), (6, 10), (0, 1), (1, 5), (5, 9), (9, 10),
(0, 3), (0, 2), (7, 10), (8, 10)}. Here the one with the least overlap with the other activities
is (4, 6), so it will be picked first. But that would prevent the optimal solution of {(0, 1),
(1, 5), (5, 9), (9, 10)} from being found.
2. Minimum Spanning Tree
Spanning Tree
A spanning tree of a graph is any tree that includes every vertex in the graph. A little more
formally, a spanning tree of a graph G is a subgraph of G that is a tree and contains all the
vertices of G. An edge of a spanning tree is called a branch; an edge in the graph that is not in the
spanning tree is called a chord. We construct a spanning tree whenever we want to find a simple,
cheap and yet efficient way to connect a set of terminals (computers, cities, factories, etc.).
Spanning trees are important for the following reasons.
 Spanning trees form a sparse subgraph that tells a lot about the original graph.
 Spanning trees are very important in designing efficient routing algorithms.
 Some hard problems (e.g., Steiner tree problem and traveling salesman problem) can be
solved approximately by using spanning trees.
 Spanning trees have wide applications in many areas, such as network design, etc.
Greedy Spanning Tree Algorithm
One of the most elegant spanning tree algorithms that I know of is as follows:
 Examine the edges in graph in any arbitrary sequence.
 Decide whether each edge will be included in the spanning tree.
Note that each time a step of the algorithm is performed, one edge is examined. If there is only a
finite number of edges in the graph, the algorithm must halt after a finite number of steps. Thus,
the time complexity of this algorithm is clearly O(n), where n is the number of edges in the
graph.
Some important facts about spanning trees are as follows:
 Any two vertices in a tree are connected by a unique path.
 Let T be a spanning tree of a graph G, and let e be an edge of G not
in T. Then T + e contains a unique cycle.
Lemma: The number of spanning trees in the complete graph Kn is n^(n−2).
Greediness It is easy to see that this algorithm has the property that each edge is examined at
most once. Algorithms, like this one, which examine each entity at most once and decide its fate
once and for all during that examination are called greedy algorithms. The obvious advantage of
greedy approach is that we do not have to spend time reexamining entities.
Consider the problem of finding a spanning tree with the smallest possible weight or the largest
possible weight, respectively called a minimum spanning tree and a maximum spanning tree. It is
easy to see that if a graph possesses a spanning tree, it must have a minimum spanning tree and
also a maximum spanning tree. These spanning trees can be constructed by performing the
spanning tree algorithm (e.g., above mentioned algorithm) with an appropriate ordering of the
edges.
Minimum Spanning Tree Algorithm
Perform the spanning tree algorithm (above) by examining the edges in order of nondecreasing
weight (smallest first, largest last). If two or more edges have the same weight, order
them arbitrarily.
Maximum Spanning Tree Algorithm
Perform the spanning tree algorithm (above) by examining the edges in order of nonincreasing
weight (largest first, smallest last). If two or more edges have the same weight, order
them arbitrarily.
Minimum Spanning Trees
A minimum spanning tree (MST) of a weighted graph G is a spanning tree of G whose edge
weights sum to the minimum. In other words, an MST is a tree formed from a subset of the edges in a
given undirected graph, with two properties:
 it spans the graph, i.e., it includes every vertex of the graph.
 it is a minimum, i.e., the total weight of all the edges is as low as possible.
Let G = (V, E) be a connected, undirected graph, where V is the set of vertices (nodes) and E is the
set of edges. Each edge has a given nonnegative length.
Problem Find a subset T of the edges of G such that all the vertices remain connected when
only the edges T are used, and the sum of the lengths of the edges in T is as small as possible.
Let G' = (V, T) be the partial graph formed by the vertices of G and the edges in T. [Note: a
connected graph with n vertices must have at least n − 1 edges, and more than n − 1 edges implies at
least one cycle.] So n − 1 is the minimum number of edges in T. Hence, if G' is connected and
T has more than n − 1 edges, we can remove at least one of these edges without disconnecting G'
(choose an edge that is part of a cycle). This will decrease the total length of the edges in T.
In other words, take G' = (V, T) where T is a subset of E. A connected graph on n nodes with more
than n − 1 edges must contain at least one cycle, so if G' is connected and T has more than n − 1 edges,
we can remove an edge that is part of a cycle without disconnecting G'. This decreases the total length of the edges in
T, so the new solution is preferable to the old one.
Thus, a T with more than n − 1 edges cannot be an optimal solution. It follows that T must have exactly n − 1
edges, and since G' is connected, it must be a tree. This G' is called a Minimum Spanning Tree
(MST).
3. Kruskal's Algorithm
This minimum spanning tree algorithm was first described by Kruskal in 1956 in the same paper
where he rediscovered Jarnik's algorithm. This algorithm was also rediscovered in 1957 by
Loberman and Weinberger, but somehow avoided being renamed after them. The basic idea of
Kruskal's algorithm is as follows: scan all edges in increasing weight order; if an edge is
safe, keep it (i.e., add it to the set A).
Overall Strategy
Kruskal's Algorithm, as described in CLRS, is directly based on the generic MST algorithm. It
builds the MST as a forest: initially, each vertex is in its own tree in the forest. Then the algorithm
considers each edge in turn, ordered by increasing weight. If an edge (u, v) connects two different
trees, then (u, v) is added to the set of edges of the MST, and the two trees connected by the edge (u,
v) are merged into a single tree. On the other hand, if an edge (u, v) connects two vertices in the
same tree, then edge (u, v) is discarded.
A little more formally, given a connected, undirected, weighted graph with a function w : E → R.
 Starts with each vertex being its own component.
 Repeatedly merges two components into one by choosing the light edge that connects
them (i.e., the light edge crossing the cut between them).
 Scans the set of edges in monotonically increasing order by weight.
 Uses a disjoint-set data structure to determine whether an edge connects vertices in
different components.
Data Structure
Before formalizing the above idea, let's quickly review the disjoint-set data structure from
Chapter 21.
 MAKE-SET(v): Creates a new set whose only member is pointed to by v. Note that for
this operation, v must not already be in some other set.
 FIND-SET(v): Returns a pointer to the representative of the set containing v.
 UNION(u, v): Unites the dynamic sets that contain u and v into a new set that is the union
of these two sets.
Algorithm
Start with an empty set A, and select at every stage the shortest edge that has not been chosen or
rejected, regardless of where this edge is situated in the graph.
KRUSKAL(V, E, w)
A ← { } ▷ Set A will ultimately contain the edges of the MST
for each vertex v in V
do MAKE-SET(v)
sort E into nondecreasing order by weight w
for each (u, v) taken from the sorted list
do if FIND-SET(u) ≠ FIND-SET(v)
then A ← A ∪ {(u, v)}
UNION(u, v)
return A
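Here is a compact Python sketch of KRUSKAL; the union-find is deliberately simplified (path compression only, no union by rank), and the (weight, u, v) edge representation is our own choice for illustration.

def kruskal(num_vertices, edges):
    # edges: list of (weight, u, v) with vertices numbered 0 .. num_vertices - 1.
    parent = list(range(num_vertices))

    def find(x):                        # FIND-SET with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):       # nondecreasing order by weight
        ru, rv = find(u), find(v)
        if ru != rv:                    # different components: the edge is safe
            mst.append((u, v, w))
            parent[ru] = rv             # UNION
            if len(mst) == num_vertices - 1:
                break                   # the optional early-exit test
    return mst

For instance, kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]) returns [(0, 1, 1), (1, 2, 2), (2, 3, 4)], rejecting the weight-3 edge that would close a cycle.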
Illustrative Examples
Let's run through the following graph quickly to see how Kruskal's algorithm works on it:
We get the shaded edges shown in the above figure.
Edge (c, f) : safe
Edge (g, i) : safe
Edge (e, f) : safe
Edge (c, e) : reject
Edge (d, h) : safe
Edge (f, h) : safe
Edge (e, d) : reject
Edge (b, d) : safe
Edge (d, g) : safe
Edge (b, c) : reject
Edge (g, h) : reject
Edge (a, b) : safe
At this point, we have only one component, so all other edges will be rejected. [We could add a
test to the main loop of KRUSKAL to stop once |V| − 1 edges have been added to A.]
Note carefully: suppose we had examined (c, e) before (e, f). Then we would have found (c, e)
safe and would have rejected (e, f).
Example (CLRS): Step-by-step operation of Kruskal's algorithm.
Step 1. In the graph, the edge (g, h) is shortest. Either vertex g or vertex h could be the
representative; let's choose vertex g arbitrarily.
Step 2. The edge (c, i) creates the second tree. Choose vertex c as the representative for the second tree.
Step 3. Edge (f, g) is the next shortest edge. Add this edge and choose vertex g as representative.
Step 4. Edge (a, b) creates a third tree.
Step 5. Add edge (c, f) and merge two trees. Vertex c is chosen as the representative.
Step 6. Edge (g, i) is the next cheapest, but if we added this edge a cycle would be created,
since vertex c is already the representative of both endpoints.
Step 7. Instead, add edge (c, d).
Step 8. Similarly, if we added edge (h, i), it would create a cycle.
Step 9. Instead of adding edge (h, i) add edge (a, h).
Step 10. Again, if we added edge (b, c), it would create a cycle. Add edge (d, e) instead to complete
the spanning tree. In this spanning tree all trees are joined and vertex c is the sole representative.
Analysis
Initialize the set A: O(1)
First for loop: |V| MAKE-SETs
Sort E: O(E lg E)
Second for loop: O(E) FIND-SETs and UNIONs
 Assuming the implementation of disjoint-set data structure, already seen in Chapter 21,
that uses union by rank and path compression: O((V + E) α(V)) + O(E lg E)
 Since G is connected, |E| ≥ |V| − 1 ⇒ O(E α(V)) + O(E lg E).
 α(|V|) = O(lg V) = O(lg E).
 Therefore, total time is O(E lg E).
 |E| ≤ |V|² ⇒ lg |E| = O(2 lg V) = O(lg V).
 Therefore, O(E lg V) time. (If edges are already sorted, O(E α(V)), which is almost
linear.)
II Kruskal's Algorithm Implemented with Priority Queue Data Structure
MST_KRUSKAL(G)
for each vertex v in V[G]
do define set S(v) ← {v}
Initialize priority queue Q that contains all edges of G, using the weights as keys
A ← { } ▷ A will ultimately contain the edges of the MST
while A has less than n − 1 edges
    do extract the minimum-weight edge (u, v) from Q
       if S(v) ≠ S(u)
          then add edge (u, v) to A
               merge S(v) and S(u) into one set, i.e., UNION
return A
Analysis
Edge weights can be compared in constant time. Initialization of the priority queue takes O(E lg
E) time by repeated insertion. At each iteration of the while-loop, the minimum edge can be removed in
O(lg E) time, which is O(lg V) since the graph is simple. The total running time is O((V + E) lg
V), which is O(E lg V) since the graph is simple and connected.
4. Prim's Algorithm
This algorithm was first proposed by Jarnik, but is typically attributed to Prim. It starts from an
arbitrary vertex (the root) and, at each stage, adds a new branch (edge) to the tree already constructed;
the algorithm halts when all the vertices in the graph have been reached. This strategy is greedy
in the sense that at each step the partial spanning tree is augmented with an edge that is the
smallest among all possible adjacent edges.
MST-PRIM
Input: A weighted, undirected graph G=(V, E, w)
Output: A minimum spanning tree T.
T={}
Let r be an arbitrarily chosen vertex from V.
U = {r}
WHILE |U| < n
DO
    Find u in U and v in V − U such that the edge (u, v) is the smallest edge between U and V − U.
    T = T ∪ {(u, v)}
    U = U ∪ {v}
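A Python version of this strategy using a binary heap (Python's heapq), corresponding to the O(m log n) variant discussed below; stale heap entries are simply skipped ("lazy deletion"). The adjacency-list representation is our own choice for illustration.

import heapq

def prim(adj, root):
    # adj: dict mapping vertex -> list of (weight, neighbor) pairs;
    # assumes the graph is connected and undirected.
    in_tree = {root}
    heap = [(w, root, v) for w, v in adj[root]]   # candidate edges leaving the tree
    heapq.heapify(heap)
    mst = []
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)             # smallest edge between U and V - U
        if v in in_tree:
            continue                              # stale entry: both ends already in U
        mst.append((u, v, w))
        in_tree.add(v)
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return mst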
Analysis
The algorithm spends most of its time finding the smallest edge, so the running time of the algorithm
basically depends on how we search for this edge.
Straightforward method
Just find the smallest edge by searching the adjacency list of the vertices in V. In this case, each
iteration costs O(m) time, yielding a total running time of O(mn).
Binary heap
By using binary heaps, the algorithm runs in O(m log n).
Fibonacci heap
By using Fibonacci heaps, the algorithm runs in O(m + n log n) time.
5. Dijkstra's Algorithm
Dijkstra's algorithm solves the single-source shortest-path problem when all edges have non-
negative weights. It is a greedy algorithm, similar to Prim's algorithm. The algorithm starts at the
source vertex s and grows a tree, T, that ultimately spans all vertices reachable from s. Vertices
are added to T in order of distance: first s, then the vertex closest to s, then the next closest,
and so on. The following implementation assumes that graph G is represented by adjacency lists.
DIJKSTRA (G, w, s)
1. INITIALIZE SINGLE-SOURCE (G, s)
2. S ← { } // S will ultimately contain the vertices whose final shortest-path weights from s are determined
3. Initialize priority queue Q i.e., Q ← V[G]
4. while priority queue Q is not empty do
5. u ← EXTRACT_MIN(Q) // Pull out new vertex
6. S ← S ∪ {u}
// Perform relaxation for each vertex v adjacent to u
7. for each vertex v in Adj[u] do
8. Relax (u, v, w)
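The following Python sketch mirrors the pseudocode, with a binary heap standing in for the priority queue Q (so it corresponds to the O(E lg V) variant analyzed below); the adjacency-list representation is our own choice for illustration.

import heapq

def dijkstra(adj, s):
    # adj: dict mapping vertex -> list of (weight, neighbor), weights nonnegative.
    # Returns a dict of shortest-path distances from s.
    dist = {s: 0}
    done = set()                          # the set S of finished vertices
    pq = [(0, s)]                         # priority queue keyed by tentative distance
    while pq:
        d, u = heapq.heappop(pq)          # EXTRACT_MIN
        if u in done:
            continue                      # stale entry
        done.add(u)
        for w, v in adj[u]:               # RELAX (u, v, w)
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist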
Analysis
Like Prim's algorithm, Dijkstra's algorithm runs in O(|E|lg|V|) time.
Example: Step by Step operation of Dijkstra algorithm.
Step 1. Given the initial graph G = (V, E). All nodes have infinite cost except the source node,
s, which has cost 0.
Step 2. First we choose the node closest to the source node, s. We initialize d[s] to 0.
Add it to S. Relax all nodes adjacent to the source s, and update the predecessor (see the red arrows in the diagram
below) for all updated nodes.
Step 3. Choose the closest node, x. Relax all nodes adjacent to node x. Update predecessors for
nodes u, v and y (again notice red arrows in diagram below).
Step 4. Now, node y is the closest node, so add it to S. Relax node v and adjust its predecessor
(red arrows remember!).
Step 5. Now we have node u that is closest. Choose this node and adjust its neighbor node v.
Step 6. Finally, add node v. The predecessor list now defines the shortest path from each node to
the source node, s.
Q as a linear array
EXTRACT_MIN takes O(V) time and there are |V| such operations. Therefore, the total time for
EXTRACT_MIN in the while-loop is O(V²). The total number of edges in all the adjacency lists
is |E|, so the for-loop iterates |E| times overall, with each iteration taking O(1) time. Hence, the
running time of the algorithm with the array implementation is O(V² + E) = O(V²).
Q as a binary heap ( If G is sparse)
In this case, each EXTRACT_MIN operation takes O(lg V) time, and there are |V| such operations.
The binary heap can be built in O(V) time.
The DECREASE operation (in RELAX) takes O(lg V) time, and there are at most |E| such
operations.
Hence, the running time of the algorithm with a binary heap, provided the given graph is sparse, is
O((V + E) lg V). Note that this time becomes O(E lg V) if all vertices in the graph are reachable
from the source vertex.
Q as a Fibonacci heap
In this case, the amortized cost of each of the |V| EXTRACT_MIN operations is O(lg V).
The DECREASE_KEY operation in the subroutine RELAX now takes only O(1) amortized time for
each of the |E| edges.
As mentioned above, Dijkstra's algorithm does not work on digraphs with
negative-weight edges. We now give a simple example to show that Dijkstra's algorithm
produces incorrect results in this situation. Consider the digraph consisting of V = {s, a, b} and E =
{(s, a), (s, b), (b, a)}, where w(s, a) = 1, w(s, b) = 2, and w(b, a) = −2.
Dijkstra's algorithm gives d[a] = 1 and d[b] = 2. But due to the negative-weight edge (b, a), the
true shortest distance from vertex s to vertex a is w(s, b) + w(b, a) = 2 − 2 = 0, less than the value the algorithm reports.
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Divide & Conquer
Algorithm
Divide-and-conquer is a top-down technique for designing algorithms that consists of dividing
the problem into smaller subproblems, hoping that the solutions of the subproblems are easier to
find, and then composing the partial solutions into the solution of the original problem.
A little more formally, the divide-and-conquer paradigm consists of the following major phases:
 Breaking the problem into several sub-problems that are similar to the original problem
but smaller in size,
 Solving the sub-problems recursively (successively and independently), and then
 Combining these solutions to the sub-problems to create a solution to the original problem.
Binary Search (simplest application of divide-and-conquer)
Binary search is an extremely well-known instance of the divide-and-conquer paradigm. Given an
ordered array of n elements, the basic idea of binary search is that, for a given element, we
"probe" the middle element of the array. We continue in either the lower or upper segment of the
array, depending on the outcome of the probe, until we reach the required (given) element.
Problem: Let A[1 . . n] be an array sorted in non-decreasing order; that is, A[i] ≤ A[j]
whenever 1 ≤ i ≤ j ≤ n. Let q be the query point. The problem consists of finding q in the
array A. If q is not in A, then find the position where q might be inserted.
Formally, find the index i such that 1 ≤ i ≤ n + 1 and A[i − 1] < q ≤ A[i].
Sequential Search
Look sequentially at each element of A until either we reach the end of the array or find an
item no smaller than q.
Sequential search for q in array A:
for i = 1 to n do
if A [i] ≥ q then
return index i
return n + 1
Analysis
This algorithm clearly takes θ(r) time, where r is the index returned. This is Ω(n) in the worst case
and O(1) in the best case.
If the elements of the array A are distinct and the query point q is indeed in the array, then the loop
executes (n + 1)/2 times on average. On average (as well as in the worst case), sequential
search takes θ(n) time.
Binary Search
Look for q either in the first half or in the second half of the array A. Compare q to the element
in the middle of the array, at position k = ⌈n/2⌉. If q ≤ A[k], then search A[1 . . k];
otherwise search A[k + 1 . . n] for q. Binary search for q in the subarray A[i . . j] proceeds with the promise
that
A[i − 1] < q ≤ A[j]
If i = j then
    return i (the index)
k = ⌊(i + j)/2⌋
if q ≤ A[k]
    then return BinarySearch(A[i . . k], q)
    else return BinarySearch(A[k + 1 . . j], q)
Analysis
Binary search can be accomplished in logarithmic time in the worst case, i.e., T(n) = θ(log n).
This version of binary search takes logarithmic time even in the best case.
Iterative Version of Binary Search
Iterative binary search for q in array A[1 . . n]
if q > A [n]
then return n + 1
i = 1;
j = n;
while i < j do
k = (i + j)/2
if q ≤ A [k]
then j = k
else i = k + 1
return i (the index)
Analysis
The analysis of the iterative algorithm is identical to that of its recursive counterpart.
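Here is a Python rendering of the iterative version above (0-based indexing internally, returning the 1-based index i with A[i − 1] < q ≤ A[i], as in the problem statement):

def binary_search(A, q):
    # Returns i in 1 .. n + 1 such that A[i-1] < q <= A[i] (1-based view of A).
    if not A or q > A[-1]:
        return len(A) + 1
    i, j = 1, len(A)
    while i < j:
        k = (i + j) // 2
        if q <= A[k - 1]:                 # A[k] in the 1-based pseudocode
            j = k
        else:
            i = k + 1
    return i

For example, binary_search([2, 4, 4, 7], 4) returns 2, and binary_search([2, 4, 4, 7], 9) returns 5, the insertion position past the end.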
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Dynamic Programming
Algorithm
Dynamic programming is a fancy name for using the divide-and-conquer technique with a table.
Compared to divide-and-conquer, dynamic programming is a more powerful and subtle design
technique. It is not a specific algorithm, but a meta-technique (like divide-and-conquer).
The technique was developed back in the days when "programming" meant "tabular method"
(like linear programming).
It does not really refer to computer programming. Here in our advanced algorithms course, we'll
also think of "programming" as a "tableau method" and certainly not as writing code. Dynamic
programming is a stage-wise search method suitable for optimization problems whose solutions
may be viewed as the result of a sequence of decisions. The most attractive property of this
strategy is that, during the search for a solution, it avoids full enumeration by pruning early partial
decision solutions that cannot possibly lead to an optimal solution. In many practical situations, this
strategy hits the optimal solution in a polynomial number of decision steps. However, in the
worst case, such a strategy may end up performing full enumeration.
Dynamic programming takes advantage of this duplication and arranges to solve each subproblem
only once, saving the solution (in a table or in a globally accessible place) for later use. The
underlying idea of dynamic programming is: avoid calculating the same thing twice, usually by
keeping a table of known results of subproblems. Unlike divide-and-conquer, which solves
subproblems top-down, dynamic programming is a bottom-up technique. Dynamic
programming is related to divide-and-conquer in the sense that it breaks a problem
down into smaller problems and solves them recursively. However, because of the somewhat
different nature of dynamic programming problems, standard divide-and-conquer solutions are
not usually efficient.
Dynamic programming is among the most powerful paradigms for designing algorithms for
optimization problems. This is true for two reasons. Firstly, dynamic programming solutions are
based on a few common elements. Secondly, dynamic programming problems are typical
optimization problems, i.e., find the minimum- or maximum-cost solution, subject to various
constraints.
In other words, this technique is used for optimization problems:
 Find a solution to the problem with the optimal value.
 Then perform minimization or maximization. (We'll see examples of both in CLRS.)
Dynamic programming is a paradigm of algorithm design in which an optimization problem
is solved by a combination of caching subproblem solutions and appealing to the "principle of
optimality."
There are three basic elements that characterize a dynamic programming algorithm:
1. Substructure
Decompose the given problem into smaller (and hopefully simpler) subproblems. Express the
solution of the original problem in terms of solutions for smaller problems. Note that unlike
divide-and-conquer problems, it is not usually sufficient to consider one decomposition, but
many different ones.
2. Table-Structure
After solving the subproblems, store the answers (results) to the subproblems in a table. This is
done because (typically) subproblem solutions are reused many times, and we do not want to
repeatedly solve the same problem over and over again.
3. Bottom-up Computation
Using the table, combine the solutions of smaller subproblems to solve larger
subproblems, and eventually arrive at a solution to the complete problem. The idea of bottom-up
computation is as follows:
Bottom-up means:
i. Start with the smallest subproblems.
ii. Combining their solutions, obtain the solutions to subproblems of increasing size,
iii. until we arrive at the solution of the original problem.
Once we have decided to attack the given problem with the dynamic programming
technique, the most important step is the formulation of the problem. In other words, the most
important question in designing a dynamic programming solution to a problem is how to set up
the subproblem structure.
If dynamic programming cannot be applied to every optimization problem, then what
should I look for in order to apply this technique? Well, the answer is that there are two important elements
that a problem must have in order for the dynamic programming technique to be applicable (look for
those!).
1. Optimal Substructure
Show that a solution to the problem consists of making a choice, which leaves one or more sub-problems
to solve. Now suppose that you are given the last choice made in an optimal solution. [Students often
have trouble understanding the relationship between optimal substructure and determining which
choice is made in an optimal solution. One way to understand optimal substructure is to imagine
that "God" tells you what was the last choice made in an optimal solution.] Given this choice,
determine which subproblems arise and how to characterize the resulting space of subproblems.
Show that the solutions to the subproblems used within the optimal solution must themselves be
optimal (optimality principle). You usually use cut-and-paste:
 Suppose that one of the subproblems is not optimal.
 Cut it out.
 Paste in an optimal solution.
 Get a better solution to the original problem. Contradicts optimality of problem solution.
That was optimal substructure.
You need to ensure that you consider a wide enough range of choices and subproblems that you
get them all. ["God" is too busy to tell you what that last choice really was.] Try all the choices,
solve all the subproblems resulting from each choice, and pick the choice whose solution, along
with the subproblem solutions, is best.
We have used "Optimality Principle" a couple of times. Now a word about this beast: The
optimal solution to the problem contains within it optimal solutions to subproblems. This is some
times called the principle of optimality.
The Principle of Optimality
Dynamic programming relies on the principle of optimality. This principle states that in an
optimal sequence of decisions or choices, each subsequence must also be optimal. For example,
in the matrix-chain multiplication problem, not only is the value we are interested in optimal, but all
the other entries in the table also represent optimal values. The principle can be restated as follows:
the optimal solution to a problem is a combination of optimal solutions to some of its
subproblems. The difficulty in turning the principle of optimality into an algorithm is that it is not
usually obvious which subproblems are relevant to the problem under consideration.
Now the question is how to characterize the space of subproblems?
 Keep the space as simple as possible.
 Expand it as necessary.
As an example, consider assembly-line scheduling. In this problem, the space of subproblems
was the fastest way from the factory entry through stations S1, j and S2, j. Clearly, there is no need to try a more
general space of subproblems. On the other hand, in the case of optimal binary search trees, suppose we
had tried to constrain the space of subproblems to subtrees with keys k1, k2, . . . , kj. An optimal BST
would have root kr, for some 1 ≤ r ≤ j. We get subproblems k1, . . . , kr − 1 and kr + 1, . . . , kj. Unless
we could guarantee that r = j, so that the subproblem with kr + 1, . . . , kj is empty, this
subproblem is not of the form k1, k2, . . . , kj. Thus, we needed to allow the subproblems to vary at
both ends, i.e., allow both i and j to vary.
Optimal substructure varies across problem domains:
1. How many subproblems are used in an optimal solution.
2. How many choices in determining which subproblem(s) to use.
In the assembly-line scheduling problem, we have 1 subproblem and 2 choices (for Si, j use either S1,
j − 1 or S2, j − 1). In the longest common subsequence problem, we have 1 subproblem but, as far
as choices are concerned, we have either 1 choice (if xi = yj, the LCS of Xi − 1 and Yj − 1) or 2 choices
(if xi ≠ yj, the LCS of Xi − 1 and Y, and the LCS of X and Yj − 1). Finally, in the case of the optimal binary
search tree problem, we have 2 subproblems (ki , . . . , kr − 1 and kr + 1, . . . , kj ) and j − i + 1
choices for kr in ki, . . . , kj. Once we determine optimal solutions to subproblems, we choose
from among the j − i + 1 candidates for kr.
Informally, the running time of a dynamic programming algorithm depends on the overall
number of subproblems times the number of choices. For example, in the assembly-line
scheduling problem, there are Θ(n) subproblems and 2 choices for each, implying a Θ(n) running time.
In the case of the longest common subsequence problem, there are Θ(mn) subproblems and at most
2 choices for each, implying a Θ(mn) running time. Finally, in the case of the optimal binary search tree
problem, we have Θ(n²) subproblems and Θ(n) choices for each, implying a Θ(n³) running time.
Dynamic programming uses optimal substructure bottom up fashion:
 First find optimal solutions to subproblems.
 Then choose which to use in optimal solution to the problem.
When we look at greedy algorithms, we'll see that they work in top down fashion:
 First make a choice that looks best.
 Then solve the resulting subproblem.
Warning! It is not correct to think that optimal substructure applies to all optimization problems.
IT DOES NOT: dynamic programming is not applicable to every optimization problem.
Consider, for instance, the unweighted shortest-path and longest-simple-path problems. In both, we are given an
unweighted, directed graph G = (V, E), and our job is to find a path (sequence of connected edges) from
vertex u in V to vertex v in V; shortest paths have optimal substructure, but longest simple paths do not.
Subproblem Dependencies
It is easy to see that the subproblems in our examples above are independent. For
example, in the assembly-line problem, there is only 1 subproblem, so it is trivially independent.
Similarly, in the longest common subsequence problem, again we have only 1 subproblem, so it
is automatically independent. On the other hand, in the optimal binary search tree problem, we
have two subproblems, ki, . . . , kr − 1 and kr + 1, . . . , kj, which are clearly independent.
2. Polynomially many (Overlapping) Subproblems
An important aspect of the efficiency of dynamic programming is that the total number of
distinct sub-problems to be solved should be at most polynomial. Overlapping
subproblems occur when a recursive algorithm revisits the same problem over and over. A good
divide-and-conquer algorithm, for example the merge-sort algorithm, usually generates a brand
new problem at each stage of recursion. Our textbook CLRS has a good example, matrix-chain
multiplication, to depict this idea. CLRS also discusses an alternative approach, so-called
memoization. It works as follows (see the sketch after this list):
 Store, don't recompute
 Make a table indexed by subproblem.
 When solving a subproblem:
o Lookup in the table.
o If answer is there, use it.
o Otherwise, compute answer, then store it.
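As an illustration of this store-don't-recompute pattern, here is a minimal Python sketch, using the Fibonacci recurrence purely as a stand-in for an arbitrary subproblem (the decorator name is our own):

def memoized(f):
    table = {}                       # a table indexed by subproblem
    def wrapper(n):
        if n not in table:           # lookup in the table first
            table[n] = f(n)          # otherwise compute the answer, then store it
        return table[n]
    return wrapper

@memoized
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

Without the table, this recursion takes exponential time; with it, each subproblem is solved once, giving linear time.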
In dynamic programming, we go one step further. We determine in what order we would want to
access the table, and fill it in that way.
Four-Step Method of CLRS
Our text suggests that the development of a dynamic programming algorithm can be broken
into a sequence of the following four steps.
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution in a bottom-up fashion.
4. Construct an optimal solution from computed information.
Examples of Dynamic programming Algorithm:
 Matrix-chain Multiplication
 Knapsack Problem DP Solution
 Activity Selection Problem DP Solution
1. Matrix-chain Multiplication Problem
The chain matrix multiplication problem is perhaps the most popular example of dynamic
programming used in upper-level undergraduate courses (or to review basic issues of dynamic
programming in an advanced algorithms class).
The chain matrix multiplication problem involves determining the optimal
sequence for performing a series of operations. This general class of problem is important in
compiler design for code optimization and in databases for query optimization. We will study the
problem in a very restricted instance, where the dynamic programming issues are clear. Suppose
that our problem is to multiply a chain of n matrices A1 A2 ... An. Recall (from your discrete
structures course) that matrix multiplication is associative but not commutative. This
means that we are free to parenthesize the above multiplication however we like, but we are not
free to rearrange the order of the matrices. Also, recall that when two (non-square) matrices are
being multiplied, there are restrictions on the dimensions.
Suppose matrix A has p rows and q columns, i.e., the dimension of matrix A is p × q. You can
multiply a matrix A of p × q dimensions times a matrix B of dimensions q × r, and the result will
be a matrix C with dimensions p × r. That is, you can multiply two matrices if they are
compatible: the number of columns of A must equal the number of rows of B.
In particular, for 1 ≤ i ≤ p and 1 ≤ j ≤ r, we have
C[i, j] = Σ_{1 ≤ k ≤ q} A[i, k] · B[k, j].
There are p · r total entries in C and each takes O(q) time to compute; thus the total time to
multiply these two matrices is dominated by the number of scalar multiplications, which is
p · q · r.
Problem Formulation
Note that although any legal parenthesization will lead to a valid result, not all
parenthesizations involve the same number of operations. To understand this point,
consider a chain A1, A2, A3 of three matrices, and suppose
A1 be of dimension 10 × 100
A2 be of dimension 100 × 5
A3 be of dimension 5 × 50
Then,
MultCost[((A1 A2) A3)] = (10 . 100 . 5) + (10 . 5 . 50) = 7,500 scalar multiplications.
MultCost[(A1 (A2 A3))] = (100 . 5 . 50) + (10 . 100 . 50) = 75,000 scalar multiplications.
It is easy to see that even for this small example, computing the product according to first
parenthesization is 10 times faster.
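The arithmetic above is just repeated application of the p · q · r rule, as the following throwaway Python check shows:

def mult_cost(p, q, r):
    # Scalar multiplications to multiply a (p x q) matrix by a (q x r) matrix.
    return p * q * r

# ((A1 A2) A3): multiply (10 x 100)(100 x 5) first, then (10 x 5)(5 x 50).
cost_left = mult_cost(10, 100, 5) + mult_cost(10, 5, 50)      # 5000 + 2500 = 7500
# (A1 (A2 A3)): multiply (100 x 5)(5 x 50) first, then (10 x 100)(100 x 50).
cost_right = mult_cost(100, 5, 50) + mult_cost(10, 100, 50)   # 25000 + 50000 = 75000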
The Chain Matrix Multiplication Problem
Given a sequence of n matrices A1, A2, ... An and their dimensions p0, p1, p2, ..., pn, where, for i
= 1, 2, ..., n, matrix Ai has dimension pi − 1 × pi, determine the order of multiplication that
minimizes the number of scalar multiplications.
Equivalent formulation (perhaps easier to work with!)
Given n matrices A1, A2, ... An, where for 1 ≤ i ≤ n, Ai is a pi − 1 × pi matrix, parenthesize the
product A1 A2 ... An so as to minimize the total cost, assuming that the cost of multiplying a
pi − 1 × pi matrix by a pi × pi + 1 matrix using the naive algorithm is pi − 1 · pi · pi + 1.
Note that this algorithm does not perform the multiplications, it just figures out the best order in
which to perform the multiplication operations.
Naive Algorithm
Well, let's start with the obvious! Suppose we are given a list of n matrices. Let's attack the
problem by brute force and try all possible parenthesizations. It is easy to see that the number
of ways of parenthesizing an expression is very large. For instance, if you have just one item in
the list, then there is only one way to parenthesize. Similarly, if you have n items in the list, then
there are n − 1 places where you could split the list with the outermost pair of parentheses,
namely just after the first item, just after the second item, and so on, and just after the
(n − 1)st item in the list.
On the other hand, when we split the given list just after the kth item, we create two sublists to be
parenthesized, one with k items and the other with n − k items. After splitting, we could consider
all the ways of parenthesizing these sublists (brute force in action). If there are L ways to
parenthesize the left sublist and R ways to parenthesize the right sublist and since these are
independent choices, then the total is L times R. This suggests the following recurrence for P(n),
the number of different ways of parenthesizing n items:
P(n) = 1, if n = 1
P(n) = Σ_{k=1}^{n−1} P(k) · P(n − k), if n ≥ 2
This recurrence is related to a famous function in combinatorics called the Catalan numbers,
which in turn is related to the number of different binary trees on n nodes. The solution to this
recurrence is the sequence of Catalan numbers. In particular, P(n) = C(n − 1), where C(n) is the
nth Catalan number,
C(n) = (1 / (n + 1)) · C(2n, n),
and by applying Stirling's formula we get the lower bound P(n) = Ω(4^n / n^{3/2}).
Since 4^n is exponential and n^{3/2} is just a polynomial, the exponential dominates the
expression, implying that this function grows very fast. Thus, the number of solutions is exponential
in n, and the brute-force method of exhaustive search is a poor strategy for determining the
optimal parenthesization of a matrix chain. Therefore, the naive algorithm will not be practical
except for very small n.
Dynamic Programming Approach
The first step of the dynamic programming paradigm is to characterize the structure of an
optimal solution. The chain matrix problem, like other dynamic programming problems,
involves determining the optimal structure (in this case, a parenthesization). We would like to
break the problem into subproblems, whose solutions can be combined to obtain a solution to the
global problem.
For convenience, let us adopt the notation Ai..j, where i ≤ j, for the result of evaluating the
product Ai Ai + 1 ... Aj. That is,
Ai..j ≡ Ai Ai + 1 ... Aj , where i ≤ j.
It is easy to see that Ai..j is a matrix of dimensions pi − 1 × pj.
In parenthesizing the expression, we can consider the highest level of parenthesization. At this
level we are simply multiplying two matrices together. That is, for any k, 1 ≤ k ≤ n − 1,
A1..n = A1..k Ak+1..n .
Therefore, the problem of determining the optimal sequence of multiplications is broken up into
two questions:
Question 1: How do we decide where to split the chain? (What is k?)
Question 2: How do we parenthesize the subchains A1..k and Ak+1..n?
The subchain problems can be solved by recursively applying the same scheme. On the other
hand, to determine the best value of k, we consider all possible values of k and pick the best
of them. Notice that this problem satisfies the principle of optimality, because once we decide to
break the sequence into the product A1..k Ak+1..n, we should compute each subchain optimally. That is,
for the global problem to be solved optimally, the subproblems must be solved optimally as well.
The key observation is that the parenthesization of the "prefix" subchain A1..k within the optimal
parenthesization of A1..n must itself be an optimal parenthesization of A1..k.
Dynamic Programming Formulation
The second step of the dynamic programming paradigm is to define the value of an optimal
solution recursively in terms of the optimal solutions to subproblems. To help us keep track of
solutions to subproblems, we will use a table, and build the table in a bottom-up manner. For 1 ≤ i
≤ j ≤ n, let m[i, j] be the minimum number of scalar multiplications needed to compute Ai..j.
The optimum cost can be described by the following recursive formulation.
Basis: Observe that if i = j then the problem is trivial; the sequence contains only one matrix, and
so the cost is 0. (In other words, there is nothing to multiply.) Thus,
m[i, i] = 0 for i = 1, 2, ..., n.
Step: If i ≠ j, then we are asking about the product of the subchain Ai..j, and we take advantage of
the structure of an optimal solution: the optimal parenthesization splits the
product Ai..j as Ai..k · Ak+1..j for some value of k with i ≤ k < j.
The optimum cost of computing Ai..k is m[i, k], and the optimum cost of computing Ak+1..j is m[k + 1, j]. We
may assume that these values have been computed previously and stored in our array. Since Ai..k
is a pi − 1 × pk matrix and Ak+1..j is a pk × pj matrix, the time to multiply them is pi − 1 · pk · pj. This suggests the
following recursive rule for computing m[i, j]:
m[i, j] = min over i ≤ k < j of { m[i, k] + m[k + 1, j] + pi − 1 · pk · pj }, for i < j.
To keep track of optimal subsolutions, we store the value of k in a table s[i, j]. Recall, k is the
place at which we split the product Ai..j to get an optimal parenthesization. That is,
s[i, j] = k such that m[i, j] = m[i, k] + m[k + 1, j] + pi − 1 · pk · pj.
Implementing the Rule
The third step of the dynamic programming paradigm is to construct the value of an optimal
solution in a bottom-up fashion. It is pretty straightforward to translate the above recurrence into
a procedure. As we remarked in the introduction, dynamic programming is nothing
but a fancy name for divide-and-conquer with a table; but here, as opposed to divide-and-conquer, we solve
the subproblems sequentially. The trick is to
solve them in the right order, so that whenever the solution to a subproblem is needed, it is
already available in the table.
Consequently, in our problem the only tricky part is arranging the order in which to compute the
values (so that it is readily available when we need it). In the process of computing m[i, j] we
will need to access values m[i, k] and m[k + 1, j] for each value of k lying between i and j. This
suggests that we should organize our computation according to the number of matrices in the
subchain. So, let's work on the subchains:
Let L = j − i + 1 denote the length of the subchain being multiplied. The subchains of length 1
(m[i, i]) are trivial. Then we build up by computing the subchains of length 2, 3, ..., n. The final
answer is m[1, n].
Now set up the loop: observe that if a subchain of length L starts at position i, then j = i + L − 1.
Since we would like to keep j in bounds, we want j ≤ n; this, in turn, means that we
want i + L − 1 ≤ n, i.e., i ≤ n − L + 1. This gives us
the closed interval for i, so our loop for i runs from 1 to n − L + 1.
Matrix-Chain(array p[1 .. n], int n) {
Array m[1 .. n, 1 .. n], s[1 .. n − 1, 2 .. n];
FOR i = 1 TO n DO m[i, i] = 0; // initialize
FOR L = 2 TO n DO { // L=length of subchain
FOR i = 1 TO n − L + 1 do {
j = i + L − 1;
m[i, j] = infinity;
FOR k = i TO j − 1 DO { // check all splits
q = m[i, k] + m[k + 1, j] + p[i − 1] p[k] p[j];
IF (q < m[i, j]) {
m[i, j] = q;
s[i, j] = k;
}
}
}
}
return m[1, n](final cost) and s (splitting markers);
}
Example [on page 337 in CLRS]: the m-table computed by the Matrix-Chain procedure for n = 6
matrices A1, A2, A3, A4, A5, A6 with dimensions 30, 35, 15, 5, 10, 20, 25.
Note that the m-table is rotated so that the main diagonal runs horizontally. Only the main
diagonal and upper triangle is used.
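For reference, here is a direct Python transcription of the Matrix-Chain procedure, run on the dimensions of this example (the dictionary-based tables with 1-based keys are our own convenience):

def matrix_chain(p):
    # p: dimensions p[0 .. n]; matrix Ai is p[i-1] x p[i].
    n = len(p) - 1
    m = {(i, i): 0 for i in range(1, n + 1)}     # subchains of length 1 cost nothing
    s = {}
    for L in range(2, n + 1):                    # L = length of subchain
        for i in range(1, n - L + 2):
            j = i + L - 1
            m[i, j] = float("inf")
            for k in range(i, j):                # check all splits
                q = m[i, k] + m[k + 1, j] + p[i - 1] * p[k] * p[j]
                if q < m[i, j]:
                    m[i, j], s[i, j] = q, k
    return m, s

m, s = matrix_chain([30, 35, 15, 5, 10, 20, 25])
# m[1, 6] == 15125, the optimal cost CLRS reports for this chain.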
Complexity Analysis
Clearly, the space complexity of this procedure is O(n²), since the tables m and s require O(n²)
space. As far as the time complexity is concerned, a simple inspection of the for-loop structure
gives the running time: the three for-loops are nested three deep, and each loop index takes on at
most n values, so the total running time is bounded by O(n³).
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...amitlee9823
 

Recently uploaded (20)

Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 

Bit 4107 advanced business data structures and computer algorithms

FOUNDATIONS TO DATA STRUCTURES

Basic Definitions

Data structures
This is the study of methods of representing objects, the design of algorithms to manipulate the representations, the proper encapsulation of objects in a reusable form, and the evaluation of the cost of the implementation, including the measurement of the complexity of the time and space requirements.

Algorithms
- A finite step-by-step procedure to solve a given problem.
- A sequence of computational steps that transform input into output.

Abstraction
This is the separation between what a data structure represents, and what an algorithm accomplishes, from the implementation details of how things are actually carried out; i.e., hiding the unnecessary details.

Data Abstraction
Hiding of the representational details.

Data Types
A data type consists of a domain (a set of values) and a set of operations; it is the kind of data a variable may hold.

Example 1: The Boolean or logical data type provided by most programming languages.
- Two values: true, false.
- Many operations, including AND, OR, NOT, etc.
Example 2: The data type fraction. How can we specify the domain and operations that define fractions? It seems straightforward to name the operations; fractions are numbers, so all the normal arithmetic operations apply, such as addition, multiplication, and comparison. In addition there might be some fraction-specific operations, such as normalizing a fraction by removing common terms from its numerator and denominator; for example, if we normalized 6/9 we would get 2/3. But how do we specify the domain for fractions, i.e. the set of possible values for a fraction?

Structural and Behavioral Definitions

There are two different approaches to specifying a domain: we can give a structural definition, or we can give a behavioral definition. Let us see what these two are like.

Structural Definition of the domain for 'Fraction'
The value of a fraction is made of three parts (or components):
- A sign, which is either + or -
- A numerator, which may be any non-negative integer
- A denominator, which may be any positive integer (not zero, not negative)

This is called a structural definition because it defines the values of type 'fraction' by imposing an internal structure on them (they have three parts). The parts themselves have specific types, and there may be further constraints. For example, we could have insisted that a fraction's numerator and denominator have no common divisor (in that case we would not need the normalize operation; 6/9 would not be a fraction by this definition).

Behavioral Definition of the domain for 'Fraction'
The alternative approach to defining the set of values for fractions does not impose any internal structure on them. Instead it just adds an operation that creates fractions out of other things, such as

    CREATE_FRACTION(N, D)

where N is any integer and D is any non-zero integer.
The values of type fraction are defined to be the values that are produced by this function for any valid combination of inputs. The parameter names were chosen to suggest its intended behavior: CREATE_FRACTION(N, D) should return a value representing the fraction N/D (N for numerator, D for denominator).

How do we guarantee that CREATE_FRACTION(N, D) actually returns the fraction N/D? The answer is that we have to constrain the behavior of this function by relating it to the other operations on fractions. For example, one of the key properties of multiplication is that:

    NORMALIZE((N/D) * (D/N)) = 1/1

This turns into a constraint on CREATE_FRACTION:

    NORMALIZE(CREATE_FRACTION(N, D) * CREATE_FRACTION(D, N)) = CREATE_FRACTION(1, 1)

So CREATE_FRACTION cannot be just any function; its behavior is highly constrained, because we can write down many constraints like this. That is the reason we call this sort of definition behavioral: the definition is strictly in terms of a set of operations and constraints, or axioms, relating the behavior of the operations to one another.

Abstract Data Types (ADT)
- An Abstract Data Type (ADT) defines data together with the operations on that data.
- An ADT is specified independently of any particular implementation. It depicts the basic nature or concept of the data structure rather than the implementation details of the data.
- Examples of ADTs: stack, queue, list, graph, tree.
- ADTs are commonly implemented using an array or a linked list. A concrete sketch of the fraction ADT follows below.
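To make the discussion concrete, here is a minimal sketch of the fraction ADT in Python. The class and method names (Fraction, normalize) are ours, chosen to mirror the operations named above; this is an illustration, not code from the manual.

    from math import gcd

    class Fraction:
        """Minimal fraction ADT: a sign/numerator/denominator value with a few operations."""

        def __init__(self, n, d):
            # CREATE_FRACTION(N, D): D must be non-zero.
            if d == 0:
                raise ValueError("denominator must be non-zero")
            self.n, self.d = n, d

        def normalize(self):
            # Remove common factors and push the sign into the numerator.
            g = gcd(abs(self.n), abs(self.d))
            sign = -1 if (self.n < 0) != (self.d < 0) else 1
            return Fraction(sign * abs(self.n) // g, abs(self.d) // g)

        def __mul__(self, other):
            return Fraction(self.n * other.n, self.d * other.d)

        def __eq__(self, other):
            a, b = self.normalize(), other.normalize()
            return (a.n, a.d) == (b.n, b.d)

    # The behavioral axiom from the text: NORMALIZE(N/D * D/N) = 1/1.
    assert (Fraction(6, 9) * Fraction(9, 6)).normalize() == Fraction(1, 1)
    f = Fraction(6, 9).normalize()
    print(f.n, f.d)  # 2 3, i.e. 6/9 normalizes to 2/3

Note how the behavioral axiom from the text becomes an executable check on the implementation; any structural representation that passes all such checks is an acceptable implementation of the ADT.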
Categories of data types
- Atomic/basic data types
- Structured data types
- Abstract data types

Atomic/Simple Data Types
- These are data types that are defined without imposing any structure on their values.
- Examples: Boolean, integer, character, double.
- They are used to implement structured data types.

Structured Data Types
- The opposite of atomic is structured. A structured data type has a definition that imposes structure upon its values. As we saw above, fractions normally are a structured data type.
- In structured data types there is an internal structural relationship, or organization, that holds between the components.

For example, if we think of an array as a structured type, with each position in the array being a component, then there is a structural relationship of 'followed by': we say that component N is followed by component N+1:

    [N] -> [N+1] -> [N+2] -> [N+3] -> ... -> [N+i]

Structural Relationships
Many structured data types do have an internal structural relationship, and these can be classified according to the properties of this relationship. A small sketch of the 'followed by' relation appears below, before we classify the possibilities.
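As promised above, here is a tiny illustrative sketch (in Python; the Node class and variable names are ours, not the manual's) of the same 'followed by' relation, first positionally in an array and then with explicit links:

    # 'Followed by' in an array: position i is followed by position i + 1.
    arr = ["A", "B", "C", "D"]
    for i in range(len(arr) - 1):
        print(arr[i], "is followed by", arr[i + 1])

    # The same linear structure built from explicitly linked nodes.
    class Node:
        def __init__(self, value, next_node=None):
            self.value = value
            self.next = next_node  # at most one successor

    d = Node("D")
    c = Node("C", d)
    b = Node("B", c)
    a = Node("A", b)  # the linear structure A -> B -> C -> D

    node = a
    while node is not None:
        print(node.value)
        node = node.next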
Linear Structure
The most common organization for components is a linear structure. A structure is linear if it has these two properties:
- Property P1: Each element is 'followed by' at most one other element.
- Property P2: No two elements are 'followed by' the same element.

An array is an example of a linearly structured data type. We generally write a linearly structured data type like this: A -> B -> C -> D (this is one value with four parts).

- Counter-example 1 (violates P1): A points to both B and C: B <- A -> C
- Counter-example 2 (violates P2): A and B both point to C: A -> C <- B

Tree Structure
In a tree structure, an element can point to more than one other element, but no two elements can point to the same element. That is, we drop constraint P1: if we drop the first constraint and keep the second, we get a tree structure or hierarchy: no two elements are followed by the same element. This is a very common structure too, and extremely useful. Counter-example 1 above is a tree, but counter-example 2 is not.
In the tree just described, A is followed by B, C and D; B is followed by E and F; and C is followed by G. We are not allowed to add any more arcs that point to any of these nodes (except possibly A; see cyclic structures below).

Graph Structure
A graph is a non-linear structure in which a component may have more than one predecessor and more than one successor. Dropping both P1 and P2: if we drop both constraints, we get a graph. In a graph, there are no constraints on the relations we can define.

Cyclic Structures
All the examples we have seen so far are acyclic. This means that there is no sequence of arrows that leads back to where it started. Linear structures are usually acyclic, but cyclic ones are not uncommon.
Example of a cyclic linear structure: A -> B -> C -> D -> A.

Trees are virtually always acyclic. Graphs are often cyclic, although the special properties of acyclic graphs make them an important topic of study. Example: in the tree above, add an edge from G to D, and one from E to A; the result is a cyclic graph.

Why study Data structures
- It helps us understand how data is organized and stored. This is essential for creating efficient algorithms.
- It gives designers a clear notion of the relative advantages and disadvantages of each type of data structure.
- It gives the ability to make correct decisions about which data structure to use, based on the following issues:
  o Run time: the number of operations needed to perform a given task.
  o Memory and secondary storage utilization.
  o Developmental cost of the program: the total person-hours invested.
  That is, it helps us make trade-offs among these three issues, because no single data structure is best in all cases.
- The study of data structures exposes designers and students to a vast collection of tried and proven methods for designing efficient programs.
INTRODUCTION TO DESIGN AND ALGORITHM ANALYSIS

An algorithm, named after the ninth-century scholar Abu Ja'far Muhammad ibn Musa al-Khwarizmi, is defined as follows:
- An algorithm is a set of rules for carrying out calculation either by hand or on a machine.
- An algorithm is a finite step-by-step procedure to achieve a required result.
- An algorithm is a sequence of computational steps that transform the input into the output.
- An algorithm is a sequence of operations performed on data that have to be organized in data structures.
- An algorithm is an abstraction of a program to be executed on a physical machine (model of computation).

The most famous algorithm in history dates well before the time of the ancient Greeks: this is Euclid's algorithm for calculating the greatest common divisor of two integers. This theorem appeared as the solution to Proposition II in Book VII of Euclid's "Elements". Euclid's "Elements" consists of thirteen books, which contain a total of 465 propositions. A short sketch of the algorithm follows below.

The Classic Multiplication Algorithm
1. Multiplication, the American way: multiply the multiplicand one after another by each digit of the multiplier, taken from right to left.
2. Multiplication, the English way: multiply the multiplicand one after another by each digit of the multiplier, taken from left to right.
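As promised above, here is a minimal sketch of Euclid's algorithm (in Python; the function name is ours, not the manual's):

    def euclid_gcd(a, b):
        """Greatest common divisor by Euclid's algorithm.

        Repeatedly replace (a, b) with (b, a mod b); when b reaches 0,
        a holds the greatest common divisor.
        """
        while b != 0:
            a, b = b, a % b
        return a

    print(euclid_gcd(48, 36))   # 12
    print(euclid_gcd(465, 137)) # 1: 465 and 137 are coprime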
Algorithmics is the branch of computer science that consists of designing and analyzing computer algorithms.
1. The "design" pertains to:
   i. the description of the algorithm at an abstract level by means of a pseudo-language, and
   ii. a proof of correctness, that is, a proof that the algorithm solves the given problem in all cases.
2. The "analysis" deals with performance evaluation (complexity analysis).

We start by defining the model of computation, which is usually the Random Access Machine (RAM) model, although other models of computation, such as the PRAM, can be used. Once the model of computation has been defined, an algorithm can be described using a simple language (or pseudo-language) whose syntax is close to a programming language such as C or Java.

Algorithm's Performance
Two important ways to characterize the effectiveness of an algorithm are its space complexity and its time complexity. The time complexity of an algorithm concerns determining an expression for the number of steps needed as a function of the problem size. Since the step-count measure is somewhat coarse, one does not aim at obtaining an exact step count. Instead, one attempts only to get asymptotic bounds on the step count. Asymptotic analysis makes use of the O (Big Oh) notation. Two other notational constructs used by computer scientists in the analysis of algorithms are Θ (Big Theta) notation and Ω (Big Omega) notation.

The performance evaluation of an algorithm is obtained by totaling the number of occurrences of each operation when running the algorithm. The performance of an algorithm is evaluated as a function of the input size n and is to be considered modulo a multiplicative constant.
The following notations are commonly used in performance analysis to characterize the complexity of an algorithm.

Θ-Notation (Same order)
This notation bounds a function to within constant factors. We say f(n) = Θ(g(n)) if there exist positive constants n0, c1 and c2 such that to the right of n0 the value of f(n) always lies between c1 g(n) and c2 g(n) inclusive. In set notation, we write as follows:

    Θ(g(n)) = {f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0}

We say that g(n) is an asymptotically tight bound for f(n). Graphically, for all values of n to the right of n0, the value of f(n) lies at or above c1 g(n) and at or below c2 g(n). In other words, for all n ≥ n0, the function f(n) is equal to g(n) to within a constant factor.

In set terminology, f(n) is said to be a member of the set Θ(g(n)) of functions. In other words, because Θ(g(n)) is a set, we could write f(n) ∈ Θ(g(n)) to indicate that f(n) is a member of Θ(g(n)). Instead, we write f(n) = Θ(g(n)) to express the same notion. Historically the notation is written "f(n) = Θ(g(n))", although the idea that f(n) is equal to something called Θ(g(n)) is misleading.

Example: n²/2 - 2n = Θ(n²), with c1 = 1/4, c2 = 1/2, and n0 = 8. A quick numeric check of these constants appears below.
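The following small Python check (ours, not part of the manual) verifies that the stated constants really do sandwich f(n) for n ≥ n0 = 8, and shows why a smaller n0 would not work:

    def f(n):
        return n * n / 2 - 2 * n

    c1, c2, n0 = 1 / 4, 1 / 2, 8

    # c1*n^2 <= f(n) <= c2*n^2 must hold for every n >= n0.
    for n in range(n0, 200):
        assert c1 * n * n <= f(n) <= c2 * n * n

    # The lower bound fails just below n0, which is why n0 = 8 is needed.
    print(c1 * 49, f(7))  # 12.25 vs 10.5: c1*g(n) > f(n) at n = 7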
Ο-Notation (Upper Bound)
This notation gives an upper bound for a function to within a constant factor. We write f(n) = Ο(g(n)) if there are positive constants n0 and c such that to the right of n0 the value of f(n) always lies on or below c g(n). In set notation, for a given function g(n), the set of functions is

    Ο(g(n)) = {f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n ≥ n0}

We say that the function g(n) is an asymptotic upper bound for the function f(n). We use Ο-notation to give an upper bound on a function, to within a constant factor. Graphically, for all values of n to the right of n0, the value of the function f(n) is on or below c g(n). We write f(n) = Ο(g(n)) to indicate that the function f(n) is a member of the set Ο(g(n)), i.e. f(n) ∈ Ο(g(n)).

Note that f(n) = Θ(g(n)) implies f(n) = Ο(g(n)), since Θ-notation is a stronger notation than Ο-notation.

Example: 2n² = Ο(n³), with c = 1 and n0 = 2.

Equivalently, we may also define "f is of order g" as follows: if f(n) and g(n) are functions defined on the positive integers, then f(n) is Ο(g(n)) if and only if there are a c > 0 and an n0 > 0 such that |f(n)| ≤ c |g(n)| for all n ≥ n0.
Historical Note: the notation was introduced in 1892 by the German mathematician Paul Bachmann.

Ω-Notation (Lower Bound)
This notation gives a lower bound for a function to within a constant factor. We write f(n) = Ω(g(n)) if there are positive constants n0 and c such that to the right of n0 the value of f(n) always lies on or above c g(n). In set notation, for a given function g(n), the set of functions is

    Ω(g(n)) = {f(n) : there exist positive constants c and n0 such that 0 ≤ c g(n) ≤ f(n) for all n ≥ n0}

We say that the function g(n) is an asymptotic lower bound for the function f(n).

Example: √n = Ω(lg n), with c = 1 and n0 = 16.

Algorithm Analysis
The complexity of an algorithm is a function g(n) that gives an upper bound on the number of operations (or the running time) performed by the algorithm when the input size is n. There are two interpretations of this upper bound.

Worst-case Complexity
The running time for any input of a given size will be lower than the upper bound, except possibly for some inputs where the maximum is reached.

Average-case Complexity
The running time for a given input size is the average number of operations over all problem instances of that size.
Because it is quite difficult to estimate the statistical behavior of the input, most of the time we content ourselves with worst-case behavior. Most of the time, the complexity g(n) is approximated by its family O(f(n)), where f(n) is one of the following functions: n (linear complexity), log n (logarithmic complexity), n^a where a ≥ 2 (polynomial complexity), a^n (exponential complexity).

Optimality
Once the complexity of an algorithm has been estimated, the question arises whether this algorithm is optimal. An algorithm for a given problem is optimal if its complexity reaches the lower bound over all the algorithms solving this problem. For example, any algorithm solving the "intersection of n segments" problem will execute at least n² operations in the worst case, even if it does nothing but print the output. This is abbreviated by saying that the problem has Ω(n²) complexity. If one finds an O(n²) algorithm that solves this problem, it will be optimal and of complexity Θ(n²).

Reduction
Another technique for estimating the complexity of a problem is the transformation of problems, also called problem reduction. As an example, suppose we know a lower bound for a problem A, and that we would like to estimate a lower bound for a problem B. If we can transform A into B by a transformation step whose cost is less than that of solving A, then B has the same lower bound as A. The convex hull problem nicely illustrates the reduction technique: a lower bound for the convex hull problem is established by reducing the sorting problem (complexity Θ(n log n)) to the convex hull problem.
MATHEMATICS FOR ALGORITHMIC

Sets
A set is a collection of different things (distinguishable or distinct objects) represented as a unit. The objects in a set are called its elements or members. If an object x is a member of a set S, we write x ∈ S. On the other hand, if x is not a member of S, we write x ∉ S. A set cannot contain the same object more than once, and its elements are not ordered.

For example, consider the set S = {7, 21, 57}. Then 7 ∈ {7, 21, 57} and 8 ∉ {7, 21, 57}, or equivalently, 7 ∈ S and 8 ∉ S.

We can also describe a set containing elements according to some rule. We write

    {n : rule about n}

Thus, {n : n = m² for some m ∈ N} means the set of perfect squares.

Set Cardinality
The number of elements in a set is called the cardinality or size of the set, denoted |S| or sometimes n(S). Two sets have the same cardinality if their elements can be put into a one-to-one correspondence. It is easy to see that the cardinality of the empty set is zero, i.e., |∅| = 0.

Multiset
If we do want to take the number of occurrences of members into account, we call the group a multiset. For example, {7} and {7, 7} are identical as sets, but {7} and {7, 7} are different as multisets.

Infinite Set
A set that contains infinitely many elements; for example, the set of negative integers, the set of integers, etc.

Empty Set
A set that contains no members, denoted ∅ or {}.
Subset
For two sets A and B, we say that A is a subset of B, written A ⊆ B, if every member of A is also a member of B. Formally, A ⊆ B if x ∈ A implies x ∈ B, written x ∈ A => x ∈ B.

Proper Subset
Set A is a proper subset of B, written A ⊂ B, if A is a subset of B and not equal to B. That is, A is a proper subset of B if A ⊆ B but A ≠ B.

Equal Sets
The sets A and B are equal, written A = B, if each is a subset of the other. Rephrasing the definition: let A and B be sets; A = B if A ⊆ B and B ⊆ A.

Power Set
Let A be a set. The power set of A, written P(A) or 2^A, is the set of all subsets of A. That is, P(A) = {B : B ⊆ A}. For example, consider A = {0, 1}. The power set of A is {{}, {0}, {1}, {0, 1}}, and the set of all pairs (2-tuples) whose elements are 0 and 1 is {(0, 0), (0, 1), (1, 0), (1, 1)}.

Disjoint Sets
Let A and B be sets. A and B are disjoint if A ∩ B = ∅.

Union of Sets
The union of A and B, written A ∪ B, is the set we get by combining all elements of A and B into a single set. That is, A ∪ B = {x : x ∈ A or x ∈ B}. For two finite sets A and B, we have the identity
    |A ∪ B| = |A| + |B| - |A ∩ B|

We can conclude that |A ∪ B| ≤ |A| + |B|. That is, if |A ∩ B| = 0 then |A ∪ B| = |A| + |B|, and if A ⊆ B then |A| ≤ |B|.

Intersection of Sets
The intersection of sets A and B, written A ∩ B, is the set of elements that are both in A and in B. That is, A ∩ B = {x : x ∈ A and x ∈ B}.

Partition of a Set
A collection S = {Si} of nonempty sets forms a partition of a set S if
 i. the sets are pairwise disjoint, that is, Si, Sj ∈ S and i ≠ j imply Si ∩ Sj = ∅, and
 ii. their union is S, that is, S = ∪i Si.
In other words, S forms a partition of S if each element of S appears in exactly one Si.

Difference of Sets
Let A and B be sets. The difference of A and B is A - B = {x : x ∈ A and x ∉ B}. For example, let A = {1, 2, 3} and B = {2, 4, 6, 8}. The set difference A - B = {1, 3}, while B - A = {4, 6, 8}.

Complement of a Set
All sets under consideration are subsets of some large set U called the universal set. Given a universal set U, the complement of A, written A', is the set of all elements under consideration that are not in A.
Formally, let A be a subset of the universal set U. The complement of A in U is A' = U - A, or A' = {x : x ∈ U and x ∉ A}. For any set A ⊆ U, we have the following laws:
 i. A'' = A
 ii. A ∩ A' = ∅
 iii. A ∪ A' = U

Symmetric difference
Let A and B be sets. The symmetric difference of A and B is

    A Δ B = {x : x ∈ A or x ∈ B but not both}

Therefore, A Δ B = (A ∪ B) - (A ∩ B). As an example, consider the following two sets A = {1, 2, 3} and B = {2, 4, 6, 8}. The symmetric difference A Δ B = {1, 3, 4, 6, 8}.

Sequences
A sequence of objects is a list of those objects in some order. For example, the sequence 7, 21, 57 would be written as (7, 21, 57). In a set the order does not matter, but in a sequence it does. Hence (7, 21, 57) ≠ (57, 7, 21) as sequences, but {7, 21, 57} = {57, 7, 21} as sets. Repetition is not permitted in a set, but repetition is permitted in a sequence. So (7, 7, 21, 57) is a different sequence from (7, 21, 57), even though {7, 7, 21, 57} is the same set as {7, 21, 57}.
Tuples
Finite sequences are often called tuples. For example, (7, 21) is a 2-tuple or pair, (7, 21, 57) is a 3-tuple, and (7, 21, ..., k) is a k-tuple. An ordered pair of two elements a and b is denoted (a, b) and can be defined as (a, b) = {a, {a, b}}.

Cartesian Product or Cross Product
If A and B are two sets, the cross product of A and B, written A × B, is the set of all pairs in which the first element is a member of the set A and the second element is a member of the set B. Formally,

    A × B = {(a, b) : a ∈ A, b ∈ B}

For example, let A = {1, 2} and B = {x, y, z}. Then A × B = {(1, x), (1, y), (1, z), (2, x), (2, y), (2, z)}. When A and B are finite sets, the cardinality of their product is |A × B| = |A| · |B|.

n-tuples
The cartesian product of n sets A1, A2, ..., An is the set of n-tuples

    A1 × A2 × ... × An = {(a1, ..., an) : ai ∈ Ai, i = 1, 2, ..., n}

whose cardinality is |A1 × A2 × ... × An| = |A1| · |A2| ... |An|, provided all the sets are finite. We denote an n-fold cartesian product over a single set A by A^n = A × A × ... × A, whose cardinality is |A^n| = |A|^n if A is finite. The set operations above can be tried directly in code, as the short demonstration below shows.
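For readers who want to experiment, here is a small Python demonstration (ours, not part of the original manual) of the operations defined above, using the example sets A = {1, 2, 3} and B = {2, 4, 6, 8} from this section:

    from itertools import product

    A = {1, 2, 3}
    B = {2, 4, 6, 8}

    print(A | B)   # union:                {1, 2, 3, 4, 6, 8}
    print(A & B)   # intersection:         {2}
    print(A - B)   # difference A - B:     {1, 3}
    print(B - A)   # difference B - A:     {4, 6, 8}
    print(A ^ B)   # symmetric difference: {1, 3, 4, 6, 8}

    # Cartesian product A x B as a set of pairs, with |A x B| = |A| * |B|.
    AxB = set(product(A, B))
    print(len(AxB) == len(A) * len(B))  # True

    # Inclusion-exclusion identity: |A ∪ B| = |A| + |B| - |A ∩ B|.
    assert len(A | B) == len(A) + len(B) - len(A & B)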
Linear Inequalities and Linear Equations

Inequalities
The term inequality is applied to any statement involving one of the symbols <, >, ≤, ≥. Examples of inequalities are:
 i. x ≥ 1
 ii. x + y + 2z > 16
 iii. p² + q² ≤ 1/2
 iv. a² + ab > 1

Fundamental Properties of Inequalities
1. If a ≤ b and c is any real number, then a + c ≤ b + c. For example, -3 ≤ -1 implies -3 + 4 ≤ -1 + 4.
2. If a ≤ b and c is positive, then ac ≤ bc. For example, 2 ≤ 3 implies 2(4) ≤ 3(4).
3. If a ≤ b and c is negative, then ac ≥ bc. For example, 3 ≤ 9 implies 3(-2) ≥ 9(-2).
4. If a ≤ b and b ≤ c, then a ≤ c. For example, -1/2 ≤ 2 and 2 ≤ 8/3 imply -1/2 ≤ 8/3.

Solution of an Inequality
By a solution of the one-variable inequality 2x + 3 ≤ 7 we mean any number which, substituted for x, yields a true statement. For example, 1 is a solution of 2x + 3 ≤ 7, since 2(1) + 3 = 5 and 5 is less than or equal to 7. By a solution of the two-variable inequality x - y ≤ 5 we mean any ordered pair of numbers which, when substituted for x and y respectively, yields a true statement. For example, (2, 1) is a solution of x - y ≤ 5 because 2 - 1 = 1 and 1 ≤ 5. By a solution of the three-variable inequality 2x - y + z ≥ 3 we mean an ordered triple of numbers which, when substituted for x, y and z respectively, yields a true statement. For example, (2, 0, 1) is a solution of 2x - y + z ≥ 3.
A solution of an inequality is said to satisfy the inequality. For example, (2, 1) satisfies x - y ≤ 5.

Two or more inequalities, each with the same variables, considered as a unit, are said to form a system of inequalities. For example:

    x ≥ 0
    y ≥ 0
    2x + y ≤ 4

Note that the notion of a system of inequalities is analogous to that of a system of equations. Any solution common to all of the inequalities of a system is said to be a solution of that system. A system of inequalities, each of whose members is linear, is said to be a system of linear inequalities.

Geometric Interpretation of Inequalities
An inequality in two variables x and y describes a region in the x-y plane (called its graph), namely the set of all points whose coordinates satisfy the inequality. The y-axis divides the xy-plane into two regions, called half-planes:
- Right half-plane: the region of points whose coordinates satisfy the inequality x > 0.
- Left half-plane: the region of points whose coordinates satisfy the inequality x < 0.

Similarly, the x-axis divides the xy-plane into two half-planes:
- Upper half-plane, in which the inequality y > 0 is true.
- Lower half-plane, in which the inequality y < 0 is true.

What are the x-axis and y-axis? They are simply lines, so the above argument applies to any line: every line ax + by = c divides the xy-plane into two half-planes.
- On one half-plane ax + by > c is true.
- On the other half-plane ax + by < c is true.
A small point-classification sketch follows below.
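As a small illustration (a Python sketch of ours, not from the manual), a point can be classified against the line ax + by = c by checking the sign of ax + by - c:

    def classify(a, b, c, x, y):
        """Report which side of the line a*x + b*y = c the point (x, y) lies on."""
        value = a * x + b * y
        if value > c:
            return "half-plane where ax + by > c"
        if value < c:
            return "half-plane where ax + by < c"
        return "on the line ax + by = c"

    # The line 2x + y = 4 from the example system above:
    print(classify(2, 1, 4, 0, 0))  # (0, 0) gives 0 < 4
    print(classify(2, 1, 4, 3, 1))  # (3, 1) gives 7 > 4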
Linear Equations

One Unknown
A linear equation in one unknown can always be stated in the standard form

    ax = b

where x is an unknown and a and b are constants. If a is not equal to zero, this equation has the unique solution x = b/a.

Two Unknowns
A linear equation in two unknowns, x and y, can be put in the form

    ax + by = c

where x and y are the two unknowns and a, b, c are real numbers. Also, we assume that a and b are not both zero.

Solution of a Linear Equation
A solution of the equation consists of a pair of numbers, u = (k1, k2), which satisfies the equation ax + by = c. Mathematically speaking, a solution consists of u = (k1, k2) such that a·k1 + b·k2 = c. Solutions of the equation can be found by assigning arbitrary values to x and solving for y, or by assigning arbitrary values to y and solving for x. Geometrically, any solution u = (k1, k2) of the linear equation ax + by = c determines a point in the cartesian plane. Since a and b are not both zero, the solutions u correspond precisely to the points on a straight line.

Two Equations in Two Unknowns
A system of two linear equations in the two unknowns x and y is

    a1x + b1y = c1
    a2x + b2y = c2

where a1, b1, a2, b2 are not zero. A pair of numbers which satisfies both equations is called a simultaneous solution of the given equations, or a solution of the system of equations. Geometrically, there are three cases for a simultaneous solution:
1. If the system has exactly one solution, the graphs of the linear equations intersect in one point.
2. If the system has no solution, the graphs of the linear equations are parallel.
3. If the system has an infinite number of solutions, the graphs of the linear equations coincide.

The special cases (2) and (3) can only occur when the coefficients of x and y in the two linear equations are proportional:

    a1/a2 = b1/b2, that is, a1b2 - a2b1 = 0

The system has no solution when a1/a2 = b1/b2 ≠ c1/c2.

The solution to the system

    a1x + b1y = c1
    a2x + b2y = c2

can be obtained by the elimination process, whereby we reduce the system to a single equation in only one unknown. This is accomplished by the following algorithm:

ALGORITHM
Step 1. Multiply the two equations by two numbers which are such that the resulting coefficients of one of the unknowns are negatives of each other.
Step 2. Add the equations obtained in Step 1.
The output of this algorithm is a linear equation in one unknown. This equation may be solved for that unknown, and the solution may be substituted in one of the original equations, yielding the value of the other unknown. As an example, consider the following system:

    3x + 2y = 8  ------------ (1)
    2x - 5y = -1 ------------ (2)

Step 1: Multiply equation (1) by 2 and equation (2) by -3:

    6x + 4y = 16
    -6x + 15y = 3

Step 2: Add the equations obtained in Step 1:

    19y = 19

Thus we obtain an equation involving only the unknown y. We solve for y to obtain y = 1. Next, we substitute y = 1 in equation (1) to get x = 2. Therefore, x = 2 and y = 1 is the unique solution to the system.

n Equations in n Unknowns
Now consider a system of n linear equations in n unknowns

    a11x1 + a12x2 + . . . + a1nxn = b1
    a21x1 + a22x2 + . . . + a2nxn = b2
    . . . . . . . . . . . . . . . . .
    an1x1 + an2x2 + . . . + annxn = bn

where the aij, bi are real numbers. The number aij is called the coefficient of xj in the ith equation, and the number bi is called the constant of the ith equation. A list of values for the unknowns, x1 = k1, x2 = k2, . . . , xn = kn, or equivalently a list of n numbers u = (k1, k2, . . . , kn), is called a solution of the system if, with kj substituted for xj, the left-hand side of each equation in fact equals the right-hand side.
The above system is equivalent to the matrix equation AX = B, where A = (aij) is the matrix of coefficients, X = (xi) is the column of unknowns, and B = (bi) is the column of constants. The matrix A is called the coefficient matrix of the system of n linear equations in n unknowns, and the matrix obtained by adjoining the column of constants to A is called the augmented matrix of the system.

Note for algorithmic nerds: we store a system in the computer as its augmented matrix. Specifically, the system is stored as an N × (N+1) matrix array A, the augmented matrix of the system. Therefore the constants b1, b2, . . . , bn are stored, respectively, as A[1, N+1], A[2, N+1], . . . , A[N, N+1].
Solution of a Triangular System
If aij = 0 for i > j, then the system of n linear equations in n unknowns assumes the triangular form

    a11x1 + a12x2 + . . . + a1,n-1xn-1 + a1nxn = b1
            a22x2 + . . . + a2,n-1xn-1 + a2nxn = b2
            . . . . . . . . . . . . . . . . . .
                   an-1,n-1xn-1 + an-1,nxn = bn-1
                                     annxn = bn

where |A| = a11 a22 . . . ann. If none of the diagonal entries a11, a22, . . . , ann is zero, the system has a unique solution.

Back Substitution Method
We obtain the solution of a triangular system by the technique of back substitution. Consider the above general triangular system.
1. First, we solve the last equation for the last unknown, xn:

    xn = bn / ann

2. Second, we substitute the value of xn in the next-to-last equation and solve it for the next-to-last unknown, xn-1:

    xn-1 = (bn-1 - an-1,n xn) / an-1,n-1

3. Third, we substitute these values for xn and xn-1 in the third-from-last equation and solve it for the third-from-last unknown, xn-2:

    xn-2 = (bn-2 - an-2,n-1 xn-1 - an-2,n xn) / an-2,n-2
In general, we determine xk by substituting the previously obtained values of xn, xn-1, . . . , xk+1 in the kth equation:

    xk = (bk - ak,k+1 xk+1 - . . . - ak,n xn) / akk

Gaussian Elimination
Gaussian elimination is a method for finding the solution of a system of linear equations. The method consists of two parts:
1. The first part consists of step-by-step reduction of the system to a triangular system.
2. The second part consists of solving the triangular system by back substitution.

As an example, consider the system

    x - 3y - 2z = 6    --- (1)
    2x - 4y + 2z = 18  --- (2)
    -3x + 8y + 9z = -9 --- (3)

First Part
Eliminate the first unknown, x, from equations (2) and (3):
(a) Multiply equation (1) by -2 and add it to equation (2). Equation (2) becomes 2y + 6z = 6.
(b) Multiply equation (1) by 3 and add it to equation (3). Equation (3) becomes -y + 3z = 9.
And the original system is reduced to the system

x - 3y - 2z = 6
2y + 6z = 6
-y + 3z = 9

Now we have to remove the second unknown, y, from the new equation (3), using only the new equations (2) and (3) above.
(a) Multiply equation (2) by 1/2 and add it to equation (3). Equation (3) becomes 6z = 12.

Therefore, our given system of three linear equations in three unknowns is reduced to the triangular system

x - 3y - 2z = 6
2y + 6z = 6
6z = 12

Second Part
In the second part, we solve the triangular system by back substitution and get x = 1, y = -3, z = 2.

In the first stage of the algorithm, the coefficient of x in the first equation is called the pivot, and in the second stage of the algorithm, the coefficient of y in the second equation is the pivot. Clearly, the algorithm cannot work if either pivot is zero. In such a case one must interchange equations so that a pivot is not zero. In fact, if one would like to code this algorithm, then the greatest accuracy is attained when the pivot is as large in absolute value as possible. For example, we would like to interchange equation (1) and equation (2) in the original system of the above example before eliminating x from the second and third equations. That is, the first step of the algorithm transforms the system into
2x - 4y + 2z = 18
x - 3y - 2z = 6
-3x + 8y + 9z = -9

Determinants and Systems of Linear Equations
Consider a system of n linear equations in n unknowns; that is, the system

a11x1 + a12x2 + . . . + a1nxn = b1
a21x1 + a22x2 + . . . + a2nxn = b2
. . . . . . . . . . . . . . . . .
an1x1 + an2x2 + . . . + annxn = bn

Let D denote the determinant of the matrix A = (aij) of coefficients; that is, let D = |A|. Also, let Ni denote the determinant of the matrix obtained by replacing the ith column of A by the column of constants.

Theorem. If D ≠ 0, the above system of linear equations has the unique solution

xi = Ni / D, for i = 1, 2, . . . , n.

This theorem is widely known as Cramer's rule. It is important to note that Gaussian elimination is usually much more efficient for solving systems of linear equations than is the use of determinants.
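The two parts of Gaussian elimination, together with the pivoting refinement just described, are short enough to sketch in full. The following Python function is a sketch (assuming NumPy; the input is the augmented matrix discussed earlier) that reduces the system to triangular form with partial pivoting and then solves it by back substitution:

import numpy as np

def gaussian_solve(aug):
    """Solve a linear system given as an augmented N x (N+1) matrix."""
    a = aug.astype(float).copy()
    n = a.shape[0]
    # First part: reduce to triangular form.
    for k in range(n - 1):
        p = k + np.argmax(np.abs(a[k:, k]))  # pivot largest in absolute value
        a[[k, p]] = a[[p, k]]                # interchange equations k and p
        for i in range(k + 1, n):
            a[i] -= (a[i, k] / a[k, k]) * a[k]   # eliminate x_k from row i
    # Second part: back substitution.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (a[i, n] - a[i, i+1:n] @ x[i+1:]) / a[i, i]
    return x

# The worked example above:
aug = np.array([[ 1, -3, -2,  6],
                [ 2, -4,  2, 18],
                [-3,  8,  9, -9]])
print(gaussian_solve(aug))   # -> [ 1. -3.  2.], i.e., x = 1, y = -3, z = 2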
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Greedy Algorithm

Greedy algorithms are simple and straightforward. They are shortsighted in their approach in the sense that they make decisions on the basis of information at hand, without worrying about the effect these decisions may have in the future. They are easy to invent, easy to implement and most of the time quite efficient. However, many problems cannot be solved correctly by the greedy approach. Greedy algorithms are used to solve optimization problems.

Greedy Approach
A greedy algorithm works by making the decision that seems most promising at any moment; it never reconsiders this decision, whatever situation may arise later. As an example, consider the problem of "Making Change". Coins available are:
• dollars (100 cents)
• quarters (25 cents)
• dimes (10 cents)
• nickels (5 cents)
• pennies (1 cent)

Problem
Make change for a given amount using the smallest possible number of coins.

Informal Algorithm
• Start with nothing.
• At every stage, without passing the given amount,
o add the largest coin available to the coins already chosen.

Formal Algorithm
Make change for n units using the least possible number of coins.

MAKE-CHANGE (n)
 C ← {100, 25, 10, 5, 1} // constant
 S ← { } // set that will hold the solution
 sum ← 0 // sum of items in the solution set
 WHILE sum ≠ n
  x ← largest item in set C such that sum + x ≤ n
  IF no such item THEN RETURN "No Solution"
  S ← S ∪ {x}
  sum ← sum + x
 RETURN S

Example: Make change for 2.89 (n = 289 cents). The solution contains 2 dollars, 3 quarters, 1 dime and 4 pennies. The algorithm is greedy because at every stage it chooses the largest coin without worrying about the consequences. Moreover, it never changes its mind in the sense that once a coin has been included in the solution set, it remains there. (A runnable version of MAKE-CHANGE appears after the structure outline below.)

Characteristics and Features of Problems Solved by Greedy Algorithms
The solution is constructed in an optimal way. The algorithm maintains two sets: one contains chosen items and the other contains rejected items. A greedy algorithm consists of four (4) functions:
1. A function that checks whether a chosen set of items provides a solution.
2. A function that checks the feasibility of a set.
3. The selection function, which tells which of the candidates is the most promising.
4. An objective function, which does not appear explicitly, and gives the value of a solution.

Structure of a Greedy Algorithm
• Initially the set of chosen items (the solution set) is empty.
• At each step
o an item is added to the solution set by using the selection function;
o IF the set would no longer be feasible
 - reject the item under consideration (it is never considered again);
o ELSE IF the set is still feasible THEN
 - add the current item.
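Here is a Python version of MAKE-CHANGE (a sketch; it records coin counts in a dictionary, which is equivalent to the multiset S above):

def make_change(n, coins=(100, 25, 10, 5, 1)):
    """Greedy change-making: repeatedly take the largest coin
    that does not push the running sum past n."""
    solution = {}
    remaining = n
    for c in sorted(coins, reverse=True):
        count, remaining = divmod(remaining, c)
        if count:
            solution[c] = count
    if remaining:
        return "No Solution"   # cannot reach n with these denominations
    return solution

print(make_change(289))   # -> {100: 2, 25: 3, 10: 1, 1: 4}

With the denominations above the greedy answer happens to be optimal; for arbitrary coin systems the greedy choice can fail to find the best solution, which is exactly why the feasibility and optimality properties discussed next matter.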
Definitions of Feasibility
A feasible set (of candidates) is promising if it can be extended to produce not merely a solution, but an optimal solution to the problem. In particular, the empty set is always promising. Why? Because an optimal solution always exists.

Unlike dynamic programming, which solves the subproblems bottom-up, a greedy strategy usually progresses in a top-down fashion, making one greedy choice after another, reducing each problem to a smaller one.

Greedy-Choice Property
The "greedy-choice property" and "optimal substructure" are the two ingredients in a problem that lend it to a greedy strategy. The greedy-choice property says that a globally optimal solution can be arrived at by making a locally optimal choice.

The greedy algorithm techniques include:
• Activity Selection Problem
• Minimum Spanning Tree
• Kruskal's Algorithm
• Prim's Algorithm
• Dijkstra's Algorithm
• Huffman's Codes

1. An Activity Selection Problem
Activity selection is the problem of scheduling a resource among several competing activities.

Problem Statement
Given a set S of n activities, with si the start time and fi the finish time of the ith activity, find a maximum-size set of mutually compatible activities.

Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj) do not overlap; that is, i and j are compatible if si ≥ fj or sj ≥ fi.
Greedy Algorithm for the Selection Problem
I. Sort the input activities by increasing finishing time: f1 ≤ f2 ≤ . . . ≤ fn
II. Call GREEDY-ACTIVITY-SELECTOR (s, f)

1. n = length[s]
2. A = {1}
3. j = 1
4. for i = 2 to n
5.  do if si ≥ fj
6.   then A = A ∪ {i}
7.    j = i
8. return set A

Operation of the algorithm
Suppose 11 activities are given, S = {p, q, r, s, t, u, v, w, x, y, z}, with start and finish times for the proposed activities (1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11), (8, 12), (2, 13) and (12, 14).

A = {p} Initialization at line 2
A = {p, s} line 6 - 1st addition in the FOR-loop
A = {p, s, w} line 6 - 2nd addition in the FOR-loop
A = {p, s, w, z} line 6 - 3rd addition in the FOR-loop
Out of the FOR-loop; return A = {p, s, w, z}

Analysis
Part I requires O(n lg n) time (use merge sort or heap sort). Part II requires θ(n) time, assuming that the activities were already sorted in Part I by their finish times.

Correctness
Note that greedy algorithms do not always produce optimal solutions, but GREEDY-ACTIVITY-SELECTOR does.
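A Python sketch of GREEDY-ACTIVITY-SELECTOR, run on the 11 activities above (the (name, start, finish) tuple format is an assumption for illustration):

def greedy_activity_selector(activities):
    """Select a maximum-size set of mutually compatible activities."""
    acts = sorted(activities, key=lambda a: a[2])   # Part I: sort by finish time
    selected = [acts[0]]                            # line 2: A = {first activity}
    last_finish = acts[0][2]
    for name, start, finish in acts[1:]:            # Part II: the FOR-loop
        if start >= last_finish:                    # compatible with last chosen
            selected.append((name, start, finish))
            last_finish = finish
    return [name for name, _, _ in selected]

acts = [('p', 1, 4), ('q', 3, 5), ('r', 0, 6), ('s', 5, 7), ('t', 3, 8),
        ('u', 5, 9), ('v', 6, 10), ('w', 8, 11), ('x', 8, 12),
        ('y', 2, 13), ('z', 12, 14)]
print(greedy_activity_selector(acts))   # -> ['p', 's', 'w', 'z']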
Theorem
Algorithm GREEDY-ACTIVITY-SELECTOR produces a solution of maximum size for the activity-selection problem.

Proof Idea
Show that the activity-selection problem satisfies
I. the greedy-choice property;
II. the optimal-substructure property.

Proof
I. Let S = {1, 2, . . . , n} be the set of activities. Since the activities are in order by finish time, activity 1 has the earliest finish time. Suppose A ⊆ S is an optimal solution, and let the activities in A be ordered by finish time. Suppose the first activity in A is k. If k = 1, then A begins with the greedy choice and we are done (or, to be very precise, there is nothing to prove here). If k ≠ 1, we want to show that there is another optimal solution B that begins with the greedy choice, activity 1. Let B = (A - {k}) ∪ {1}. Because f1 ≤ fk, the activities in B are disjoint, and since B has the same number of activities as A, i.e., |A| = |B|, B is also optimal.

II. Once the greedy choice is made, the problem reduces to finding an optimal solution for the remaining subproblem. If A is an optimal solution to the original problem S, then A' = A - {1} is an optimal solution to the activity-selection problem S' = {i ∈ S : si ≥ f1}. Why? Because if we could find a solution B' to S' with more activities than A', adding 1 to B' would yield a solution B to S with more activities than A, thereby contradicting the optimality of A.

As an example, consider the problem of scheduling a set of activities among lecture halls: schedule all the activities using as few lecture halls as possible. In order to determine which activity should use which lecture hall, the algorithm uses GREEDY-ACTIVITY-SELECTOR to calculate the activities in the first lecture hall. If there are
some activities yet to be scheduled, a new lecture hall is selected and GREEDY-ACTIVITY-SELECTOR is called again. This continues until all activities have been scheduled.

LECTURE-HALL-ASSIGNMENT (s, f)
 n = length[s]
 for i = 1 to n
  do HALL[i] = NIL
 k = 1
 while (NOT empty(s))
  do HALL[k] = GREEDY-ACTIVITY-SELECTOR (s, f, n)
   k = k + 1
 return HALL

The following changes can be made in GREEDY-ACTIVITY-SELECTOR (s, f) (see CLR), so that activities already assigned to a hall are marked "-" and skipped on later calls:

GREEDY-ACTIVITY-SELECTOR (s, f, n)
 j = first(s)   // first activity not yet scheduled
 A = {j}
 for i = j + 1 to n
  do if s[i] ≠ "-"
   then if s[i] ≥ f[j]
    then A = A ∪ {i}
     s[i] = "-"
     j = i
 return A

Correctness
The algorithm can be shown to be correct and optimal. For a contradiction, assume the number of lecture halls is not optimal, that is, the algorithm allocates more halls than necessary. Then there exists a set of activities B which have been wrongly allocated: an activity b belonging to B which has been allocated to hall H[i] should have optimally been allocated to H[k]. This implies that the activities for lecture hall H[k] have not been allocated optimally, contradicting the fact that GREEDY-ACTIVITY-SELECTOR produces the optimal set of activities for a particular lecture hall.

Analysis
In the worst case, the number of lecture halls required is n. GREEDY-ACTIVITY-SELECTOR runs in θ(n) time, so the running time of this algorithm is O(n²).

Two Important Observations
• Choosing the activity of least duration will not always produce an optimal solution. For example, given the set of activities {(3, 5), (6, 8), (1, 4), (4, 7), (7, 10)}, either (3, 5) or (6, 8) will be picked first, which will prevent the optimal solution {(1, 4), (4, 7), (7, 10)} from being found.
• Choosing the activity with the least overlap will not always produce an optimal solution either. For example, given the set of activities {(0, 4), (4, 6), (6, 10), (0, 1), (1, 5), (5, 9), (9, 10), (0, 3), (0, 2), (7, 10), (8, 10)}, the activity with the least overlap with other activities is (4, 6), so it will be picked first. But that would prevent the optimal solution {(0, 1), (1, 5), (5, 9), (9, 10)} from being found.

2. Minimum Spanning Tree

Spanning Tree
A spanning tree of a graph is any tree that includes every vertex in the graph. A little more formally, a spanning tree of a graph G is a subgraph of G that is a tree and contains all the vertices of G. An edge of a spanning tree is called a branch; an edge in the graph that is not in the spanning tree is called a chord. We construct spanning trees whenever we want to find a simple, cheap and yet efficient way to connect a set of terminals (computers, cities, factories, etc.). Spanning trees are important for the following reasons:
  • 41. 41 | P a g e  Spanning trees construct a sparse sub graph that tells a lot about the original graph.  Spanning trees a very important in designing efficient routing algorithms.  Some hard problems (e.g., Steiner tree problem and traveling salesman problem) can be solved approximately by using spanning trees.  Spanning trees have wide applications in many areas, such as network design, etc. Greedy Spanning Tree Algorithm One of the most elegant spanning tree algorithm that I know of is as follows:  Examine the edges in graph in any arbitrary sequence.  Decide whether each edge will be included in the spanning tree. Note that each time a step of the algorithm is performed, one edge is examined. If there is only a finite number of edges in the graph, the algorithm must halt after a finite number of steps. Thus, the time complexity of this algorithm is clearly O(n), where n is the number of edges in the graph. Some important facts about spanning trees are as follows:  Any two vertices in a tree are connected by a unique path.  Let T be a spanning tree of a graph G, and let e be an edge of G not in T. The T+e contains a unique cycle. Lemma The number of spanning trees in the complete graph Kn is nn-2 . Greediness It is easy to see that this algorithm has the property that each edge is examined at most once. Algorithms, like this one, which examine each entity at most once and decide its fate once and for all during that examination are called greedy algorithms. The obvious advantage of greedy approach is that we do not have to spend time reexamining entities.
Consider the problem of finding a spanning tree with the smallest possible weight or the largest possible weight, respectively called a minimum spanning tree and a maximum spanning tree. It is easy to see that if a graph possesses a spanning tree, it must have a minimum spanning tree and also a maximum spanning tree. These spanning trees can be constructed by performing the spanning tree algorithm (e.g., the above-mentioned algorithm) with an appropriate ordering of the edges.

Minimum Spanning Tree Algorithm
Perform the spanning tree algorithm (above) by examining the edges in order of nondecreasing weight (smallest first, largest last). If two or more edges have the same weight, order them arbitrarily.

Maximum Spanning Tree Algorithm
Perform the spanning tree algorithm (above) by examining the edges in order of nonincreasing weight (largest first, smallest last). If two or more edges have the same weight, order them arbitrarily.

Minimum Spanning Trees
A minimum spanning tree (MST) of a weighted graph G is a spanning tree of G whose edge weights sum to the minimum possible. In other words, an MST is a tree formed from a subset of the edges in a given undirected graph, with two properties:
• it spans the graph, i.e., it includes every vertex of the graph;
• it is minimum, i.e., the total weight of all the edges is as low as possible.

Let G = (V, E) be a connected, undirected graph, where V is the set of vertices (nodes) and E is the set of edges. Each edge has a given nonnegative length.

Problem: Find a subset T of the edges of G such that all the vertices remain connected when only the edges in T are used, and the sum of the lengths of the edges in T is as small as possible.

Let G' = (V, T) be the partial graph formed by the vertices of G and the edges in T. [Note: a connected graph with n vertices must have at least n - 1 edges, and more than n - 1 edges implies at least one cycle.] So n - 1 is the minimum number of edges in T. Hence if G' is connected and T has more than n - 1 edges, we can remove at least one of these edges without disconnecting G' (choose an edge that is part of a cycle). This will decrease the total length of the edges in T.
Therefore, the new solution is preferable to the old one. Thus, a T with n vertices and more than n - 1 edges cannot be an optimal solution. It follows that T must have exactly n - 1 edges, and since G' is connected, it must be a tree. The tree G' is called a minimum spanning tree (MST).

3. Kruskal's Algorithm
This minimum spanning tree algorithm was first described by Kruskal in 1956, in the same paper where he rediscovered Jarnik's algorithm. The algorithm was also rediscovered in 1957 by Loberman and Weinberger, but somehow avoided being renamed after them. The basic idea of Kruskal's algorithm is as follows: scan all edges in increasing weight order; if an edge is safe, keep it (i.e., add it to the set A).

Overall Strategy
Kruskal's algorithm, as described in CLRS, is directly based on the generic MST algorithm. It builds the MST in a forest. Initially, each vertex is in its own tree in the forest. Then the algorithm considers each edge in turn, ordered by increasing weight. If an edge (u, v) connects two different trees, then (u, v) is added to the set of edges of the MST, and the two trees connected by the edge (u, v) are merged into a single tree. On the other hand, if an edge (u, v) connects two vertices in the same tree, then edge (u, v) is discarded.

A little more formally, given a connected, undirected, weighted graph with a function w : E → R, the algorithm:
• starts with each vertex being its own component;
• repeatedly merges two components into one by choosing the light edge that connects them (i.e., the light edge crossing the cut between them);
• scans the set of edges in monotonically increasing order by weight;
• uses a disjoint-set data structure to determine whether an edge connects vertices in different components.

Data Structure
Before formalizing the above idea, let's quickly review the disjoint-set data structure from Chapter 21 (CLRS):
• MAKE-SET(v): creates a new set whose only member is pointed to by v. Note that for this operation, v must not already be in some other set.
• FIND-SET(v): returns a pointer to the representative of the set containing v.
• UNION(u, v): unites the dynamic sets that contain u and v into a new set that is the union of these two sets.

Algorithm
Start with an empty set A, and select at every stage the shortest edge that has not been chosen or rejected, regardless of where this edge is situated in the graph.

KRUSKAL(V, E, w)
 A ← { } ▷ Set A will ultimately contain the edges of the MST
 for each vertex v in V
  do MAKE-SET(v)
 sort E into nondecreasing order by weight w
 for each (u, v) taken from the sorted list
  do if FIND-SET(u) ≠ FIND-SET(v)
   then A ← A ∪ {(u, v)}
    UNION(u, v)
 return A

Illustrative Example
Let's run through the following graph quickly to see how Kruskal's algorithm works on it; we get the shaded edges shown in the accompanying figure (not reproduced here).
Edge (c, f): safe
Edge (g, i): safe
Edge (e, f): safe
Edge (c, e): reject
Edge (d, h): safe
Edge (f, h): safe
Edge (e, d): reject
Edge (b, d): safe
Edge (d, g): safe
Edge (b, c): reject
Edge (g, h): reject
Edge (a, b): safe

At this point, we have only one component, so all other edges will be rejected. [We could add a test to the main loop of KRUSKAL to stop once |V| − 1 edges have been added to A.]

Note carefully: suppose we had examined (c, e) before (e, f). Then we would have found (c, e) safe and would have rejected (e, f).

Example (CLRS): Step-by-Step Operation of Kruskal's Algorithm

Step 1. In the graph, the edge (g, h) is shortest. Either vertex g or vertex h could be the representative; let's choose vertex g arbitrarily.

Step 2. The edge (c, i) creates the second tree. Choose vertex c as the representative for the second tree.
Step 3. Edge (g, g) is the next shortest edge. Add this edge and choose vertex g as the representative.

Step 4. Edge (a, b) creates a third tree.

Step 5. Add edge (c, f) and merge the two trees. Vertex c is chosen as the representative.

Step 6. Edge (g, i) is the next cheapest, but if we added this edge a cycle would be created. Vertex c is the representative of both endpoints.

Step 7. Instead, add edge (c, d).
Step 8. If we added edge (h, i), it would make a cycle.

Step 9. Instead of adding edge (h, i), add edge (a, h).

Step 10. Again, if we added edge (b, c), it would create a cycle. Add edge (d, e) instead to complete the spanning tree. In this spanning tree all trees are joined and vertex c is the sole representative.

Analysis
• Initialize the set A: O(1)
• First for-loop: |V| MAKE-SETs
• Sort E: O(E lg E)
• Second for-loop: O(E) FIND-SETs and UNIONs
  • 48. 48 | P a g e  Assuming the implementation of disjoint-set data structure, already seen in Chapter 21, that uses union by rank and path compression: O((V + E) α(V)) + O(E lg E)  Since G is connected, |E| ≥ |V| − 1⇒ O(E α(V)) + O(E lg E).  α(|V|) = O(lg V) = O(lg E).  Therefore, total time is O(E lg E).  |E| ≤ |V|2 ⇒lg |E| = O(2 lg V) = O(lg V).  Therefore, O(E lg V) time. (If edges are already sorted, O(E α(V)), which is almost linear.) II Kruskal's Algorithm Implemented with Priority Queue Data Structure MST_KRUSKAL(G) for each vertex v in V[G] do define set S(v) ← {v} Initialize priority queue Q that contains all edges of G, using the weights as keys A ← { } ▷ A will ultimately contains the edges of the MST while A has less than n − 1 edges do Let set S(v) contains v and S(u) contain u if S(v) ≠ S(u) then Add edge (u, v) to A Merge S(v) and S(u) into one set i.e., union return A Analysis The edge weight can be compared in constant time. Initialization of priority queue takes O(E lg E) time by repeated insertion. At each iteration of while-loop, minimum edge can be removed in O(log E) time, which is O(log V), since graph is simple. The total running time is O((V + E) log V), which is O(E lg V) since graph is simple and connected.
4. Prim's Algorithm
This algorithm was first proposed by Jarnik, but is typically attributed to Prim. It starts from an arbitrary vertex (root) and at each stage adds a new branch (edge) to the tree already constructed; the algorithm halts when all the vertices in the graph have been reached. This strategy is greedy in the sense that at each step the partial spanning tree is augmented with an edge that is the smallest among all possible adjacent edges.

MST-PRIM
Input: A weighted, undirected graph G = (V, E, w)
Output: A minimum spanning tree T.

 T = { }
 Let r be an arbitrarily chosen vertex from V.
 U = {r}
 WHILE |U| < n DO
  Find u in U and v in V − U such that the edge (u, v) is a smallest edge between U and V − U.
  T = T ∪ {(u, v)}
  U = U ∪ {v}

Analysis
The algorithm spends most of its time in finding the smallest edge, so the running time depends on how we search for this edge.

Straightforward method
Just find the smallest edge by searching the adjacency lists of the vertices in V. In this case, each iteration costs O(m) time, yielding a total running time of O(mn).

Binary heap
By using binary heaps, the algorithm runs in O(m log n) time.

Fibonacci heap
By using Fibonacci heaps, the algorithm runs in O(m + n log n) time.
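A Python sketch of MST-PRIM using a binary heap (the O(m log n) variant), via the standard heapq module; the adjacency-dict input format is an assumption for illustration:

import heapq

def prim(graph, root):
    """graph: {u: [(weight, v), ...]} undirected adjacency lists.
    Grows a tree from root, always taking the smallest edge
    that crosses from the tree to the rest of the graph."""
    visited = {root}
    heap = [(w, root, v) for w, v in graph[root]]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)     # smallest edge leaving the tree
        if v in visited:
            continue                      # both ends already in the tree
        visited.add(v)
        tree.append((u, v, w))
        for w2, x in graph[v]:
            if x not in visited:
                heapq.heappush(heap, (w2, v, x))
    return tree

g = {'a': [(1, 'b'), (3, 'c')],
     'b': [(1, 'a'), (4, 'c')],
     'c': [(3, 'a'), (4, 'b'), (2, 'd')],
     'd': [(2, 'c')]}
print(prim(g, 'a'))   # -> [('a', 'b', 1), ('a', 'c', 3), ('c', 'd', 2)]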
5. Dijkstra's Algorithm
Dijkstra's algorithm solves the single-source shortest-path problem when all edges have non-negative weights. It is a greedy algorithm, similar to Prim's algorithm. The algorithm starts at the source vertex s and grows a tree T that ultimately spans all vertices reachable from s. Vertices are added to T in order of distance: first s, then the vertex closest to s, then the next closest, and so on. The following implementation assumes that the graph G is represented by adjacency lists.

DIJKSTRA (G, w, s)
1. INITIALIZE-SINGLE-SOURCE (G, s)
2. S ← { } // S will ultimately contain the vertices with final shortest-path weights from s
3. Initialize priority queue Q, i.e., Q ← V[G]
4. while priority queue Q is not empty
5.  do u ← EXTRACT_MIN(Q) // pull out a new vertex
6.   S ← S ∪ {u}
7.   for each vertex v in Adj[u] // perform relaxation for each vertex v adjacent to u
8.    do RELAX (u, v, w)

Analysis
Like Prim's algorithm, Dijkstra's algorithm runs in O(|E| lg |V|) time.

Example: Step-by-step operation of Dijkstra's algorithm.

Step 1. Given the initial graph G = (V, E), all nodes have infinite cost except the source node, s, which has cost 0.
Step 2. First we choose the node closest to the source node, s. We initialize d[s] to 0. Add it to S. Relax all nodes adjacent to the source, s. Update the predecessor (see the red arrow in the diagram below) of every node updated.

Step 3. Choose the closest node, x. Relax all nodes adjacent to node x. Update the predecessors of nodes u, v and y (again, notice the red arrows in the diagram below).

Step 4. Now node y is the closest node, so add it to S. Relax node v and adjust its predecessor (red arrows, remember!).
Step 5. Now it is node u that is closest. Choose this node and adjust its neighbor, node v.

Step 6. Finally, add node v. The predecessor list now defines the shortest path from each node to the source node, s.

Q as a linear array
EXTRACT_MIN takes O(V) time and there are |V| such operations. Therefore, the total time for EXTRACT_MIN in the while-loop is O(V²). Since the total number of edges in all the adjacency lists
is |E|, the for-loop iterates |E| times in total, with each iteration taking O(1) time. Hence, the running time of the algorithm with the array implementation is O(V² + E) = O(V²).

Q as a binary heap (if G is sparse)
In this case, an EXTRACT_MIN operation takes O(lg V) time and there are |V| such operations. The binary heap can be built in O(V) time. The DECREASE_KEY operation (in RELAX) takes O(lg V) time and there are at most |E| such operations. Hence, the running time of the algorithm with a binary heap, provided the given graph is sparse, is O((V + E) lg V). Note that this time becomes O(E lg V) if all vertices in the graph are reachable from the source vertex.

Q as a Fibonacci heap
In this case, the amortized cost of each of the |V| EXTRACT_MIN operations is O(lg V), and the DECREASE_KEY operation in the subroutine RELAX now takes only O(1) amortized time for each of the |E| edges.

As mentioned above, Dijkstra's algorithm does not work on a digraph with negative-weight edges. A simple example shows that it produces incorrect results in this situation. Consider the digraph with V = {s, a, b} and E = {(s, a), (s, b), (b, a)}, where w(s, a) = 1, w(s, b) = 2, and w(b, a) = -2. Dijkstra's algorithm gives d[a] = 1, d[b] = 2. But due to the negative-weight edge (b, a), the shortest distance from vertex s to vertex a is actually 2 + (-2) = 0, via b.
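A Python sketch of DIJKSTRA with a binary heap (heapq), using lazy deletion of stale queue entries instead of an explicit DECREASE_KEY, which is a common implementation shortcut; the adjacency-dict format is an assumption for illustration:

import heapq

def dijkstra(graph, s):
    """graph: {u: [(v, w), ...]} with non-negative weights w.
    Returns a dict of shortest-path distances from s."""
    dist = {s: 0}
    pq = [(0, s)]                        # priority queue keyed by distance
    done = set()                         # the set S of finished vertices
    while pq:
        d, u = heapq.heappop(pq)         # EXTRACT_MIN
        if u in done:
            continue                     # stale entry; skip
        done.add(u)
        for v, w in graph.get(u, []):    # RELAX each edge (u, v)
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

g = {'s': [('u', 10), ('x', 5)],
     'u': [('v', 1), ('x', 2)],
     'x': [('u', 3), ('v', 9), ('y', 2)],
     'v': [('y', 4)],
     'y': [('s', 7), ('v', 6)]}
print(dijkstra(g, 's'))   # distances: s=0, x=5, y=7, u=8, v=9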
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Divide & Conquer Algorithm

Divide-and-conquer is a top-down technique for designing algorithms that consists of dividing the problem into smaller subproblems, hoping that the solutions of the subproblems are easier to find, and then composing the partial solutions into the solution of the original problem. A little more formally, the divide-and-conquer paradigm consists of the following major phases:
• Breaking the problem into several sub-problems that are similar to the original problem but smaller in size,
• Solving the sub-problems recursively (successively and independently), and then
• Combining these solutions to the sub-problems to create a solution to the original problem.

Binary Search (simplest application of divide-and-conquer)
Binary search is an extremely well-known instance of the divide-and-conquer paradigm. Given an ordered array of n elements, the basic idea of binary search is that, for a given element, we "probe" the middle element of the array. We continue in either the lower or upper segment of the array, depending on the outcome of the probe, until we reach the required (given) element.

Problem
Let A[1 . . n] be an array in non-decreasing sorted order; that is, A[i] ≤ A[j] whenever 1 ≤ i ≤ j ≤ n. Let q be the query point. The problem consists of finding q in the array A. If q is not in A, then find the position where q might be inserted. Formally, find the index i such that 1 ≤ i ≤ n + 1 and A[i−1] < q ≤ A[i].

Sequential Search
Look sequentially at each element of A until either we reach the end of the array A or find an item no smaller than q.

Sequential search for q in array A:
for i = 1 to n do
 if A[i] ≥ q then return index i
return n + 1

Analysis
This algorithm clearly takes θ(r) time, where r is the index returned. This is Ω(n) in the worst case and O(1) in the best case. If the elements of the array A are distinct and the query point q is indeed in the array, then the loop executes (n + 1)/2 times on average. On average (as well as in the worst case), sequential search takes θ(n) time.

Binary Search
Look for q either in the first half or in the second half of the array A. Compare q to the element in the middle, n/2, of the array. Let k = n/2. If q ≤ A[k], then search in A[1 . . k]; otherwise search A[k+1 . . n] for q.

Binary search for q in subarray A[i . . j], with the promise that A[i−1] < q ≤ A[j]:

if i = j then return i (the index)
k = (i + j)/2
if q ≤ A[k]
 then return Binary Search (A[i . . k], q)
 else return Binary Search (A[k+1 . . j], q)

Analysis
Binary search can be accomplished in logarithmic time in the worst case, i.e., T(n) = θ(log n). This version of binary search takes logarithmic time even in the best case.

Iterative Version of Binary Search

Iterative binary search for q in array A[1 . . n]:
if q > A[n] then return n + 1
i = 1; j = n
while i < j do
 k = (i + j)/2
 if q ≤ A[k]
  then j = k
  else i = k + 1
return i (the index)

Analysis
The analysis of the iterative algorithm is identical to that of its recursive counterpart.
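A Python sketch of the iterative version (Python lists are 0-indexed, so the code adjusts the 1-based pseudocode by one; the standard library's bisect.bisect_left computes the same index):

def binary_search(A, q):
    """Return the smallest 1-based index i with A[i-1] < q <= A[i],
    or len(A) + 1 if q is greater than every element of A."""
    if not A or q > A[-1]:
        return len(A) + 1
    i, j = 1, len(A)
    while i < j:
        k = (i + j) // 2
        if q <= A[k - 1]:      # A[k] in the 1-based pseudocode
            j = k
        else:
            i = k + 1
    return i

A = [2, 3, 5, 7, 11, 13]
print(binary_search(A, 7))    # -> 4 (A[4] = 7 in 1-based indexing)
print(binary_search(A, 8))    # -> 5 (8 would be inserted at position 5)
print(binary_search(A, 20))   # -> 7 (= n + 1, larger than all elements)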
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Dynamic Programming Algorithm

Dynamic programming is a fancy name for using the divide-and-conquer technique with a table. Compared to divide-and-conquer, dynamic programming is a more powerful and subtle design technique. It is not a specific algorithm, but a meta-technique (like divide-and-conquer). The technique was developed back in the days when "programming" meant "tabular method" (like linear programming); it does not really refer to computer programming. Here in our advanced algorithms course, we'll also think of "programming" as a "tableau method" and certainly not as writing code.

Dynamic programming is a stage-wise search method suitable for optimization problems whose solutions may be viewed as the result of a sequence of decisions. The most attractive property of this strategy is that during the search for a solution it avoids full enumeration by pruning early partial decision solutions that cannot possibly lead to an optimal solution. In many practical situations, this strategy hits the optimal solution in a polynomial number of decision steps. However, in the worst case, such a strategy may end up performing full enumeration.

Dynamic programming takes advantage of the duplication of subproblems and arranges to solve each subproblem only once, saving the solution (in a table or in a globally accessible place) for later use. The underlying idea of dynamic programming is: avoid calculating the same thing twice, usually by keeping a table of known results of subproblems. Unlike divide-and-conquer, which solves subproblems top-down, dynamic programming is a bottom-up technique.

The dynamic programming technique is related to divide-and-conquer, in the sense that it breaks a problem down into smaller problems and solves them recursively. However, because of the somewhat different nature of dynamic programming problems, standard divide-and-conquer solutions are not usually efficient.

Dynamic programming is among the most powerful techniques for designing algorithms for optimization problems. This is true for two reasons. Firstly, dynamic programming solutions are based on a few common elements. Secondly, dynamic programming problems are typical optimization problems, i.e., find the minimum or maximum cost solution, subject to various constraints. In other words, this technique is used for optimization problems:
• Find a solution to the problem with the optimal value.
  • 58. 58 | P a g e  Then perform minimization or maximization. (We'll see example of both in CLRS). The dynamic programming is a paradigm of algorithm design in which an optimization problem is solved by a combination of caching subproblem solutions and appealing to the "principle of optimality." There are three basic elements that characterize a dynamic programming algorithm: 1. Substructure Decompose the given problem into smaller (and hopefully simpler) subproblems. Express the solution of the original problem in terms of solutions for smaller problems. Note that unlike divide-and-conquer problems, it is not usually sufficient to consider one decomposition, but many different ones. 2. Table-Structure After solving the subproblems, store the answers (results) to the subproblems in a table. This is done because (typically) subproblem solutions are reused many times, and we do not want to repeatedly solve the same problem over and over again. 3. Bottom-up Computation Using table (or something), combine solutions of smaller subproblems to solve larger subproblems, and eventually arrive at a solution to the complete problem. The idea of bottom-up computation is as follow: Bottom-up means i. Start with the smallest subproblems. ii. Combining theirs solutions obtain the solutions to subproblems of increasing size. iii. Until arrive at the solution of the original problem. Once we decided that we are going to attack the given problem with dynamic programming technique, the most important step is the formulation of the problem. In other words, the most important question in designing a dynamic programming solution to a problem is how to set up the subproblem structure. If I can't apply dynamic programming to all optimization problem, then the question is what should I look for to apply this technique? Well! the answer is there are two important elements
that a problem must have in order for the dynamic programming technique to be applicable (so look for those!).

1. Optimal Substructure
Show that a solution to a problem consists of making a choice, which leaves one or more sub-problems to solve. Now suppose that you are given this last choice of an optimal solution. [Students often have trouble understanding the relationship between optimal substructure and determining which choice is made in an optimal solution. One way to understand optimal substructure is to imagine that "God" tells you what the last choice made in an optimal solution was.] Given this choice, determine which subproblems arise and how to characterize the resulting space of subproblems. Show that the solutions to the subproblems used within the optimal solution must themselves be optimal (the optimality principle). You usually use cut-and-paste:
• Suppose that one of the subproblem solutions is not optimal.
• Cut it out.
• Paste in an optimal solution.
• Get a better solution to the original problem. This contradicts the optimality of the problem solution.

That was optimal substructure. You need to ensure that you consider a wide enough range of choices and subproblems that you get them all. ["God" is too busy to tell you what that last choice really was.] Try all the choices, solve all the subproblems resulting from each choice, and pick the choice whose solution, along with the subproblem solutions, is best.

We have used the "optimality principle" a couple of times; now a word about this beast. The optimal solution to the problem contains within it optimal solutions to subproblems. This is sometimes called the principle of optimality.

The Principle of Optimality
Dynamic programming relies on the principle of optimality. This principle states that in an optimal sequence of decisions or choices, each subsequence must also be optimal. For example, in the matrix chain multiplication problem, not only is the value we are interested in optimal, but all the other entries in the table also represent optimal values. The principle can be restated as follows: the optimal solution to a problem is a combination of optimal solutions to some of its
subproblems. The difficulty in turning the principle of optimality into an algorithm is that it is not usually obvious which subproblems are relevant to the problem under consideration.

Now the question is how to characterize the space of subproblems:
• Keep the space as simple as possible.
• Expand it as necessary.

As an example, consider assembly-line scheduling. In this problem, the space of subproblems was the fastest way from the factory entry through stations S1,j and S2,j. Clearly, there is no need to try a more general space of subproblems. On the other hand, in the case of optimal binary search trees, suppose we had tried to constrain the space of subproblems to subtrees with keys k1, k2, . . . , kj. An optimal BST would have root kr, for some 1 ≤ r ≤ j. We get subproblems k1, . . . , kr−1 and kr+1, . . . , kj. Unless we could guarantee that r = j, so that the subproblem with kr+1, . . . , kj is empty, this subproblem is not of the form k1, k2, . . . , kj. Thus, we needed to allow the subproblems to vary at both ends, i.e., allow both i and j to vary.

Optimal substructure varies across problem domains in two ways:
1. how many subproblems are used in an optimal solution, and
2. how many choices there are in determining which subproblem(s) to use.

In the assembly-line scheduling problem, we have 1 subproblem and 2 choices (for Si,j, use either S1,j−1 or S2,j−1). In the longest common subsequence problem, we have 1 subproblem but, as far as choices are concerned, either 1 choice (if xi = yj, the LCS of Xi−1 and Yj−1) or 2 choices (if xi ≠ yj, the LCS of Xi−1 and Y, and the LCS of X and Yj−1). Finally, in the case of the optimal binary search tree problem, we have 2 subproblems (ki, . . . , kr−1 and kr+1, . . . , kj) and j − i + 1 choices for kr in ki, . . . , kj. Once we determine the optimal solutions to subproblems, we choose from among the j − i + 1 candidates for kr.

Informally, the running time of a dynamic programming algorithm depends on the overall number of subproblems times the number of choices. For example, in the assembly-line scheduling problem, there are Θ(n) subproblems and 2 choices for each, implying that the running time is Θ(n). In the case of the longest common subsequence problem, there are Θ(mn) subproblems and at most
2 choices for each, implying Θ(mn) running time. Finally, in the case of the optimal binary search tree problem, we have Θ(n²) subproblems and Θ(n) choices for each, implying Θ(n³) running time.

Dynamic programming uses optimal substructure in a bottom-up fashion:
• First find optimal solutions to subproblems.
• Then choose which to use in an optimal solution to the problem.

When we look at greedy algorithms, we'll see that they work in a top-down fashion:
• First make a choice that looks best.
• Then solve the resulting subproblem.

Warning! It is not correct to think that optimal substructure applies to all optimization problems. IT DOES NOT: dynamic programming is not applicable to all optimization problems. Consider, for example, the two path problems discussed in CLRS (unweighted shortest paths, which have optimal substructure, and unweighted longest simple paths, which do not). In both problems, we are given an unweighted, directed graph G = (V, E), and our job is to find a path (a sequence of connected edges) from vertex u in V to vertex v in V.

Subproblem Dependencies
It is easy to see that the subproblems in our above examples are independent subproblems. For example, in the assembly-line problem, there is only 1 subproblem, so it is trivially independent. Similarly, in the longest common subsequence problem, again we have only 1 subproblem, so it is automatically independent. On the other hand, in the optimal binary search tree problem, we have two subproblems, ki, . . . , kr−1 and kr+1, . . . , kj, which are clearly independent.

2. Polynomially Many (Overlapping) Subproblems
An important aspect of the efficiency of dynamic programming is that the total number of distinct sub-problems to be solved should be at most polynomial. Overlapping subproblems occur when a recursive algorithm revisits the same problem over and over. A good divide-and-conquer algorithm, for example the merge-sort algorithm, usually generates a brand new problem at each stage of recursion. Our textbook, CLRS, has a good example, matrix-chain multiplication, to depict this idea. CLRS also discusses an alternative approach, so-called memoization. It works as follows:
  • 62. 62 | P a g e  Store, don't recompute  Make a table indexed by subproblem.  When solving a subproblem: o Lookup in the table. o If answer is there, use it. o Otherwise, compute answer, then store it. In dynamic programming, we go one step further. We determine in what order we would want to access the table, and fill it in that way. Four-Step Method of CLRS Our Text suggested that the development of a dynamic programming algorithm can be broken into a sequence of following four steps. 1. Characterize the structure of an optimal solution. 2. Recursively defined the value of an optimal solution. 3. Compute the value of an optimal solution in a bottom-up fashion. 4. Construct an optimal solution from computed information. Examples of Dynamic programming Algorithm:  Matrix-chain Multiplication  Knapsack Problem DP Solution  Activity Selection Problem DP Solution 1. Matrix-chain Multiplication Problem The chain matrix multiplication problem is perhaps the most popular example of dynamic programming used in the upper undergraduate course (or review basic issues of dynamic programming in advanced algorithm's class). The chain matrix multiplication problem involves the question of determining the optimal sequence for performing a series of operations. This general class of problem is important in complier design for code optimization and in databases for query optimization. We will study the problem in a very restricted instance, where the dynamic programming issues are clear. Suppose that our problem is to multiply a chain of n matrices A1 A2 ... An. Recall (from your discrete
structures course) that matrix multiplication is an associative but not a commutative operation. This means that we are free to parenthesize the above multiplication however we like, but we are not free to rearrange the order of the matrices. Also, recall that when two (non-square) matrices are being multiplied, there are restrictions on the dimensions.

Suppose matrix A has p rows and q columns, i.e., the dimension of A is p × q. You can multiply a matrix A of dimensions p × q times a matrix B of dimensions q × r, and the result will be a matrix C with dimensions p × r. That is, you can multiply two matrices if they are compatible: the number of columns of A must equal the number of rows of B. In particular, for 1 ≤ i ≤ p and 1 ≤ j ≤ r, we have

C[i, j] = ∑ 1 ≤ k ≤ q A[i, k] B[k, j].

There are p · r total entries in C and each takes O(q) time to compute; thus the total time to multiply these two matrices is dominated by the number of scalar multiplications, which is p · q · r.

Problem Formulation
Note that we can use any legal parenthesization, and each will lead to a valid result. But not all parenthesizations involve the same number of operations. To understand this point, consider the problem of a chain A1, A2, A3 of three matrices, and suppose

A1 is of dimension 10 × 100
A2 is of dimension 100 × 5
A3 is of dimension 5 × 50

Then,
MultCost[((A1 A2) A3)] = (10 · 100 · 5) + (10 · 5 · 50) = 7,500 scalar multiplications.
MultCost[(A1 (A2 A3))] = (100 · 5 · 50) + (10 · 100 · 50) = 75,000 scalar multiplications.
It is easy to see that even for this small example, computing the product according to the first parenthesization is 10 times faster.

The Chain Matrix Multiplication Problem
Given a sequence of n matrices A1, A2, ... An and their dimensions p0, p1, p2, ..., pn, where, for i = 1, 2, ..., n, matrix Ai has dimension pi−1 × pi, determine the order of multiplication that minimizes the number of scalar multiplications.
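A quick sanity check of the two costs above in Python (a throwaway sketch; each (p, q, r) triple gives the dimensions of one matrix product):

def chain_cost(mults):
    """Total scalar multiplications for a sequence of (p, q, r) products."""
    return sum(p * q * r for p, q, r in mults)

# ((A1 A2) A3): 10x100 times 100x5, then 10x5 times 5x50
print(chain_cost([(10, 100, 5), (10, 5, 50)]))     # -> 7500
# (A1 (A2 A3)): 100x5 times 5x50, then 10x100 times 100x50
print(chain_cost([(100, 5, 50), (10, 100, 50)]))   # -> 75000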
Equivalent formulation (perhaps easier to work with!)
Given n matrices, A1, A2, ... An, where for 1 ≤ i ≤ n, Ai is a pi−1 × pi matrix, parenthesize the product A1 A2 ... An so as to minimize the total cost, assuming that the cost of multiplying a pi−1 × pi matrix by a pi × pi+1 matrix using the naive algorithm is pi−1 × pi × pi+1.

Note that this algorithm does not perform the multiplications; it just figures out the best order in which to perform the multiplication operations.

Naive Algorithm
Well, let's start from the obvious! Suppose we are given a list of n matrices. Let's attack the problem with brute force and try all possible parenthesizations. It is easy to see that the number of ways of parenthesizing an expression is very large. For instance, if you have just one item in the list, then there is only one way to parenthesize. Similarly, if you have n items in the list, then there are n − 1 places where you could split the list with the outermost pair of parentheses, namely just after the first item, just after the second item, and so on and so forth, and just after the (n − 1)th item in the list.

On the other hand, when we split the given list just after the kth item, we create two sublists to be parenthesized, one with k items and the other with n − k items. After splitting, we could consider all the ways of parenthesizing these sublists (brute force in action). If there are L ways to parenthesize the left sublist and R ways to parenthesize the right sublist, and since these are independent choices, then the total is L times R. This suggests the following recurrence for P(n), the number of different ways of parenthesizing n items:

P(1) = 1
P(n) = ∑ k = 1 to n−1 of P(k) P(n − k), for n ≥ 2

This recurrence is related to a famous function in combinatorics called the Catalan numbers, which in turn is related to the number of different binary trees on n nodes. The solution to this recurrence is the sequence of Catalan numbers. In particular, P(n) = C(n − 1), where C(n) is the nth Catalan number,

C(n) = (1/(n + 1)) · (2n choose n).

And, by applying Stirling's formula, we get the lower bound C(n) = Ω(4^n / n^(3/2)). That is,
since 4^n is exponential and n^(3/2) is just a polynomial, the exponential will dominate the expression, implying that the function grows very fast. Thus, the number of solutions is exponential in n, and the brute-force method of exhaustive search is a poor strategy for determining the optimal parenthesization of a matrix chain. Therefore, the naive algorithm will not be practical except for very small n.

Dynamic Programming Approach
The first step of the dynamic programming paradigm is to characterize the structure of an optimal solution. For the chain matrix problem, as for other dynamic programming problems, this involves determining the optimal structure (in this case, a parenthesization). We would like to break the problem into subproblems whose solutions can be combined to obtain a solution to the global problem.

For convenience, let us adopt the notation Ai..j, where i ≤ j, for the result of evaluating the product Ai Ai+1 ... Aj. That is,

Ai..j ≡ Ai Ai+1 ... Aj, where i ≤ j.

It is easy to see that Ai..j is a matrix of dimensions pi−1 × pj.

In parenthesizing the expression, we can consider the highest level of parenthesization. At this level we are simply multiplying two matrices together. That is, for any k, 1 ≤ k ≤ n − 1,

A1..n = A1..k Ak+1..n.

Therefore, the problem of determining the optimal sequence of multiplications is broken up into two questions:

Question 1: How do we decide where to split the chain? (What is k?)
Question 2: How do we parenthesize the subchains A1..k and Ak+1..n?
The subchain problems can be solved by recursively applying the same scheme. On the other hand, to determine the best value of k, we will consider all possible values of k and pick the best of them. Notice that this problem satisfies the principle of optimality, because once we decide to break the sequence into the product A1..k Ak+1..n, we should compute each subsequence optimally. That is, for the global problem to be solved optimally, the subproblems must be solved optimally as well. The key observation is that the parenthesization of the "prefix" subchain A1..k within an optimal parenthesization of A1..n must itself be an optimal parenthesization of A1..k.

Dynamic Programming Formulation
The second step of the dynamic programming paradigm is to define the value of an optimal solution recursively in terms of the optimal solutions to subproblems. To help us keep track of solutions to subproblems, we will use a table, and build the table in a bottom-up manner. For 1 ≤ i ≤ j ≤ n, let m[i, j] be the minimum number of scalar multiplications needed to compute Ai..j. The optimum cost can be described by the following recursive formulation.

Basis: Observe that if i = j then the problem is trivial; the sequence contains only one matrix, and so the cost is 0. (In other words, there is nothing to multiply.) Thus, m[i, i] = 0 for i = 1, 2, ..., n.

Step: If i ≠ j, then we are asking about the product of the subchain Ai..j, and we take advantage of the structure of an optimal solution. We assume that the optimal parenthesization splits the product Ai..j as Ai..k · Ak+1..j for some value of k, where i ≤ k < j. The optimum cost of computing Ai..k is m[i, k], and the optimum cost of computing Ak+1..j is m[k + 1, j]. We may assume that these values have been computed previously and stored in our array. Since Ai..k is a pi−1 × pk matrix and Ak+1..j is a pk × pj matrix, the time to multiply them is pi−1 · pk · pj. This suggests the following recursive rule for computing m[i, j]:

m[i, i] = 0
m[i, j] = min over i ≤ k < j of ( m[i, k] + m[k + 1, j] + pi−1 · pk · pj ), for i < j

To keep track of optimal subsolutions, we store the value of k in a table s[i, j]. Recall, k is the place at which we split the product Ai..j to get an optimal parenthesization. That is,
s[i, j] = k such that m[i, j] = m[i, k] + m[k + 1, j] + pi−1 · pk · pj.

Implementing the Rule
The third step of the dynamic programming paradigm is to compute the value of an optimal solution in a bottom-up fashion. It is pretty straightforward to translate the above recurrence into a procedure. As we remarked in the introduction, dynamic programming is nothing but a fancy name for divide-and-conquer with a table. But here, as opposed to divide-and-conquer, we solve the subproblems sequentially. This means the trick is to solve them in the right order, so that whenever the solution to a subproblem is needed, it is already available in the table. Consequently, in our problem the only tricky part is arranging the order in which to compute the values (so that a value is readily available when we need it).

In the process of computing m[i, j] we will need to access the values m[i, k] and m[k + 1, j] for each value of k lying between i and j. This suggests that we should organize our computation according to the number of matrices in the subchain. So let's work on the subchain: let L = j − i + 1 denote the length of the subchain being multiplied. The subchains of length 1 (m[i, i]) are trivial. Then we build up by computing the subchains of lengths 2, 3, ..., n. The final answer is m[1, n].

Now set up the loop: observe that if a subchain of length L starts at position i, then j = i + L − 1. Since we would like to keep j in bounds, we want j ≤ n; this, in turn, means that we want i + L − 1 ≤ n, i.e., i ≤ n − L + 1. This gives us the closed interval for i, so our loop for i runs from 1 to n − L + 1.

Matrix-Chain(array p[1 .. n], int n) {
 Array s[1 .. n − 1, 2 .. n];
 FOR i = 1 TO n DO m[i, i] = 0;        // initialize
 FOR L = 2 TO n DO {                   // L = length of subchain
  FOR i = 1 TO n − L + 1 DO {
   j = i + L − 1;
   m[i, j] = infinity;
   FOR k = i TO j − 1 DO {             // check all splits
    q = m[i, k] + m[k + 1, j] + p[i − 1] p[k] p[j];
    IF (q < m[i, j]) {
     m[i, j] = q;
     s[i, j] = k;
    }
   }
  }
 }
 return m[1, n] (final cost) and s (splitting markers);
}

Example [on page 337 in CLRS]: the m-table computed by the Matrix-Chain procedure for n = 6 matrices A1, A2, A3, A4, A5, A6 with dimensions 30, 35, 15, 5, 10, 20, 25. Note that the m-table is rotated so that the main diagonal runs horizontally; only the main diagonal and upper triangle are used.

Complexity Analysis
Clearly, the space complexity of this procedure is Ο(n²), since the tables m and s require Ο(n²) space. As far as the time complexity is concerned, a simple inspection of the for-loop structure gives us the running time of the procedure. Since the three for-loops are nested three deep, and