Mathematical analysis of Graph and Huffman coding

Published in: Technology

    1. Mathematical Analysis of Graph and Huffman Problems
       Anjan.K, II Sem M.Tech CSE, M.S.R.I.T
       05/22/09, DAA: Analysis of Graph and Huffman Problem
    2. Outline
       • Key Points: Recap
       • Graph Problems: Minimum Spanning Tree (MST)
         ◦ Prim-Jarnik Algorithm
         ◦ Kruskal Algorithm
       • Data Compression using Huffman Coding
         ◦ Need for Huffman Codes
         ◦ Huffman’s Algorithm
       • Summary
       • References
    3. Key Points: Recap
       • Algorithmic complexity depends on many factors; the major ones are:
         ◦ Underlying data structures
         ◦ Design strategy used
         ◦ Size and pattern of the input
       • Every algorithm falls into one of the efficiency classes.
       • An algorithm is analyzed for its best, average and worst cases, and its growth is expressed asymptotically with Ω (lower bound), Θ (tight bound), O (Big Oh, upper bound) and o (Small Oh, strict upper bound).
    4. Graph Problems: MST
       • The MST problem, a cornerstone problem in combinatorial optimization, originates with Otakar Boruvka.
       • MST is a fundamental problem with diverse applications:
         ◦ Network design
         ◦ Cluster analysis (bioinformatics)
         ◦ Approximation for NP-hard problems: TSP, Steiner tree
         ◦ Indirect applications: LDPC, image processing
       • Minimum spanning tree algorithms:
         ◦ Prim-Jarnik’s Algorithm (Jarnik, Prim, Dijkstra)
         ◦ Kruskal’s Algorithm
         ◦ Boruvka’s Algorithm
    5. Prim-Jarnik’s MST
       • Discovered by three people (Jarnik, Prim and Dijkstra) and commonly referred to as Prim’s MST algorithm.
       • Employs a greedy strategy: “Nearest Neighbor”.
       • Working principle:
         ◦ Given a graph G=(V,E), the tree starts from an arbitrary vertex ‘r’ and grows until it spans all the vertices in V.
         ◦ At each step, a tree vertex ‘s’ is joined to its nearest neighbor ‘y’ outside the tree, such that edge (s,y) has the smallest weight among all edges leaving the tree. One vertex is added at a time.
         ◦ The algorithm terminates when all vertices in V have been reached.
    6. Prim-Jarnik’s Algorithm
       MST-Prim(G,w,r)
       01 Q ← V[G]                     // Q: vertices not yet in T, kept in a priority queue
       02 for each u ∈ Q
       03     key[u] ← ∞
       04 key[r] ← 0
       05 π[r] ← NIL
       06 while Q ≠ ∅
       07     do u ← EXTRACT-MIN(Q)    // u becomes part of T
       08        for each v ∈ Adj[u]
       09            do if v ∈ Q and w(u,v) < key[v]
       10                then π[v] ← u
       11                     key[v] ← w(u,v)
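The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the author's implementation: the function name `prim_mst`, the adjacency-list layout and the sample graph are assumptions made for the example. Python's `heapq` has no DECREASE-KEY operation, so the sketch swaps in a common workaround: it pushes a fresh entry whenever a key improves and skips stale entries when they are popped.

```python
import heapq


def prim_mst(adj, root):
    """Return (parent, total_weight) for a connected, undirected, weighted graph.

    adj maps each vertex to a list of (neighbor, weight) pairs (assumed layout).
    """
    key = {u: float("inf") for u in adj}   # lines 02-03: key[u] <- infinity
    key[root] = 0                          # line 04
    parent = {root: None}                  # line 05: pi[r] <- NIL
    in_tree = set()                        # complement of Q in the pseudocode
    heap = [(0, root)]
    while heap:                            # line 06
        _, u = heapq.heappop(heap)         # line 07: EXTRACT-MIN
        if u in in_tree:
            continue                       # stale entry left by an earlier push
        in_tree.add(u)
        for v, w in adj[u]:                # line 08
            if v not in in_tree and w < key[v]:   # line 09
                parent[v] = u              # line 10
                key[v] = w                 # line 11
                heapq.heappush(heap, (w, v))      # stands in for DECREASE-KEY
    return parent, sum(key[u] for u in in_tree)


# Hypothetical four-vertex graph used only for illustration.
example = {
    "a": [("b", 4), ("c", 1)],
    "b": [("a", 4), ("c", 2), ("d", 5)],
    "c": [("a", 1), ("b", 2), ("d", 8)],
    "d": [("b", 5), ("c", 8)],
}
parent, total = prim_mst(example, "a")
print(parent, total)
```

With lazy deletion the heap can hold up to O(E) entries, so each operation costs O(log E) = O(log V) and the overall bound stays O(E log V).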
    7. Prim-Jarnik’s: Proof of Correctness
       Theorem: Upon termination of the algorithm, T is an MST.
       Proof is by induction. It uses the exchange property: given a spanning tree T and an edge f that is not in the tree, adding f to the tree forms a unique cycle, and for any other edge e on that cycle, T ∪ {f} − {e} is again a spanning tree.
       Proof:
       Invariant A: there exists an MST T’ that contains all the edges of T.
       Basis: T = ∅, so every MST satisfies A.
       Inductive step: A is true at the start of an iteration. Let f be the edge that is chosen by the algorithm.
       If f ∈ T’, then T’ still satisfies A.
       Otherwise, adding f to T’ forms a cycle C. C contains another edge e that leaves the current tree, and w(f) ≤ w(e) because the algorithm chose the lightest such edge. Keeping e instead of f would therefore violate the MST constraint unless the weights are equal, and T’ ∪ {f} − {e} is an MST that satisfies A.
    8. Prim-Jarnik’s Algorithm Analysis
       • Run-time efficiency depends on how the priority queue is implemented.
       • When Q is implemented as a binary heap:
         ◦ Lines 1-5 perform initialization in O(V).
         ◦ The |V| calls to EXTRACT-MIN take O(V log V) in total.
         ◦ The inner for loop is executed O(E) times in total, and each key update costs O(log V).
       • Total time for the algorithm is O(V log V + E log V) = O(E log V).
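As a worked version of the bound above, the three contributions can be summed term by term. The symbol T(V,E) for the total running time is introduced here for illustration only, and the last step assumes the graph is connected.

```latex
\begin{align*}
T(V,E) &= \underbrace{O(V)}_{\text{initialization}}
        + \underbrace{O(V \log V)}_{|V|\ \text{EXTRACT-MIN calls}}
        + \underbrace{O(E \log V)}_{O(E)\ \text{key updates}} \\
       &= O\big((V + E)\log V\big) \\
       &= O(E \log V) \qquad \text{since } |E| \ge |V| - 1 \text{ in a connected graph.}
\end{align*}
```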
    10. Different Ways To Implement Priority Queue
    11. Kruskal’s MST
       • Discovered by J.B. Kruskal.
       • Employs a greedy strategy: “Smallest Edge First”.
       • Working principle:
         ◦ Given a graph G=(V,E), sort the edges in non-decreasing order of their weights.
         ◦ At each step, examine the edges from smallest to largest and add a safe edge to the forest, one edge at a time.
         ◦ An edge is safe if it joins two different trees of the forest, so that no cycle is formed; at termination the forest is connected and no vertex is isolated.
         ◦ The algorithm terminates when the required n-1 edges are present in the forest, where n = |V|.
    12. Kruskal’s Algorithm
       MST-Kruskal(G,w)
       A ← ∅
       for each vertex v ∈ V[G] do
           MAKE-SET(v)
       sort the edges of E by non-decreasing weight w
       for each edge (u,v) ∈ E, in order by non-decreasing weight
           do if FIND-SET(u) ≠ FIND-SET(v)
               then A ← A ∪ {(u,v)}
                    UNION(u,v)
       return A
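A minimal Python sketch of MST-Kruskal is given below, under a few assumptions that are not in the slides: edges are passed as `(weight, u, v)` tuples, the function name `kruskal_mst` and the sample graph are hypothetical, and FIND-SET uses path halving, a simple variant of path compression, together with union by rank.

```python
def kruskal_mst(vertices, edges):
    """Return the MST edges as (u, v, weight) for a connected undirected graph.

    edges is an iterable of (weight, u, v) tuples (assumed layout).
    """
    parent = {v: v for v in vertices}   # MAKE-SET for every vertex
    rank = {v: 0 for v in vertices}

    def find_set(v):
        # Path halving: point each visited node at its grandparent.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(u, v):
        ru, rv = find_set(u), find_set(v)
        if ru == rv:
            return
        if rank[ru] < rank[rv]:         # union by rank: attach shorter under taller
            ru, rv = rv, ru
        parent[rv] = ru
        if rank[ru] == rank[rv]:
            rank[ru] += 1

    mst = []                            # the set A in the pseudocode
    for w, u, v in sorted(edges):       # non-decreasing weight
        if find_set(u) != find_set(v):  # safe edge: joins two different trees
            mst.append((u, v, w))
            union(u, v)
    return mst


# Hypothetical graph used only for illustration.
example_edges = [(4, "a", "b"), (1, "a", "c"), (2, "b", "c"), (5, "b", "d"), (8, "c", "d")]
print(kruskal_mst("abcd", example_edges))
```

Sorting dominates the running time; the disjoint-set operations add only a near-linear term.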
    13. Kruskal’s: Proof of Correctness
       Theorem: Upon termination of the algorithm, the forest F is an MST.
       Proof is by induction. It uses the exchange property: given a spanning tree S and an edge f that is not in the tree, adding f forms a unique cycle, and for any other edge e on that cycle, S ∪ {f} − {e} is again a spanning tree.
       Proof:
       Invariant A: there exists an MST F’ that contains all the edges of F.
       Basis: F = ∅, so every MST satisfies A.
       Inductive step: A is true at the start of an iteration. Let f be the edge that is chosen by the algorithm.
       If f ∈ F’, then F’ still satisfies A.
       Otherwise, adding f to F’ forms a cycle C. Since f closes no cycle in F, C contains an edge e that is not in F; e has not been examined yet, so w(f) ≤ w(e). Keeping e instead of f would therefore violate the MST constraint unless the weights are equal, and F’ ∪ {f} − {e} is an MST that satisfies A.
    14. Kruskal’s Algorithm Analysis
       • Run-time efficiency depends on how the disjoint-set structure S is implemented.
       • When S is implemented with union-by-rank and path compression:
         ◦ Sorting the edges takes O(E log E).
         ◦ The loop performs O(E) FIND-SET and UNION operations on S, along with |V| MAKE-SET operations, for a total of O((V+E) · α(V)) time, where α is the very slowly growing inverse Ackermann function.
         ◦ |E| ≥ |V|-1 for a connected graph, therefore this is O(E · α(V)), and α(V) = O(log V) = O(log E).
         ◦ The total time for the algorithm is then O(E log E).
         ◦ Since |E| < |V|², log |E| = O(log V), hence the running time of the algorithm is O(E log V).
    16. Data Compression using Huffman Coding
       • Proposed by Dr. David A. Huffman in the 1950s.
       • A method for the construction of minimum-redundancy codes.
       • Also known as probabilistic variable-length coding.
       • Used in many compression algorithms such as gzip, bzip, JPEG (as an option) and fax compression.
       • Properties:
         ◦ Generates optimal prefix codes
         ◦ Low cost to generate the codes
         ◦ Low cost to encode and decode
         ◦ Average code length close to the entropy (within one bit per symbol)
    17. Information Theory: Entropy
       • Entropy is a measure of information content.
       • Related notions include conditional entropy and the entropy of the English language.
       • For a set of messages S with probabilities p(s), s ∈ S, the self-information i(s) of a message s and the entropy H(S) are:
         \[ i(s) = \log_2 \frac{1}{p(s)} = -\log_2 p(s) \]
         \[ H(S) = \sum_{s \in S} p(s) \log_2 \frac{1}{p(s)} \]
       • An example:
         \[ p(S) = \{0.25, 0.25, 0.25, 0.125, 0.125\} \]
         \[ H(S) = 3 \cdot 0.25 \log_2 4 + 2 \cdot 0.125 \log_2 8 = 2.25 \]
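The entropy formula and the example on this slide can be checked with a few lines of Python. The helper names `self_information` and `entropy` are chosen for this sketch, and logarithms are taken base 2, which is what the slide's result of 2.25 implies.

```python
import math


def self_information(p):
    """Self-information i(s) = log2(1 / p(s)), in bits."""
    return math.log2(1 / p)


def entropy(probs):
    """Entropy H(S) = sum of p(s) * i(s) over all messages with p(s) > 0."""
    return sum(p * self_information(p) for p in probs if p > 0)


# Distribution from the slide's example.
print(entropy([0.25, 0.25, 0.25, 0.125, 0.125]))  # 2.25
```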
    18. Huffman Algorithm
    19. Huffman Coding: Proof of Correctness
       • The goal is to prove that optimal prefix codes exhibit the greedy-choice and optimal-substructure properties.
       • Proof by induction:
         ◦ Basis: for n = 1, where n is the number of code words, the algorithm finds an optimal code.
         ◦ Inductive step: assume the claim is true for n and consider n+1.
         ◦ Lemma: given a tree T, we can find a tree T’ in which the two minimum-cost leaves are siblings and C(T’) ≤ C(T).
    20. Proof (Cont’d)
       • We need to show that C(T’) ≤ C(X’), where X is any code tree and T’ and X’ are the trees obtained from T and X by the lemma.
       • By the lemma, C(X’) ≤ C(X).
       • Let T’’ and X’’ be the trees T’ and X’ with the minimum-cost leaves x and y removed. Then:
           C(X’’) = C(X’) − x − y
           C(T’’) = C(T’) − x − y
           C(T’’) ≤ C(X’’)   (by the inductive hypothesis)
           C(T) = C(T’) = C(T’’) + x + y ≤ C(X’’) + x + y = C(X’) ≤ C(X)
    21. Huffman’s Algorithm Analysis
       • Run-time efficiency depends on how the queue ‘Q’ is implemented.
       • When Q is implemented as a binary min-heap, for a set C of n characters:
         ◦ Line 2 performs initialization in O(n).
         ◦ Lines 3-8 are executed n-1 times, and each heap operation requires O(log n).
         ◦ The for loop therefore contributes O(n log n).
       • Total time for the algorithm is O(n log n).
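The analysis above can be made concrete with a short Python sketch of Huffman's construction on a binary min-heap. It is an illustration under stated assumptions, not the pseudocode from the slides (which did not survive in this transcript): the function name `huffman_codes`, the tuple-based tree representation and the sample frequencies are all hypothetical. The heap is built in O(n), and the loop performs n-1 merges of O(log n) each.

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Map each symbol in freq (symbol -> frequency) to a prefix-free bit string."""
    tiebreak = count()   # unique sequence numbers so the heap never compares tree nodes
    # A leaf is the 1-tuple (symbol,); an internal node is the 2-tuple (left, right).
    heap = [(f, next(tiebreak), (sym,)) for sym, f in freq.items()]
    heapq.heapify(heap)                       # O(n) initialization
    while len(heap) > 1:                      # n - 1 merges
        f1, _, left = heapq.heappop(heap)     # two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if len(node) == 1:                    # leaf
            codes[node[0]] = prefix or "0"    # a lone symbol still gets one bit
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")

    if heap:
        walk(heap[0][2], "")
    return codes


# Illustrative frequencies, not taken from the slides.
sample = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman_codes(sample)
print(codes)
print(sum(sample[s] * len(codes[s]) for s in sample))  # total cost C(T)
```

The exact bit patterns depend on how left and right children are assigned, but the total cost is the same for every optimal tree.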
    23. Summary
       • Graph Problem: Prim’s and Kruskal’s MST
         ◦ Data structures used
         ◦ Algorithm
         ◦ Proof of correctness
         ◦ Mathematical analysis
       • Huffman Coding
         ◦ Need for Huffman techniques for data compression
         ◦ Algorithm
         ◦ Proof of correctness
         ◦ Mathematical analysis
    24. References
       [1] Thomas H. Cormen et al., “Introduction to Algorithms”, 2nd Edition, PHI
       [2] Anany Levitin, “Design and Analysis of Algorithms”, 2004 Reprint, Pearson Education
       [3] Sartaj Sahni and Narasingh Deo, “Handbook on Data Structures and Applications”, 2005 Reprint, Chapman & Hall
       [4] Documents and Internet resources from popular universities across the globe
