data structure

Published in: Technology

Transcript

  • 1. Data Structure Revision Tutorial 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 1
  • 2. What is this course about?
      • Data structures: conceptual and concrete ways to organize data for efficient storage and efficient manipulation
      • Use of these data structures in the design of efficient algorithms
  • 3. Why do we need them?
      • Computers take on more and more complex tasks
      • Imagine: an index of 8 billion pages! (Google)
      • Software implementation and maintenance is difficult
      • A clean conceptual framework allows for more efficient and more correct code
  • 4. Why do we need them?
      • Requirements for good software:
          • Clean design
          • Easy maintenance
          • Reliable (no core dumps)
          • Easy to use
          • Fast algorithms
      • Needed: efficient data structures and efficient algorithms
  • 5. Example
      • A collection of 3,000 texts with an average of 20 lines each and an average of 10 words per line
          • → 600,000 words
      • Find all occurrences of the word “happy”
      • Suppose it takes 1 sec. to check a word for a correct match
      • What to do?
  • 6. Example (cont’d)
      • What to do?
          • Sol. 1: Sequential matching: 1 sec. x 600,000 words ≈ 166 hours
          • Sol. 2: Binary searching:
              - order the words
              - search only half at a time
      • Ex. Search for 25 in: 5 8 12 15 15 17 23 25 27
          25 ? 15 → continue in 15 17 23 25 27
          25 ? 23 → continue in 23 25 27
          25 ? 25
      • How many steps?
  • 7. Some example data structures
      • log2(600,000) ≈ 19 sec. vs. 166 hours!
      • Examples: Set, Stack, Tree
      • Data structure = representation and operations associated with a data type
  • 8. Data Structure Philosophy
      • Each data structure has costs and benefits.
      • Rarely is one data structure better than another in all situations.
      • A data structure requires:
          • space for each data item it stores,
          • time to perform each basic operation,
          • programming effort.
  • 9. Data Structure Philosophy (cont)
      • Each problem has constraints on available space and time.
      • Only after a careful analysis of problem characteristics can we know the best data structure for the task.
      • Bank example:
          • Start account: a few minutes
          • Transactions: a few seconds
          • Close account: overnight
  • 10. What will you learn?
      • What are some of the common data structures
      • What are some ways to implement them
      • How to analyze their efficiency
      • How to use them to solve practical problems
  • 11. What you need
      • Programming experience with C / C++
      • Some Java experience may help as well (but is not required)
      • Textbook
          • Data Structures and Algorithm Analysis in C++
          • Mark Allen Weiss
      • A Unix account to write, compile and run your C/C++ programs
  • 12. Topics
      • Analysis Tools / ADT
      • Arrays
      • Stacks and Queues
      • Vectors, lists and sequences
      • Trees
      • Heaps / Priority Queues
      • Binary Search Trees – Search Trees
      • Hashing / Dictionaries
      • Sorting
      • Graphs and graph algorithms
  • 13. Problem Solving: Main Steps
      1. Problem definition
      2. Algorithm design / Algorithm specification
      3. Algorithm analysis
      4. Implementation
      5. Testing
      6. [Maintenance]
  • 14. 1. Problem Definition
      • What is the task to be accomplished?
          • Calculate the average of the grades for a given student
          • Understand the talks given by politicians and translate them into Chinese
      • What are the time / space / speed / performance requirements?
  • 15. Problems
      • Problem: a task to be performed.
          • Best thought of as inputs and matching outputs.
          • Problem definition should include constraints on the resources that may be consumed by any acceptable solution.
  • 16. Problems (cont)
      • Problems ⇔ mathematical functions
          • A function is a matching between inputs (the domain) and outputs (the range).
          • An input to a function may be a single number or a collection of information.
          • The values making up an input are called the parameters of the function.
          • A particular input must always result in the same output every time the function is computed.
  • 17. 2. Algorithm Design / Specifications
      • Algorithm: a finite set of instructions that, if followed, accomplishes a particular task.
      • Describe: in natural language / pseudo-code / diagrams / etc.
      • Criteria to follow:
          • Input: zero or more quantities (externally produced)
          • Output: one or more quantities
          • Definiteness: clarity and precision of each instruction
          • Finiteness: the algorithm has to stop after a finite (possibly very large) number of steps
          • Effectiveness: each instruction has to be basic enough and feasible
              • Understand speech
              • Translate to Chinese
  • 18. Algorithms and Programs
      • Algorithm: a method or a process followed to solve a problem.
          • A recipe.
      • An algorithm takes the input to a problem (function) and transforms it to the output.
          • A mapping of input to output.
      • A problem can have many algorithms.
  • 19. Algorithm Properties
      • An algorithm possesses the following properties:
          • It must be correct.
          • It must be composed of a series of concrete steps.
          • There can be no ambiguity as to which step will be performed next.
          • It must be composed of a finite number of steps.
          • It must terminate.
      • A computer program is an instance, or concrete representation, of an algorithm in some programming language.
  • 20. 4, 5, 6: Implementation, Testing, Maintenance
      • Implementation
          • Decide on the programming language to use
              • C, C++, Lisp, Java, Perl, Prolog, assembly, etc.
          • Write clean, well-documented code
      • Test, test, test
      • Integrate feedback from users, fix bugs, ensure compatibility across different versions → Maintenance
  • 21. 3. Algorithm Analysis
      • Space complexity
          • How much space is required
      • Time complexity
          • How much time does it take to run the algorithm
      • Often, we deal with estimates!
  • 22. Space Complexity
      • Space complexity = the amount of memory required by an algorithm to run to completion
          • [Core dumps: the most often encountered cause is “memory leaks” – the amount of memory required is larger than the memory available on a given system]
      • Some algorithms may be more efficient if the data is completely loaded into memory
          • Need to look also at system limitations
          • E.g. classify 2GB of text in various categories [politics, tourism, sport, natural disasters, etc.] – can I afford to load the entire collection?
  • 23. Space Complexity (cont’d)
      1. Fixed part: the space required to store certain data/variables, independent of the size of the problem:
          - e.g. name of the data collection
          - same size for classifying 2GB or 1MB of texts
      2. Variable part: the space needed by variables whose size depends on the size of the problem:
          - e.g. actual text
          - load 2GB of text vs. load 1MB of text
  • 24. Space Complexity (cont’d)
      • S(P) = c + S(instance characteristics)
          • c = constant
      • Example:
          float sum(float* a, int n)
          {
              float s = 0;
              for (int i = 0; i < n; i++) {
                  s += a[i];
              }
              return s;
          }
      • Space? One word for n, one for a [passed by reference!], one for i, one for s → constant space!
  • 25. Time Complexity
      • Often more important than space complexity
          • space available (for computer programs!) tends to be larger and larger
          • time is still a problem for all of us
      • 3-4 GHz processors on the market
          • still …
          • researchers estimate that the computation of various transformations for one single DNA chain for one single protein on a 1 terahertz computer would take about 1 year to run to completion
      • An algorithm's running time is an important issue
  • 26. Running Time
      • Problem: prefix averages
          • Given an array X
          • Compute the array A such that A[i] is the average of elements X[0] … X[i], for i = 0..n-1
      • Sol 1
          • At each step i, compute the element A[i] by traversing the array X, summing its first i+1 elements and then taking their average
      • Sol 2
          • At each step i, update a running sum of the elements of the array X
          • Compute the element A[i] as sum/(i+1)
      • Big question: Which solution to choose?
  • 27. Running time
      [Figure: running time (1 ms to 5 ms) plotted for inputs A to G, with the worst case, best case and average case marked]
      • Suppose the program includes an if-then statement that may execute or not: → variable running time
      • Typically algorithms are measured by their worst case
  • 28. Experimental Approach
      • Write a program that implements the algorithm
      • Run the program with data sets of varying size.
      • Determine the actual running time using a system call to measure time (e.g. system (date) );
      • Problems?
  • 29. Experimental Approach
      • It is necessary to implement and test the algorithm in order to determine its running time.
      • Experiments can be done only on a limited set of inputs, and may not be indicative of the running time for other inputs.
      • The same hardware and software should be used in order to compare two algorithms – a condition very hard to achieve!
  • 30. Use a Theoretical Approach
      • Based on a high-level description of the algorithms, rather than language-dependent implementations
      • Makes possible an evaluation of the algorithms that is independent of the hardware and software environments → Generality
  • 31. Algorithm Description
      • How to describe algorithms independent of a programming language
      • Pseudo-code = a description of an algorithm that is
          • more structured than usual prose but
          • less formal than a programming language
      • (Or diagrams)
      • Example: find the maximum element of an array.
          Algorithm arrayMax(A, n):
              Input: An array A storing n integers.
              Output: The maximum element in A.
              currentMax ← A[0]
              for i ← 1 to n - 1 do
                  if currentMax < A[i] then currentMax ← A[i]
              return currentMax
  • 32. Pseudo Code
      • Expressions: use standard mathematical symbols
          • use ← for assignment ( ? in C/C++)
          • use = for the equality relationship ( ? in C/C++)
      • Method declarations:
          • Algorithm name(param1, param2)
      • Programming constructs:
          • decision structures: if ... then ... [else ...]
          • while-loops: while ... do
          • repeat-loops: repeat ... until ...
          • for-loop: for ... do
          • array indexing: A[i]
      • Methods
          • calls: object method(args)
          • returns: return value
      • Use comments
      • Instructions have to be basic enough and feasible!
  • 33. Low Level Algorithm Analysis
      • Based on primitive operations (low-level computations independent of the programming language)
      • E.g.:
          • Make an addition = 1 operation
          • Calling a method or returning from a method = 1 operation
          • Index in an array = 1 operation
          • Comparison = 1 operation
          • etc.
      • Method: inspect the pseudo-code and count the number of primitive operations executed by the algorithm
  • 34. Example
          Algorithm arrayMax(A, n):
              Input: An array A storing n integers.
              Output: The maximum element in A.
              currentMax ← A[0]
              for i ← 1 to n - 1 do
                  if currentMax < A[i] then currentMax ← A[i]
              return currentMax
      • How many operations?
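As a rough illustration of the counting exercise, the arrayMax pseudo-code can be translated to C++ with a counter for one kind of primitive operation, the comparison `currentMax < A[i]`. This is only a sketch: the `MaxResult` struct and the choice to count comparisons only (not assignments, index operations or loop bookkeeping) are assumptions made here, not part of the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the maximum plus the number of
// element comparisons performed while finding it.
struct MaxResult {
    int max;
    std::size_t comparisons;
};

// Translation of arrayMax(A, n); the array must be non-empty.
MaxResult arrayMax(const std::vector<int>& A) {
    assert(!A.empty());
    MaxResult r{A[0], 0};
    for (std::size_t i = 1; i < A.size(); ++i) {
        ++r.comparisons;                  // the test currentMax < A[i]
        if (r.max < A[i]) r.max = A[i];
    }
    return r;
}
```

For an array of n elements the loop body runs n - 1 times, so the comparison count is always n - 1 regardless of the input order; only the number of assignments to the maximum varies.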
  • 35. Asymptotic Notation
      • Need to abstract further
      • Give an “idea” of how the algorithm performs
      • n steps vs. n+5 steps
      • n steps vs. n^2 steps
  • 36. Problem
      • Fibonacci numbers
          • F[0] = 0
          • F[1] = 1
          • F[i] = F[i-1] + F[i-2] for i ≥ 2
      • Pseudo-code
      • Number of operations
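One possible answer to the Fibonacci exercise is an iterative version, sketched below in C++. The function name `fib` and the 64-bit result type are choices made for this sketch; the limit of 93 is there because F[94] no longer fits in an unsigned 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Iterative Fibonacci: F[0] = 0, F[1] = 1, F[i] = F[i-1] + F[i-2].
// Uses one addition per loop iteration, i.e. i - 1 additions for i >= 2.
std::uint64_t fib(unsigned i) {
    assert(i <= 93);                 // F[93] is the largest value that fits
    if (i == 0) return 0;
    std::uint64_t prev = 0, cur = 1; // F[k-2] and F[k-1]
    for (unsigned k = 2; k <= i; ++k) {
        std::uint64_t next = prev + cur;
        prev = cur;
        cur = next;
    }
    return cur;
}
```

Counting primitive operations as on the earlier slides, the work per iteration is constant, so the number of operations grows linearly with i.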
  • 37. Last Time
      • Steps in problem solving
      • Algorithm analysis
          • Space complexity
          • Time complexity
      • Pseudo-code
  • 38. Algorithm Analysis
      • Last time:
          • Experimental approach – problems
          • Low level analysis – count operations
      • Abstract even further
      • Characterize an algorithm as a function of the “problem size”
      • E.g.
          • Input data = array → problem size is N (length of array)
          • Input data = matrix → problem size is N x M
  • 39. Asymptotic Notation
      • Goal: to simplify analysis by getting rid of unneeded information (like “rounding” 1,000,001 ≈ 1,000,000)
      • We want to say in a formal way that 3n^2 ≈ n^2
      • The “Big-Oh” notation:
          • given functions f(n) and g(n), we say that f(n) is O(g(n)) if and only if there are positive constants c and n0 such that f(n) ≤ c g(n) for n ≥ n0
  • 40. Graphic Illustration
      • f(n) = 2n + 6
      • According to the definition:
          • Need to find a function g(n) and a constant c such that f(n) ≤ c g(n) for n ≥ n0
      • g(n) = n and c = 4
      • → f(n) is O(n)
      • The order of f(n) is n
      [Figure: plot against n of g(n) = n, c g(n) = 4n, and f(n) = 2n + 6]
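The slide gives g(n) = n and c = 4 but does not state the threshold n0. A short check, written out here as a worked step rather than taken from the slides, shows which n0 makes the Big-Oh definition hold:

```latex
\begin{align*}
f(n) \le c\,g(n)
  &\iff 2n + 6 \le 4n \\
  &\iff 6 \le 2n \\
  &\iff n \ge 3 .
\end{align*}
```

So n0 = 3 works for the definition with ≤. With a strict inequality the bound holds for n > 3.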
  • 41. More examples
      • What about f(n) = 4n^2? Is it O(n)?
          • Find a c such that 4n^2 < cn for any n > n0
      • 50n^3 + 20n + 4 is O(n^3)
          • Would it be correct to say it is O(n^3 + n)?
              • Not useful, as n^3 far exceeds n for large values
          • Would it be correct to say it is O(n^5)?
              • OK, but g(n) should be as close as possible to f(n)
      • 3 log(n) + log(log(n)) = O( ? )
      • Simple rule: drop lower-order terms and constant factors
  • 42. Properties of Big-Oh
      • If f(n) is O(g(n)) then a f(n) is O(g(n)) for any constant a.
      • If f(n) is O(g(n)) and h(n) is O(g’(n)) then f(n) + h(n) is O(g(n) + g’(n))
      • If f(n) is O(g(n)) and h(n) is O(g’(n)) then f(n) h(n) is O(g(n) g’(n))
      • If f(n) is O(g(n)) and g(n) is O(h(n)) then f(n) is O(h(n))
      • If f(n) is a polynomial of degree d, then f(n) is O(n^d)
      • n^x is O(a^n), for any fixed x > 0 and a > 1
          • An algorithm of order n to a certain power is better than an algorithm of order a ( > 1) to the power of n
      • log(n^x) is O(log n), for x > 0 – how?
      • (log n)^x is O(n^y) for x > 0 and y > 0
          • An algorithm of order log n (to a certain power) is better than an algorithm of order n raised to a power y.
  • 43. Asymptotic analysis - terminology
      • Special classes of algorithms:
          • logarithmic: O(log n)
          • linear: O(n)
          • quadratic: O(n^2)
          • polynomial: O(n^k), k ≥ 1
          • exponential: O(a^n), a > 1
      • Polynomial vs. exponential?
      • Logarithmic vs. polynomial?
  • 44. Some Numbers
        log n     n   n log n    n^2     n^3          2^n
            0     1         0      1       1            2
            1     2         2      4       8            4
            2     4         8     16      64           16
            3     8        24     64     512          256
            4    16        64    256    4096        65536
            5    32       160   1024   32768   4294967296
  • 45. Common plots of O( )
      [Figure: growth curves, from fastest to slowest growing: O(2^n), O(n^3), O(n^2), O(n log n), O(n), O(√n), O(log n), O(1)]
  • 46. “Relatives” of Big-Oh
      • “Relatives” of the Big-Oh
          • Ω(f(n)): Big-Omega – asymptotic lower bound
          • Θ(f(n)): Big-Theta – asymptotic tight bound
      • Big-Omega – think of it as the inverse of O(n)
          • g(n) is Ω(f(n)) if f(n) is O(g(n))
      • Big-Theta – combine both Big-Oh and Big-Omega
          • f(n) is Θ(g(n)) if f(n) is O(g(n)) and f(n) is Ω(g(n))
      • Note the difference:
          • 3n + 3 is O(n) and is Θ(n)
          • 3n + 3 is O(n^2) but is not Θ(n^2)
  • 47. More “relatives”
      • Little-oh – f(n) is o(g(n)) if for any c > 0 there is an n0 such that f(n) < c g(n) for n > n0.
      • Little-omega
      • Little-theta
      • 2n + 3 is o(n^2)
      • 2n + 3 is o(n)?
  • 48. Best, Worst, Average Cases
      • Not all inputs of a given size take the same time to run.
      • Sequential search for K in an array of n integers:
          • Begin at the first element in the array and look at each element in turn until K is found
      • Best case:
      • Worst case:
      • Average case:
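To experiment with the three cases, a sequential search can be instrumented to report how many elements it examined. The sketch below assumes a hypothetical `SearchResult` type and counts one comparison per element visited. It is meant for exploring the question on the slide, not as the course's reference solution.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct SearchResult {
    long index;               // position of K, or -1 if K is absent
    std::size_t comparisons;  // number of elements examined
};

// Look at each element in turn until K is found.
SearchResult sequentialSearch(const std::vector<int>& a, int K) {
    std::size_t comparisons = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        ++comparisons;
        if (a[i] == K) return {static_cast<long>(i), comparisons};
    }
    return {-1, comparisons};
}
```

Searching for the first element costs 1 comparison, while searching for the last element or for a missing value costs n comparisons. Averaging the count over every position of K in an n-element array gives (n + 1)/2.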
  • 49. Example
      • Remember the algorithm for computing prefix averages
          - compute an array A starting with an array X
          - every element A[i] is the average of all elements X[j] with j ≤ i
      • Remember some pseudo-code …
      • Solution 1
          Algorithm prefixAverages1(X):
              Input: An n-element array X of numbers.
              Output: An n-element array A of numbers such that A[i] is the average of elements X[0], ... , X[i].
              Let A be an array of n numbers.
              for i ← 0 to n - 1 do
                  a ← 0
                  for j ← 0 to i do
                      a ← a + X[j]
                  A[i] ← a/(i + 1)
              return array A
      • Analyze this
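A direct C++ rendering of prefixAverages1 may make the analysis easier to follow. The use of `std::vector<double>` is an assumption of this sketch; the slide only says "array of numbers".

```cpp
#include <cstddef>
#include <vector>

// Quadratic version: the sum X[0] + ... + X[i] is recomputed
// from scratch for every i.
std::vector<double> prefixAverages1(const std::vector<double>& X) {
    std::vector<double> A(X.size());
    for (std::size_t i = 0; i < X.size(); ++i) {
        double a = 0;
        for (std::size_t j = 0; j <= i; ++j) a += X[j];  // i + 1 additions
        A[i] = a / static_cast<double>(i + 1);
    }
    return A;
}
```

The inner loop performs 1 + 2 + … + n = n(n+1)/2 additions in total, which is where the O(n^2) running time comes from.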
  • 50. Example (cont’d)
          Algorithm prefixAverages2(X):
              Input: An n-element array X of numbers.
              Output: An n-element array A of numbers such that A[i] is the average of elements X[0], ... , X[i].
              Let A be an array of n numbers.
              s ← 0
              for i ← 0 to n - 1 do
                  s ← s + X[i]
                  A[i] ← s/(i + 1)
              return array A
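The same translation for prefixAverages2, again assuming `std::vector<double>` for the arrays, keeps a running sum so each element of X is added exactly once.

```cpp
#include <cstddef>
#include <vector>

// Linear version: s always holds X[0] + ... + X[i].
std::vector<double> prefixAverages2(const std::vector<double>& X) {
    std::vector<double> A(X.size());
    double s = 0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        s += X[i];
        A[i] = s / static_cast<double>(i + 1);
    }
    return A;
}
```

There is a single loop with constant work per iteration, so the running time is O(n).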
  • 51. Back to the original question
      • Which solution would you choose?
          • O(n^2) vs. O(n)
      • Some math …
      • Properties of logarithms:
          $\log_b(xy) = \log_b x + \log_b y$
          $\log_b(x/y) = \log_b x - \log_b y$
          $\log_b x^a = a \log_b x$
          $\log_b a = \log_x a / \log_x b$
      • Properties of exponentials:
          $a^{b+c} = a^b a^c$
          $a^{bc} = (a^b)^c$
          $a^b / a^c = a^{b-c}$
          $b = a^{\log_a b}$
          $b^c = a^{c \log_a b}$
  • 52. Important Series
      • $S(N) = 1 + 2 + \dots + N = \sum_{i=1}^{N} i = N(N+1)/2$
      • Sum of squares: $\sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6} \approx \frac{N^3}{3}$ for large N
      • Sum of exponents: $\sum_{i=1}^{N} i^k \approx \frac{N^{k+1}}{|k+1|}$ for large N and $k \neq -1$
      • Geometric series: $\sum_{i=0}^{N} A^i = \frac{A^{N+1} - 1}{A - 1}$
          • Special case when A = 2: $2^0 + 2^1 + 2^2 + \dots + 2^N = 2^{N+1} - 1$
  • 53. Analyzing recursive algorithms
          function foo (param A, param B) {
              statement 1;
              statement 2;
              if (termination condition) {
                  return;
              }
              foo(A’, B’);
          }
  • 54. Solving recursive equations by repeated substitution
          T(n) = T(n/2) + c                  substitute for T(n/2)
               = T(n/4) + c + c              substitute for T(n/4)
               = T(n/8) + c + c + c
               = T(n/2^3) + 3c               in more compact form
               = …
               = T(n/2^k) + kc               “inductive leap”
          T(n) = T(n/2^(log n)) + c log n    “choose k = log n”
               = T(n/n) + c log n
               = T(1) + c log n
               = b + c log n
               = Θ(log n)
  • 55. Solving recursive equations by telescoping
          T(n)   = T(n/2) + c        initial equation
          T(n/2) = T(n/4) + c        so this holds
          T(n/4) = T(n/8) + c        and this …
          T(n/8) = T(n/16) + c       and this …
          …
          T(4)   = T(2) + c          eventually …
          T(2)   = T(1) + c          and this …
          T(n)   = T(1) + c log n    sum equations, canceling the terms appearing on both sides
          T(n)   = Θ(log n)
  • 56. Problem
      • Running time for finding a number in a sorted array [binary search]
      • Pseudo-code
      • Running time analysis
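A possible solution to the binary search exercise is sketched below in C++. The name `binarySearch`, the iterative formulation and the half-open range convention are choices made for this sketch; the slides leave the pseudo-code to the reader.

```cpp
#include <cstddef>
#include <vector>

// Binary search on an array sorted in ascending order.
// Returns the index of an occurrence of key, or -1 if key is absent.
long binarySearch(const std::vector<int>& a, int key) {
    std::size_t lo = 0, hi = a.size();        // search the range [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2; // midpoint without overflow
        if (a[mid] == key) return static_cast<long>(mid);
        if (a[mid] < key) lo = mid + 1;       // continue in the right half
        else hi = mid;                        // continue in the left half
    }
    return -1;
}
```

Each iteration does a constant amount of work and halves the remaining range, which matches the recurrence T(n) = T(n/2) + c solved on the previous slides, giving Θ(log n) in the worst case. On the earlier example list 5 8 12 15 15 17 23 25 27, searching for 25 probes 15 and then 25.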
  • 57. Space/Time Tradeoff Principle
      • One can often reduce time if one is willing to sacrifice space, or vice versa.
          • Encoding or packing information
              • Boolean flags
          • Table lookup
              • Factorials
      • Disk-based Space/Time Tradeoff Principle: The smaller you make the disk storage requirements, the faster your program will run.
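The "table lookup: factorials" item can be illustrated with a small sketch: spend a few words of memory on a precomputed table so that each query becomes a single array access. The names `kMaxN`, `factorialTable` and `factorial` are made up for this example. The bound of 20 is used because 20! is the largest factorial that fits in an unsigned 64-bit integer.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kMaxN = 20;  // 21! would overflow 64 bits

// Space cost: a table of 21 words, filled once on first use.
const std::array<std::uint64_t, kMaxN + 1>& factorialTable() {
    static const std::array<std::uint64_t, kMaxN + 1> table = [] {
        std::array<std::uint64_t, kMaxN + 1> t{};
        t[0] = 1;
        for (unsigned i = 1; i <= kMaxN; ++i) t[i] = t[i - 1] * i;
        return t;
    }();
    return table;
}

// Time gain: every query is one array access instead of a loop.
std::uint64_t factorial(unsigned n) {
    assert(n <= kMaxN);
    return factorialTable()[n];
}
```

Without the table, each call would need n multiplications. With it, the multiplications are paid once and the memory stays allocated for the lifetime of the program.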
  • 58. ADT
      • ADT = Abstract Data Type
          • A logical view of the data objects together with specifications of the operations required to create and manipulate them.
      • Describe an algorithm – pseudo-code
      • Describe a data structure – ADT
  • 59. What is a data type?
      • A set of objects, each called an instance of the data type. Some objects are sufficiently important to be provided with a special name.
      • A set of operations. Operations can be realized via operators, functions, procedures, methods, and special syntax (depending on the implementing language)
      • Each object must have some representation (not necessarily known to the user of the data type)
      • Each operation must have some implementation (also not necessarily known to the user of the data type)
  • 60. What is a representation?
      • A specific encoding of an instance
      • This encoding MUST be known to implementors of the data type but NEED NOT be known to users of the data type
      • Terminology: “we implement data types using data structures”
  • 61. Two varieties of data types
      • Opaque data types, in which the representation is not known to the user.
      • Transparent data types, in which the representation is profitably known to the user, i.e. the encoding is directly accessible and/or modifiable by the user.
      • Which one do you think is better?
      • What are the means provided by C++ for creating opaque data types?
  • 62. Why are opaque data types better?
      • Representation can be changed without affecting the user
      • Forces the program designer to consider the operations more carefully
      • Encapsulates the operations
      • Allows less restrictive designs which are easier to extend and modify
      • Design is always done with the expectation that the data type will be placed in a library of types available to all.
  • 63. How to design a data type
      • Step 1: Specification
          • Make a list of the operations (just their names) you think you will need. Review and refine the list.
          • Decide on any constants which may be required.
          • Describe the parameters of the operations in detail.
          • Describe the semantics of the operations (what they do) as precisely as possible.
  • 64. How to design a data type
      • Step 2: Application
          • Develop a real or imaginary application to test the specification.
          • Missing or incomplete operations are found as a side-effect of trying to use the specification.
  • 65. How to design a data type
      • Step 3: Implementation
          • Decide on a suitable representation.
          • Implement the operations.
          • Test, debug, and revise.
  • 66. Example - ADT Integer
      Name of ADT: Integer
      Operation | Description | C/C++
      Create | Defines an identifier with an undefined value | int id1;
      Assign | Assigns the value of one integer identifier or value to another integer identifier | id1 = id2;
      isEqual | Returns true if the values associated with two integer identifiers are the same | id1 == id2;
  • 67. Example – ADT Integer
      Operation | Description | C/C++
      LessThan | Returns true if the value of the first integer identifier is less than the value of the second integer identifier | id1 < id2
      Negative | Returns the negative of the integer value | -id1
      Sum | Returns the sum of two integer values | id1 + id2
      Operation signatures:
          Create: identifier → Integer
          Assign: Integer → Identifier
          IsEqual: (Integer, Integer) → Boolean
          LessThan: (Integer, Integer) → Boolean
          Negative: Integer → Integer
          Sum: (Integer, Integer) → Integer
  • 68. More examples
      • We’ll see more examples throughout the course
          • Stack
          • Queue
          • Tree
          • And more
  • 69. Arrays
      • Array: a set of pairs (index and value)
      • Data structure: for each index, there is a value associated with that index.
      • Representation (possible): implemented by using consecutive memory.
  • 70. The Array ADT
      Objects: a set of pairs <index, value> where for each value of index there is a value from the set item. Index is a finite ordered set of one or more dimensions, for example, {0, …, n-1} for one dimension, {(0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2)} for two dimensions, etc.
      Methods: for all A ∈ Array, i ∈ index, x ∈ item, j, size ∈ integer
          Array Create(j, list) ::= return an array of j dimensions where list is a j-tuple whose kth element is the size of the kth dimension. Items are undefined.
          Item Retrieve(A, i) ::= if (i ∈ index) return the item associated with index value i in array A
                                  else return error
          Array Store(A, i, x) ::= if (i ∈ index) return an array that is identical to array A except the new pair <i, x> has been inserted
                                   else return error
  • 71. Arrays in C
      int list[5], *plist[5];
      • list[5]: five integers list[0], list[1], list[2], list[3], list[4]
      • *plist[5]: five pointers to integers plist[0], plist[1], plist[2], plist[3], plist[4]
      • Implementation of a 1-D array:
          list[0]    base address = α
          list[1]    α + sizeof(int)
          list[2]    α + 2*sizeof(int)
          list[3]    α + 3*sizeof(int)
          list[4]    α + 4*sizeof(int)
  • 72. Arrays in C (cont’d)
      • Compare int *list1 and int list2[5] in C.
          • Same: list1 and list2 can both be used as pointers (list2 converts to a pointer to its first element).
          • Difference: list2 reserves five locations.
      • Notations:
          • list2 - a pointer to list2[0]
          • (list2 + i) - a pointer to list2[i] (&list2[i])
          • *(list2 + i) - list2[i]
  • 73. Example
      #include <stdio.h>

      int one[] = {0, 1, 2, 3, 4};

      /* Goal: print out address and value */
      void print1(int *ptr, int rows)
      {
          int i;
          printf("Address Contents\n");
          for (i = 0; i < rows; i++)
              /* print the address as an unsigned decimal number */
              printf("%8lu%5d\n", (unsigned long)(ptr + i), *(ptr + i));
          printf("\n");
      }

      Sample output (from a machine with 2-byte ints):
          Address Contents
             1228    0
             1230    1
             1232    2
             1234    3
             1236    4
  • 74. Other Data Structures Based on Arrays
      • Arrays:
          • Basic data structure
          • May store any type of elements
      • Polynomials: defined by a list of coefficients and exponents
          - $p(x) = a_1 x^{e_1} + \dots + a_n x^{e_n}$
          - degree of polynomial = the largest exponent in a polynomial
          - Examples: A(X) = 3X^20 + 2X^5 + 4, B(X) = X^4 + 10X^3 + 3X^2 + 1
  • 75. Polynomial ADT
      Objects: a set of ordered pairs <ei, ai> where ai in Coefficients and ei in Exponents; ei are integers >= 0
      Methods: for all poly, poly1, poly2 ∈ Polynomial, coef ∈ Coefficients, expon ∈ Exponents
          Polynomial Zero( ) ::= return the polynomial p(x) = 0
          Boolean IsZero(poly) ::= if (poly) return FALSE
                                   else return TRUE
          Coefficient Coef(poly, expon) ::= if (expon ∈ poly) return its coefficient
                                            else return Zero
          Exponent Lead_Exp(poly) ::= return the largest exponent in poly
          Polynomial Attach(poly, coef, expon) ::= if (expon ∈ poly) return error
                                                   else return the polynomial poly with the term <coef, expon> inserted
  • 76. Polynomial ADT (cont’d)
          Polynomial Remove(poly, expon) ::= if (expon ∈ poly) return the polynomial poly with the term whose exponent is expon deleted
                                             else return error
          Polynomial SingleMult(poly, coef, expon) ::= return the polynomial poly • coef • x^expon
          Polynomial Add(poly1, poly2) ::= return the polynomial poly1 + poly2
          Polynomial Mult(poly1, poly2) ::= return the polynomial poly1 • poly2
  • 77. Polynomial Addition (1)
      #define MAX_DEGREE 101
      typedef struct {
          int degree;
          float coef[MAX_DEGREE];
      } polynomial;

      Addition(polynomial * a, polynomial * b, polynomial* c)
      {
          …
      }

      • advantage: easy implementation
      • disadvantage: wastes space when sparse
      • Running time?
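The body of `Addition` is elided on the slide. One way it could be filled in is sketched below, under the assumption (not stated in the transcript) that `coef[e]` holds the coefficient of x^e. The `void` return type, the `const` qualifiers and the final trimming loop are also choices made for this sketch.

```cpp
#include <algorithm>
#include <cassert>

const int MAX_DEGREE = 101;

struct polynomial {
    int degree;
    float coef[MAX_DEGREE];   // assumption: coef[e] is the coefficient of x^e
};

// c = a + b, adding coefficients exponent by exponent.
void Addition(const polynomial* a, const polynomial* b, polynomial* c) {
    int hi = std::max(a->degree, b->degree);
    assert(hi < MAX_DEGREE);
    for (int e = 0; e <= hi; ++e) {
        float ca = (e <= a->degree) ? a->coef[e] : 0.0f;
        float cb = (e <= b->degree) ? b->coef[e] : 0.0f;
        c->coef[e] = ca + cb;
    }
    // Leading terms may cancel, so lower the degree if needed.
    while (hi > 0 && c->coef[hi] == 0.0f) --hi;
    c->degree = hi;
}
```

With this layout the running time is proportional to the larger degree, not to the number of nonzero terms, which is the cost behind the "wastes space when sparse" remark: A(X) = 2X^1000 + 1 would need 1001 slots for two terms.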
  • 78. Polynomial Addition (2)
      • Use one global array to store all polynomials
      • A(X) = 2X^1000 + 1
      • B(X) = X^4 + 10X^3 + 3X^2 + 1
          index      0      1      2      3      4      5      6
          coef       2      1      1     10      3      1
          exp     1000      0      4      3      2      0
      • starta = 0, finisha = 1, startb = 2, finishb = 5, avail = 6
  • 79. Polynomial Addition (2) (cont’d)
      #define MAX_DEGREE 101
      typedef struct {
          int exp;
          float coef;
      } polynomial_term;
      polynomial_term terms[3*MAX_DEGREE];

      Addition(int starta, int enda, int startb, int endb, int startc, int endc)
      {
          …
      }

      • advantage: less space
      • disadvantage: longer code
      • Running time?
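The elided body can be approximated by a merge of two term lists sorted by decreasing exponent. The sketch below deliberately simplifies the slide's design: instead of index ranges into one global `terms` array, each polynomial is passed as its own `std::vector`, and the result is returned rather than written at `startc`. Treat it as an illustration of the merge step only.

```cpp
#include <cstddef>
#include <vector>

struct polynomial_term {
    int exp;
    float coef;
};

// Adds two polynomials whose terms are sorted by strictly
// decreasing exponent; the result is sorted the same way.
std::vector<polynomial_term> Addition(const std::vector<polynomial_term>& a,
                                      const std::vector<polynomial_term>& b) {
    std::vector<polynomial_term> c;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].exp > b[j].exp) {
            c.push_back(a[i++]);
        } else if (a[i].exp < b[j].exp) {
            c.push_back(b[j++]);
        } else {                                   // equal exponents
            float sum = a[i].coef + b[j].coef;
            if (sum != 0.0f) c.push_back({a[i].exp, sum});
            ++i;
            ++j;
        }
    }
    while (i < a.size()) c.push_back(a[i++]);      // leftover terms of a
    while (j < b.size()) c.push_back(b[j++]);      // leftover terms of b
    return c;
}
```

Every loop iteration consumes at least one term, so the running time is proportional to the number of nonzero terms in the two inputs, independent of the degrees.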
  • 80. Sparse Matrices
      • Example sizes: a 5*3 matrix with 15/15 nonzero entries; a 6*6 matrix with 8/36 nonzero entries (shown below)
                 col1 col2 col3 col4 col5 col6
          row0     15    0    0   22    0  -15
          row1      0   11    3    0    0    0
          row2      0    0    0   -6    0    0
          row3      0    0    0    0    0    0
          row4     91    0    0    0    0    0
          row5      0    0   28    0    0    0
      • sparse matrix → data structure?
  • 81. Sparse Matrix ADT
      Objects: a set of triples, <row, column, value>, where row and column are integers and form a unique combination, and value comes from the set item.
      Methods: for all a, b ∈ Sparse_Matrix, x ∈ item, i, j, max_col, max_row ∈ index
          Sparse_Matrix Create(max_row, max_col) ::= return a Sparse_Matrix that can hold up to max_items = max_row × max_col and whose maximum row size is max_row and whose maximum column size is max_col.
  • 82. Sparse Matrix ADT (cont’d)
          Sparse_Matrix Transpose(a) ::= return the matrix produced by interchanging the row and column value of every triple.
          Sparse_Matrix Add(a, b) ::= if the dimensions of a and b are the same return the matrix produced by adding corresponding items, namely those with identical row and column values.
                                      else return error
          Sparse_Matrix Multiply(a, b) ::= if the number of columns in a equals the number of rows in b return the matrix d produced by multiplying a by b according to the formula: d[i][j] = Σ_k (a[i][k] • b[k][j]), where d(i, j) is the (i, j)th element
                                           else return error.
  • 83. Sparse Matrix Representation
      (1) Represented by a two-dimensional array. A sparse matrix wastes space.
      (2) Each element is characterized by <row, col, value>.
      Sparse_Matrix Create(max_row, max_col) ::=
          #define MAX_TERMS 101 /* maximum number of terms + 1 */
          typedef struct {
              int col;
              int row;
              int value;
          } term;
          term A[MAX_TERMS];
      The terms in A should be ordered based on <row, col>
  • 84. Sparse Matrix Operations
      • Transpose of a sparse matrix.
      • What is the transpose of a matrix?
                  row col value               row col value
          a[0]      6   6     8       b[0]      6   6     8
           [1]      0   0    15        [1]      0   0    15
           [2]      0   3    22        [2]      0   4    91
           [3]      0   5   -15        [3]      1   1    11
           [4]      1   1    11        [4]      2   1     3
           [5]      1   2     3        [5]      2   5    28
           [6]      2   3    -6        [6]      3   0    22
           [7]      4   0    91        [7]      3   2    -6
           [8]      5   2    28        [8]      5   0   -15
      • b is the transpose of a
  • 85. Transpose a Sparse Matrix
      (1) For each row i, take element <i, j, value> and store it in element <j, i, value> of the transpose.
          Difficulty: where to put <j, i, value>?
              (0, 0, 15)  ====> (0, 0, 15)
              (0, 3, 22)  ====> (3, 0, 22)
              (0, 5, -15) ====> (5, 0, -15)
              (1, 1, 11)  ====> (1, 1, 11)
          Elements have to be moved down very often.
      (2) For all elements in column j, place element <i, j, value> in element <j, i, value>
  • 86. Transpose of a Sparse Matrix (cont’d)
        void transpose(term a[], term b[])
        /* b is set to the transpose of a */
        {
            int n, i, j, currentb;
            n = a[0].value;          /* total number of elements */
            b[0].row = a[0].col;     /* rows in b = columns in a */
            b[0].col = a[0].row;     /* columns in b = rows in a */
            b[0].value = n;
            if (n > 0) {             /* non zero matrix */
                currentb = 1;
                for (i = 0; i < a[0].col; i++)     /* transpose by columns in a */
                    for (j = 1; j <= n; j++)       /* find elements from the current column */
                        if (a[j].col == i) {       /* element is in current column, add it to b */
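The listing on slide 86 stops at the point where a matching term is found. A minimal sketch of the whole routine follows, assuming the triple layout of slide 83 and the convention of slide 84 that element 0 stores <rows, columns, number of terms>. The body of the innermost `if` is an assumption filled in from method (2) on slide 85, not text recovered from the deck, and the field order inside `term` is chosen here for readable initializers.

```c
#define MAX_TERMS 101            /* maximum number of terms + 1 */

typedef struct {
    int row;
    int col;
    int value;
} term;

/* b is set to the transpose of a.
   a[0] holds <rows, columns, number of non-zero terms>. */
void transpose(const term a[], term b[])
{
    int n = a[0].value;          /* number of non-zero terms */
    b[0].row = a[0].col;         /* rows in b = columns in a */
    b[0].col = a[0].row;         /* columns in b = rows in a */
    b[0].value = n;
    if (n > 0) {
        int currentb = 1;        /* next free slot in b */
        for (int i = 0; i < a[0].col; i++)     /* one pass per column of a */
            for (int j = 1; j <= n; j++)
                if (a[j].col == i) {           /* term lies in column i */
                    b[currentb].row = a[j].col;
                    b[currentb].col = a[j].row;
                    b[currentb].value = a[j].value;
                    currentb++;
                }
    }
}
```

Because every column triggers a scan of all terms, this version costs O(columns × terms); it keeps b ordered by <row, col> without moving any element after it is placed.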
  • 87. Linked Lists • Avoid the drawbacks of fixed-size arrays with • Growable arrays • Linked lists
  • 88. Growable arrays • Avoid the problem of fixed-size arrays • Increase the size of the array when needed (i.e., when capacity is exceeded) • Two strategies: • tight strategy (add a constant): f(N) = N + c • growth strategy (double up): f(N) = 2N
  • 89. Tight Strategy • Add a number k (k = constant) of elements every time the capacity is exceeded • Capacities after successive resizes: C0, C0 + k, …, C0 + S·k, where S = (N - C0) / k • Running time? Copying costs C0 + (C0 + k) + … + (C0 + S·k) = C0·(S + 1) + k·S·(S + 1) / 2, which is O(N^2)
  • 90. Tight Strategy
        /* A, size, capacity and k are globals; new and delete[] are C++ */
        void insertLast(element o)
        {
            if (size == capacity) {            /* array is full: grow by k */
                capacity += k;
                element* B = new element[capacity];
                for (int i = 0; i < size; i++) {
                    B[i] = A[i];
                }
                delete[] A;                    /* release the old array */
                A = B;
            }
            A[size] = o;                       /* the rear of the list is index size */
            size++;
        }
  • 91. Growth Strategy • Double the size of the array every time it is needed (i.e., when capacity is exceeded) • Capacities after successive resizes: C0, 2·C0, 4·C0, …, 2^i·C0, where i = log2(N / C0) • Running time? Copying costs C0·[1 + 2 + … + 2^log2(N/C0)] = C0·(2N/C0 - 1) = 2N - C0, which is O(N) • How does the previous code change?
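One way to answer the question on slide 91: only the resize step changes, from adding k to doubling. The sketch below is a C version under stated assumptions: the `GrowArray` type and the `grow_*` function names are hypothetical, elements are plain `int`, and `realloc` stands in for the explicit copy loop of slide 90.

```c
#include <stdlib.h>

typedef struct {
    int *data;        /* heap block holding the elements */
    size_t size;      /* number of elements in use */
    size_t capacity;  /* number of elements the block can hold */
} GrowArray;

/* Returns 0 on success, -1 if memory could not be obtained. */
int grow_init(GrowArray *a, size_t initial_capacity)
{
    if (initial_capacity == 0)
        initial_capacity = 1;          /* doubling 0 would never grow */
    a->data = malloc(initial_capacity * sizeof *a->data);
    if (a->data == NULL)
        return -1;
    a->size = 0;
    a->capacity = initial_capacity;
    return 0;
}

/* Append value, doubling the capacity when the array is full: f(N) = 2N. */
int grow_insert_last(GrowArray *a, int value)
{
    if (a->size == a->capacity) {
        size_t new_capacity = 2 * a->capacity;
        int *bigger = realloc(a->data, new_capacity * sizeof *a->data);
        if (bigger == NULL)
            return -1;                 /* the old block is still valid */
        a->data = bigger;
        a->capacity = new_capacity;
    }
    a->data[a->size++] = value;
    return 0;
}

void grow_free(GrowArray *a)
{
    free(a->data);
    a->data = NULL;
    a->size = a->capacity = 0;
}
```

Starting from capacity 2, ten insertions trigger resizes to 4, 8 and 16, so the total copying work stays proportional to N.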
  • 92. Linked Lists • Avoid the drawbacks of fixed-size arrays with • Growable arrays • Linked lists
  • 93. Using Dynamically Allocated Memory (review)
        int i, *pi;
        float f, *pf;
        pi = (int *) malloc(sizeof(int));      /* request memory */
        pf = (float *) malloc(sizeof(float));
        *pi = 1024;
        *pf = 3.14;
        printf("an integer = %d, a float = %f\n", *pi, *pf);
        free(pi);                              /* return memory */
        free(pf);
  • 94. Linked Lists: bat → cat → sat → vat → NULL
  • 95. Insertion: a new node mat is linked into the list bat → cat → sat → vat → NULL by redirecting pointers only. Compare this with the insertion in arrays!
  • 96. Deletion: the node mat is unlinked from the list bat → cat → sat → vat → NULL; the unlinked node is left as a dangling reference.
  • 97. List ADT • ADT with position-based methods • generic methods: size(), isEmpty() • query methods: isFirst(p), isLast(p) • accessor methods: first(), last(), before(p), after(p) • update methods: swapElements(p,q), replaceElement(p,e), insertFirst(e), insertLast(e), insertBefore(p,e), insertAfter(p,e), removeAfter(p)
  • 98. Implementation
    Declaration:
        typedef struct node node, *pnode;
        struct node {
            char data[4];
            pnode next;
        };
    Creation:
        pnode ptr = NULL;
    Testing:
        #define IS_EMPTY(ptr) (!(ptr))
    Allocation:
        ptr = (pnode) malloc(sizeof(node));
  • 99. Create one Node
    e->name is shorthand for (*e).name
        strcpy(ptr->data, "bat");
        ptr->next = NULL;
    ptr holds the address of the first node; its data field contains b, a, t, \0 and its link field (next) is NULL.
  • 100. Example: Create a two-node list (in this example the data field holds an int)
        pnode create2()
        {
            /* create a linked list with two nodes */
            pnode first, second;
            first = (pnode) malloc(sizeof(node));
            second = (pnode) malloc(sizeof(node));
            second->next = NULL;
            second->data = 20;
            first->data = 10;
            first->next = second;
            return first;
        }
    Result: ptr → 10 → 20 → NULL
  • 101. Insert (after a specific position)
        void insertAfter(pnode node, char *data)
        {
            /* insert a new node with data into the list ptr after node */
            pnode temp;
            temp = (pnode) malloc(sizeof(*temp));  /* sizeof(node) would measure the parameter, a pointer */
            if (IS_FULL(temp)) {
                fprintf(stderr, "The memory is full\n");
                exit(1);
            }
  • 102.
            strcpy(temp->data, data);
            if (node) {            /* non-empty list */
                temp->next = node->next;
                node->next = temp;
            }
            else {                 /* empty list */
                temp->next = NULL;
                node = temp;       /* changes only the local copy of node */
            }
        }
    Figure: temp (50) is linked in after node (10), giving 10 → 50 → 20 → NULL
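The routine on slides 101 and 102 assigns the new node to its own `node` parameter when the list is empty, so the caller never sees it. A minimal sketch of one common remedy, passing the address of the head pointer, is given below. The name `insert_after`, its return value, and the choice to insert at the front when `after` is NULL are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct node {
    char data[4];
    struct node *next;
} node, *pnode;

/* Insert a copy of data after the node `after`.
   If the list is empty or `after` is NULL, the new node becomes the head.
   Returns the new node, or NULL if malloc fails. */
pnode insert_after(pnode *head, pnode after, const char *data)
{
    pnode temp = malloc(sizeof *temp);
    if (temp == NULL)
        return NULL;
    /* copy at most 3 characters and always terminate the string */
    strncpy(temp->data, data, sizeof temp->data - 1);
    temp->data[sizeof temp->data - 1] = '\0';
    if (*head != NULL && after != NULL) {
        temp->next = after->next;      /* splice in behind `after` */
        after->next = temp;
    } else {
        temp->next = *head;            /* new first node */
        *head = temp;                  /* visible to the caller */
    }
    return temp;
}
```

A caller would start with `pnode head = NULL;` and pass `&head` on every call, so the empty-list case needs no special handling at the call site.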
  • 103. Deletion
    Delete the first node (trail = NULL): (a) before deletion: 10 → 50 → 20 → NULL (b) after deletion: 50 → 20 → NULL
    Delete a node other than the first node: before deletion: 10 → 50 → 20 → NULL, after deletion: 10 → 20 → NULL
  • 104.
        void removeAfter(pnode node)
        {
            /* delete what follows after node in the list */
            pnode tmp;
            if (node && node->next) {      /* nothing to delete after the last node */
                tmp = node->next;
                node->next = tmp->next;
                free(tmp);
            }
        }
    Figure: 10 → 50 → 20 → NULL becomes 10 → 20 → NULL
  • 105. Traverse a list
        void traverseList(pnode ptr)
        {
            printf("The list contains: ");
            for ( ; ptr; ptr = ptr->next)
                printf("%4d", ptr->data);
            printf("\n");
        }
    Where does ptr point after this function call?
  • 106. Other List Operations • swapElements • insertFirst • insertLast • deleteBefore • deleteLast
  • 107. Running Time Analysis • insertAfter O(?) • deleteAfter O(?) • deleteBefore O(?) • deleteLast O(?) • insertFirst O(?) • insertLast O(?)
  • 108. Applications of Linked Lists • Stacks and Queues Implemented with Linked Lists • Polynomials Implemented with Linked Lists • Remember the array-based implementation? • Hint: two strategies, one efficient in terms of space, one in terms of running time
  • 109. Operations on Linked Lists • Running time? • insert, remove • traverse, swap • How to reverse the elements of a list?
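As one possible answer to the closing question on slide 109, a singly linked list can be reversed in place by walking it once and turning each `next` pointer around. The sketch below assumes an `int` payload; the function name `reverse` is chosen here for illustration.

```c
#include <stddef.h>

typedef struct node {
    int data;
    struct node *next;
} node, *pnode;

/* Reverse the list in place and return the new head.
   Runs in O(n) time and uses O(1) extra space. */
pnode reverse(pnode head)
{
    pnode prev = NULL;               /* already reversed part */
    while (head != NULL) {
        pnode next = head->next;     /* remember the rest of the list */
        head->next = prev;           /* turn this link around */
        prev = head;
        head = next;
    }
    return prev;
}
```

The caller has to store the returned pointer, since the old head becomes the last node.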
  • 110. Polynomials: Representation
        typedef struct poly_node *poly_pointer;
        typedef struct poly_node {
            int coef;
            int expon;
            poly_pointer next;
        } poly_node;
        poly_pointer a, b, c;
    A(x) = a_{m-1} x^{e_{m-1}} + a_{m-2} x^{e_{m-2}} + … + a_0 x^{e_0}
    Node layout: coef | expon | link
  • 111. Example
    a = 3x^14 + 2x^8 + 1:        a → (3, 14) → (2, 8) → (1, 0) → null
    b = 8x^14 - 3x^10 + 10x^6:   b → (8, 14) → (-3, 10) → (10, 6) → null
  • 112. Adding Polynomials (a and b as on the previous slide)
    a->expon == b->expon: add the coefficients; d → (11, 14)
    a->expon < b->expon: copy the term of b; d → (11, 14) → (-3, 10)
  • 113. Adding Polynomials (cont’d)
    a->expon > b->expon: copy the term of a; d → (11, 14) → (-3, 10) → (2, 8)
  • 114. Adding Polynomials (cont’d)
        poly_pointer padd(poly_pointer a, poly_pointer b)
        {
            poly_pointer front, rear, temp;
            int sum;
            rear = (poly_pointer) malloc(sizeof(poly_node));
            if (IS_FULL(rear)) {
                fprintf(stderr, "The memory is full\n");
                exit(1);
            }
            front = rear;
            while (a && b) {
                switch (COMPARE(a->expon, b->expon)) {
  • 115.
                case -1: /* a->expon < b->expon */
                    attach(b->coef, b->expon, &rear);
                    b = b->next;
                    break;
                case 0: /* a->expon == b->expon */
                    sum = a->coef + b->coef;
                    if (sum) attach(sum, a->expon, &rear);
                    a = a->next;
                    b = b->next;
                    break;
                case 1: /* a->expon > b->expon */
                    attach(a->coef, a->expon, &rear);
                    a = a->next;
                }
            }
            for (; a; a = a->next) attach(a->coef, a->expon, &rear);
            for (; b; b = b->next) attach(b->coef, b->expon, &rear);
            rear->next = NULL;
            temp = front;
            front = front->next;
            free(temp);
            return front;
        }
  • 116. Analysis
    (1) coefficient additions: 0 ≤ number of additions ≤ min(m, n), where m (n) denotes the number of terms in A (B).
    (2) exponent comparisons: extreme case e_{m-1} > f_{m-1} > e_{m-2} > f_{m-2} > … > e_0 > f_0, giving m + n - 1 comparisons
    (3) creation of new nodes: extreme case m + n new nodes
    Summary: O(m + n)
  • 117. Attach a Term
        void attach(int coefficient, int exponent, poly_pointer *ptr)
        {
            /* create a new node attaching to the node pointed to by ptr.
               ptr is updated to point to this new node.
               coefficient is an int to match the coef field of poly_node */
            poly_pointer temp;
            temp = (poly_pointer) malloc(sizeof(poly_node));
            if (IS_FULL(temp)) {
                fprintf(stderr, "The memory is full\n");
                exit(1);
            }
            temp->coef = coefficient;
            temp->expon = exponent;
            (*ptr)->next = temp;
            *ptr = temp;
        }
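The padd and attach listings rely on the IS_FULL and COMPARE macros, which the slides do not define. A self-contained sketch of the same merge is shown below under a few assumptions: a dummy header that lives on the stack replaces the temporary heap node, plain comparisons replace COMPARE, and the helpers `poly_build` and `poly_free` are hypothetical additions for building and releasing test data.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct poly_node {
    int coef;
    int expon;
    struct poly_node *next;
} poly_node, *poly_pointer;

/* Append a term after *rear and advance *rear to the new node. */
static void attach(int coef, int expon, poly_pointer *rear)
{
    poly_pointer temp = malloc(sizeof *temp);
    if (temp == NULL) {
        fprintf(stderr, "The memory is full\n");
        exit(EXIT_FAILURE);
    }
    temp->coef = coef;
    temp->expon = expon;
    temp->next = NULL;
    (*rear)->next = temp;
    *rear = temp;
}

/* Build a list from n terms given in decreasing exponent order. */
poly_pointer poly_build(const int coef[], const int expon[], int n)
{
    poly_node head = {0, 0, NULL};       /* dummy header */
    poly_pointer rear = &head;
    for (int i = 0; i < n; i++)
        attach(coef[i], expon[i], &rear);
    return head.next;
}

/* Return a new list holding a + b; a and b are left unchanged. */
poly_pointer padd(poly_pointer a, poly_pointer b)
{
    poly_node head = {0, 0, NULL};       /* dummy header */
    poly_pointer rear = &head;
    while (a && b) {
        if (a->expon < b->expon) {
            attach(b->coef, b->expon, &rear);
            b = b->next;
        } else if (a->expon > b->expon) {
            attach(a->coef, a->expon, &rear);
            a = a->next;
        } else {
            int sum = a->coef + b->coef;
            if (sum)                     /* drop terms that cancel */
                attach(sum, a->expon, &rear);
            a = a->next;
            b = b->next;
        }
    }
    for (; a; a = a->next) attach(a->coef, a->expon, &rear);
    for (; b; b = b->next) attach(b->coef, b->expon, &rear);
    return head.next;
}

void poly_free(poly_pointer p)
{
    while (p) {
        poly_pointer next = p->next;
        free(p);
        p = next;
    }
}
```

With the polynomials of slide 111, the result is 11x^14 - 3x^10 + 2x^8 + 10x^6 + 1, produced with one pass over each input as the O(m + n) analysis predicts.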
  • 118. Other types of lists: • Circular lists • Doubly linked lists
  • 119. Circularly linked lists: circular list vs. chain • Figure: the list (3, 14) → (2, 8) → (1, 0) addressed by ptr, drawn both as a chain and as a circular list; the labels avail and temp also appear in the figure.
  • 120. Operations in a circular list • What happens when we insert a node at the front of a circular linked list a → X1 → X2 → X3? • Problem: the whole list has to be traversed to reach the last node, whose link must change. • A possible solution: keep a pointer to the last node instead, so that a addresses X3 in X1 → X2 → X3.
  • 121. Insertion
        void insertFront(pnode *ptr, pnode node)
        {
            /* insert a node in the list with head (*ptr)->next */
            if (IS_EMPTY(*ptr)) {
                *ptr = node;
                node->next = node;             /* circular link */
            }
            else {
                node->next = (*ptr)->next;     /* (1) */
                (*ptr)->next = node;           /* (2) */
            }
        }
    Figure: ptr addresses the last node of X1 → X2 → X3; links (1) and (2) splice the new node in front of X1.
  • 122. List length
        int length(pnode ptr)
        {
            pnode temp;
            int count = 0;
            if (ptr) {
                temp = ptr;
                do {
                    count++;
                    temp = temp->next;
                } while (temp != ptr);
            }
            return count;
        }
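The two routines above can be combined into a small self-contained sketch. It assumes an `int` payload and nodes owned by the caller; `insert_rear` is a hypothetical addition that shows why keeping a pointer to the last node makes insertion at either end O(1).

```c
#include <stddef.h>

typedef struct node {
    int data;
    struct node *next;
} node, *pnode;

/* *last addresses the LAST node; (*last)->next is the first node. */
void insert_front(pnode *last, pnode n)
{
    if (*last == NULL) {
        *last = n;
        n->next = n;                 /* a single node points to itself */
    } else {
        n->next = (*last)->next;     /* (1) new node precedes the old first */
        (*last)->next = n;           /* (2) last node now leads to it */
    }
}

/* Same splice, after which the new node is declared to be the last one. */
void insert_rear(pnode *last, pnode n)
{
    insert_front(last, n);
    *last = n;
}

/* Count the nodes by walking once around the ring. */
int length(pnode last)
{
    int count = 0;
    if (last) {
        pnode temp = last;
        do {
            count++;
            temp = temp->next;
        } while (temp != last);
    }
    return count;
}
```

Inserting X3, X2 and X1 at the front in that order yields the ring X1 → X2 → X3 → X1 with the list pointer still on X3.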
  • 123. Doubly Linked List • Keep a pointer to the next and the previous element in the list
        typedef struct node *pnode;
        typedef struct node {
            char data[4];
            pnode next;
            pnode prev;
        } node;
  • 124. Doubly Linked List • Keep header and trailer nodes (sentinels) with no content • header.prev = null; header.next = first element • trailer.next = null; trailer.prev = last element • Update pointers for every operation performed on the list • How to remove an element from the tail of the list?
  • 125. Doubly Linked List – removeLast() • Running time? • How does this compare to singly linked lists?
  • 126. Doubly Linked List • insertFirst • swapElements
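For the removeLast() question raised on slides 124 and 125, a minimal sketch with header and trailer sentinels is shown below. The `dlist` type, the function names and the `int` payload are assumptions for illustration. Because trailer.prev leads straight to the last element, removal takes O(1) time, whereas a singly linked list has to walk to the predecessor of the last node first.

```c
#include <stdlib.h>

typedef struct dnode {
    int data;
    struct dnode *prev;
    struct dnode *next;
} dnode;

typedef struct {
    dnode header;     /* sentinel in front of the first element */
    dnode trailer;    /* sentinel behind the last element */
} dlist;

/* The sentinels point at each other, so a dlist must not be copied
   by value after it has been initialised. */
void dlist_init(dlist *l)
{
    l->header.prev = NULL;
    l->header.next = &l->trailer;
    l->trailer.prev = &l->header;
    l->trailer.next = NULL;
}

int dlist_is_empty(const dlist *l)
{
    return l->header.next == &l->trailer;
}

/* Returns 0 on success, -1 if malloc fails. */
int dlist_insert_last(dlist *l, int value)
{
    dnode *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->data = value;
    n->prev = l->trailer.prev;
    n->next = &l->trailer;
    l->trailer.prev->next = n;
    l->trailer.prev = n;
    return 0;
}

/* Removes the last element and stores its value in *out.
   Returns 0 on success, -1 if the list is empty. */
int dlist_remove_last(dlist *l, int *out)
{
    if (dlist_is_empty(l))
        return -1;
    dnode *last = l->trailer.prev;
    last->prev->next = &l->trailer;    /* predecessor skips the node */
    l->trailer.prev = last->prev;
    *out = last->data;
    free(last);
    return 0;
}
```

The sentinels remove every special case: the first and the last element are unlinked with exactly the same two pointer updates.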
  • 127. Revisit Sparse Matrices
         0    0   11    0
        12    0    0    0
         0   -4    0    0
         0    0    0  -15
    Previous scheme: represent each non-zero element as a tuple (row, column, value)
    New scheme: each column (row) is a circular linked list with a head node
  • 128. Nodes in the Sparse Matrix: the entry node for a_ij has the fields down, right, row (= i), col (= j) and value (= a_ij)
  • 129. Linked Representation (circular linked lists): the header node holds 4 4 (rows, columns); the entry nodes are (1, 0, 12), (2, 1, -4), (0, 2, 11) and (3, 3, -15)
  • 130. Sparse Matrix Implementation
        #define MAX_SIZE 50 /* size of largest matrix */
        typedef struct mnode *pmnode;
        typedef struct mnode {
            int row;
            int col;
            int value;
            pmnode next, down;
        } mnode;
    Operations on sparse matrices
  • 131. Queue • Stores a set of elements in a particular order • Queue principle: FIRST IN FIRST OUT • = FIFO • It means: the first element inserted is the first one to be removed • Example • The first one in line is the first one to be served
  • 132. Queue Applications • Real life examples • Waiting in line • Waiting on hold for tech support • Applications related to Computer Science • Threads • Job scheduling (e.g. Round-Robin algorithm for CPU allocation)
  • 133. First In First Out • Figure: the queue after each step, front first and rear last: [A], [A B], [A B C], [A B C D], then [B C D] after one deletion
  • 134. Applications: Job Scheduling
    front | rear | Q[0] | Q[1] | Q[2] | Q[3] | Comments
    -1    | -1   |      |      |      |      | queue is empty
    -1    |  0   | J1   |      |      |      | Job 1 is added
    -1    |  1   | J1   | J2   |      |      | Job 2 is added
    -1    |  2   | J1   | J2   | J3   |      | Job 3 is added
     0    |  2   |      | J2   | J3   |      | Job 1 is deleted
     1    |  2   |      |      | J3   |      | Job 2 is deleted
  • 135. Queue ADT
    objects: a finite ordered list with zero or more elements.
    methods: for all queue ∈ Queue, item ∈ element, max_queue_size ∈ positive integer
    Queue createQ(max_queue_size) ::= create an empty queue whose maximum size is max_queue_size
    Boolean isFullQ(queue, max_queue_size) ::= if (number of elements in queue == max_queue_size) return TRUE else return FALSE
    Queue enqueue(queue, item) ::= if (isFullQ(queue)) queue_full else insert item at rear of queue and return queue
  • 136. Queue ADT (cont’d)
    Boolean isEmptyQ(queue) ::= if (queue == createQ(max_queue_size)) return TRUE else return FALSE
    Element dequeue(queue) ::= if (isEmptyQ(queue)) return else remove and return the item at front of queue.
  • 137. Array-based Queue Implementation • As with the array-based stack implementation, the array is of fixed size • A queue of maximum N elements • Slightly more complicated • Need to keep track of both front and rear • Implementation 1 • Implementation 2
  • 138. Implementation 1: createQ, isEmptyQ, isFullQ
    Queue createQ(max_queue_size) ::=
        #define MAX_QUEUE_SIZE 100 /* maximum queue size */
        typedef struct {
            int key;
            /* other fields */
        } element;
        element queue[MAX_QUEUE_SIZE];
        int rear = -1;
        int front = -1;
    Boolean isEmptyQ(queue) ::= front == rear
    Boolean isFullQ(queue) ::= rear == MAX_QUEUE_SIZE - 1
  • 139. Implementation 1: enqueue
        void enqueue(int *rear, element item)
        {
            /* add an item to the queue */
            if (*rear == MAX_QUEUE_SIZE - 1) {
                queue_full();
                return;
            }
            queue[++*rear] = item;
        }
  • 140. Implementation 1: dequeue
        element dequeue(int *front, int rear)
        {
            /* remove element at the front of the queue */
            if (*front == rear)
                return queue_empty();  /* return an error key */
            return queue[++*front];
        }
  • 141. Implementation 2: Wrapped Configuration • Can be seen as a circular queue • Figure: six slots [0] to [5] arranged in a ring. Empty queue: front = 0, rear = 0. After J1, J2 and J3 are added: front = 0, rear = 3.
  • 142. Full queue • Figure: two full queues in the six-slot ring, one with front = 0, rear = 5 holding J1 to J5, the other with front = 4, rear = 3 holding J5 to J9. • Leave one empty space when the queue is full. Why? • How to test when the queue is empty? • How to test when the queue is full?
  • 143. Enqueue in a Circular Queue
        void enqueue(int front, int *rear, element item)
        {
            /* add an item to the queue */
            *rear = (*rear + 1) % MAX_QUEUE_SIZE;
            if (front == *rear) {
                /* queue is full: reset rear and print error */
                *rear = (*rear + MAX_QUEUE_SIZE - 1) % MAX_QUEUE_SIZE;
                queue_full();
                return;
            }
            queue[*rear] = item;
        }
  • 144. Dequeue from Circular Queue
        element dequeue(int *front, int rear)
        {
            /* remove front element from the queue */
            if (*front == rear)
                return queue_empty();  /* queue_empty returns an error key */
            *front = (*front + 1) % MAX_QUEUE_SIZE;
            return queue[*front];
        }
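The questions on slide 142 (how to test for empty and full, and why one slot stays unused) can be made concrete with a small sketch. It assumes a six-slot ring as in the figures, bundles the state into a hypothetical `CircularQueue` struct instead of globals, and reports failure through a `bool` return value rather than the `queue_full` and `queue_empty` helpers, which the slides do not show.

```c
#include <stdbool.h>

#define MAX_QUEUE_SIZE 6

typedef struct {
    int key;
} element;

typedef struct {
    element items[MAX_QUEUE_SIZE];
    int front;    /* slot just before the first element */
    int rear;     /* slot of the last element */
} CircularQueue;

void cq_init(CircularQueue *q)
{
    q->front = q->rear = 0;
}

/* Empty: front and rear coincide. */
bool cq_is_empty(const CircularQueue *q)
{
    return q->front == q->rear;
}

/* Full: advancing rear would make it collide with front.
   One slot stays unused so that full and empty look different. */
bool cq_is_full(const CircularQueue *q)
{
    return (q->rear + 1) % MAX_QUEUE_SIZE == q->front;
}

bool cq_enqueue(CircularQueue *q, element item)
{
    if (cq_is_full(q))
        return false;
    q->rear = (q->rear + 1) % MAX_QUEUE_SIZE;
    q->items[q->rear] = item;
    return true;
}

bool cq_dequeue(CircularQueue *q, element *out)
{
    if (cq_is_empty(q))
        return false;
    q->front = (q->front + 1) % MAX_QUEUE_SIZE;
    *out = q->items[q->front];
    return true;
}
```

If all six slots were used, a full queue would also satisfy front == rear and could not be told apart from an empty one; that is why the capacity is MAX_QUEUE_SIZE - 1 elements.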
  • 145. List-based Queue Implementation: Enqueue
        void enqueue(pnode *front, pnode *rear, element item)
        {
            /* add an element to the rear of the queue;
               front and rear are passed by address so the caller sees the update */
            pnode temp = (pnode) malloc(sizeof(*temp));
            if (IS_FULL(temp)) {
                fprintf(stderr, "The memory is full\n");
                exit(1);
            }
            temp->item = item;
            temp->next = NULL;
            if (*front) (*rear)->next = temp;
            else *front = temp;
            *rear = temp;
        }
  • 146. Dequeue
        element dequeue(pnode *front)
        {
            /* delete an element from the queue */
            pnode temp = *front;
            element item;
            if (IS_EMPTY(*front)) {
                fprintf(stderr, "The queue is empty\n");
                exit(1);
            }
            item = temp->item;
            *front = temp->next;
            free(temp);
            return item;
        }
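A list-based queue has to change the caller's front and rear pointers, which a function cannot do through by-value pointer parameters. One alternative, sketched here as an assumption rather than as the deck's design, is to keep both pointers in a small struct and pass its address. The names `LinkedQueue`, `lq_enqueue` and `lq_dequeue` are hypothetical and the payload is a plain `int`.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct qnode {
    int item;
    struct qnode *next;
} qnode;

typedef struct {
    qnode *front;   /* oldest element, removed first */
    qnode *rear;    /* newest element, added last */
} LinkedQueue;

void lq_init(LinkedQueue *q)
{
    q->front = q->rear = NULL;
}

/* Add item at the rear; returns false if malloc fails. */
bool lq_enqueue(LinkedQueue *q, int item)
{
    qnode *temp = malloc(sizeof *temp);
    if (temp == NULL)
        return false;
    temp->item = item;
    temp->next = NULL;
    if (q->front)
        q->rear->next = temp;    /* link behind the current last node */
    else
        q->front = temp;         /* first node of an empty queue */
    q->rear = temp;
    return true;
}

/* Remove the front item into *out; returns false if the queue is empty. */
bool lq_dequeue(LinkedQueue *q, int *out)
{
    qnode *temp = q->front;
    if (temp == NULL)
        return false;
    *out = temp->item;
    q->front = temp->next;
    if (q->front == NULL)
        q->rear = NULL;          /* queue became empty */
    free(temp);
    return true;
}
```

Both operations touch a constant number of pointers, so enqueue and dequeue are O(1) and the queue is never full as long as memory is available.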
  • 147. Algorithm Analysis • enqueue O(?) • dequeue O(?) • size O(?) • isEmpty O(?) • isFull O(?) • What if I want the first element to be always at Q[0]?
  • 148. Stacks • Stack: what is it? • ADT • Applications • Implementation(s)
  • 149. What is a stack? • Stores a set of elements in a particular order • Stack principle: LAST IN FIRST OUT • = LIFO • It means: the last element inserted is the first one to be removed • Example • Which is the first element to pick up?
  • 150. Last In First Out • Figure: the stack after each step, top first: [A], [B A], [C B A], [D C B A], [E D C B A], then [D C B A] after one pop
  • 151. Stack Applications • Real life • Pile of books • Plate trays • More applications related to computer science • Program execution stack (read more from your text) • Evaluating expressions
  • 152. Stack ADT
    objects: a finite ordered list with zero or more elements.
    methods: for all stack ∈ Stack, item ∈ element, max_stack_size ∈ positive integer
    Stack createS(max_stack_size) ::= create an empty stack whose maximum size is max_stack_size
    Boolean isFull(stack, max_stack_size) ::= if (number of elements in stack == max_stack_size) return TRUE else return FALSE
    Stack push(stack, item) ::= if (isFull(stack)) stack_full else insert item into top of stack and return
  • 153. Stack ADT (cont’d)
    Boolean isEmpty(stack) ::= if (stack == createS(max_stack_size)) return TRUE else return FALSE
    Element pop(stack) ::= if (isEmpty(stack)) return else remove and return the item on the top of the stack.
  • 154. Array-based Stack Implementation • Allocate an array of some size (pre-defined) • Maximum N elements in stack • Bottom stack element stored at element 0 • last index in the array is the top • Increment top when one element is pushed, decrement after pop
  • 155. Stack Implementation: createS, isEmpty, isFull
    Stack createS(max_stack_size) ::=
        #define MAX_STACK_SIZE 100 /* maximum stack size */
        typedef struct {
            int key;
            /* other fields */
        } element;
        element stack[MAX_STACK_SIZE];
        int top = -1;
    Boolean isEmpty(Stack) ::= top < 0;
    Boolean isFull(Stack) ::= top >= MAX_STACK_SIZE - 1;
  • 156. Push
        void push(int *top, element item)
        {
            /* add an item to the global stack */
            if (*top >= MAX_STACK_SIZE - 1) {
                stack_full();
                return;
            }
            stack[++*top] = item;
        }
  • 157. Pop
        element pop(int *top)
        {
            /* return the top element from the stack */
            if (*top == -1)
                return stack_empty();  /* returns an error key */
            return stack[(*top)--];
        }
  • 158. List-based Stack Implementation: Push
        void push(pnode *top, element item)
        {
            /* add an element to the top of the stack;
               top is passed by address so the caller sees the new top */
            pnode temp = (pnode) malloc(sizeof(node));
            if (IS_FULL(temp)) {
                fprintf(stderr, "The memory is full\n");
                exit(1);
            }
            temp->item = item;
            temp->next = *top;
            *top = temp;
        }
  • 159. Pop
        element pop(pnode *top)
        {
            /* delete an element from the stack */
            pnode temp = *top;
            element item;
            if (IS_EMPTY(temp)) {
                fprintf(stderr, "The stack is empty\n");
                exit(1);
            }
            item = temp->item;
            *top = temp->next;
            free(temp);
            return item;
        }
  • 160. Algorithm Analysis • push O(?) • pop O(?) • isEmpty O(?) • isFull O(?) • What if top is stored at the beginning of the array?
  • 161. A Legend: The Towers of Hanoi • In the great temple of Brahma in Benares, on a brass plate under the dome that marks the center of the world, there are 64 disks of pure gold that the priests carry one at a time between three diamond needles according to Brahma's immutable law: No disk may be placed on a smaller disk. In the beginning of the world all 64 disks formed the Tower of Brahma on one needle. Now, however, the process of transfer of the tower from one needle to another is in mid course. When the last disk is finally in place, once again forming the Tower of Brahma but on a different needle, then will come the end of the world and all will turn to dust.
  • 162. The Towers of Hanoi: A Stack-based Application • GIVEN: three poles • a set of discs on the first pole, discs of different sizes, the smallest disc at the top • GOAL: move all the discs from the left pole to the right one. • CONDITIONS: only one disc may be moved at a time. • A disc can be placed either on an empty pole or on top of a larger disc.
  • 163-170. Towers of Hanoi (eight figure slides; the images are not part of the transcript)
  • 171. Towers of Hanoi – Recursive Solution
        void hanoi(int discs, Stack fromPole, Stack toPole, Stack aux)
        {
            Disc d;
            if (discs >= 1) {
                hanoi(discs - 1, fromPole, aux, toPole);
                d = fromPole.pop();
                toPole.push(d);
                hanoi(discs - 1, aux, toPole, fromPole);
            }
        }
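The recursive solution above is written with object-style stacks (`fromPole.pop()`). A C rendering is sketched below under stated assumptions: each pole is a small array-based stack named `Pole`, the limit `MAX_DISCS` is arbitrary, and `hanoi` returns the number of moves so the 2^n - 1 growth can be checked. The assertion inside `push` enforces the rule that no disc may rest on a smaller one.

```c
#include <assert.h>

#define MAX_DISCS 16

typedef struct {
    int disc[MAX_DISCS];   /* disc sizes, bottom of the pole at index 0 */
    int top;               /* index of the top disc, -1 when empty */
} Pole;

/* Put disc d on the pole; a disc may only rest on a larger one. */
static void push(Pole *p, int d)
{
    assert(p->top < MAX_DISCS - 1);
    assert(p->top < 0 || p->disc[p->top] > d);
    p->disc[++p->top] = d;
}

/* Take the top disc off the pole. */
static int pop(Pole *p)
{
    assert(p->top >= 0);
    return p->disc[p->top--];
}

/* Move `discs` discs from `from` to `to`, using `aux` as the spare pole.
   Returns the number of single-disc moves performed. */
unsigned long hanoi(int discs, Pole *from, Pole *to, Pole *aux)
{
    if (discs < 1)
        return 0;
    unsigned long moves = hanoi(discs - 1, from, aux, to);
    push(to, pop(from));               /* move the largest remaining disc */
    moves += 1;
    moves += hanoi(discs - 1, aux, to, from);
    return moves;
}
```

A caller fills the first pole from the largest disc down to the smallest and then calls `hanoi(n, &a, &c, &b)`; for n = 4 the function reports 15 moves.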
  • 172. Is the End of the World Approaching? • Problem complexity: 2^n • 64 gold discs need 2^64 - 1 ≈ 1.8 × 10^19 moves • Given 1 move a second, that is about 585 billion years (roughly 600,000,000,000) until the end of the world
  • 173. Applications • Infix to Postfix conversion [Evaluation of Expressions]
  • 174. Evaluation of Expressions
    X = a / b - c + d * e - a * c, with a = 4, b = c = 2, d = e = 3
    Interpretation 1: ((4 / 2) - 2) + (3 * 3) - (4 * 2) = 0 + 9 - 8 = 1
    Interpretation 2: (4 / (2 - 2 + 3)) * (3 - 4) * 2 = (4 / 3) * (-1) * 2 = -2.66666…
    How to generate the machine instructions corresponding to a given expression? precedence rule + associative rule
  • 175.
    Token | Operator | Precedence | Associativity
    ( ) [ ] -> . | function call, array element, struct or union member | 17 | left-to-right
    -- ++ | decrement, increment (postfix) | 16 | left-to-right
    -- ++ ! ~ - + & * sizeof | decrement, increment (prefix), logical not, one’s complement, unary minus or plus, address or indirection, size (in bytes) | 15 | right-to-left
    (type) | type cast | 14 | right-to-left
    * / % | multiplicative | 13 | left-to-right
  • 176.
    + - | binary add or subtract | 12 | left-to-right
    << >> | shift | 11 | left-to-right
    > >= < <= | relational | 10 | left-to-right
    == != | equality | 9 | left-to-right
    & | bitwise and | 8 | left-to-right
    ^ | bitwise exclusive or | 7 | left-to-right
    \| | bitwise or | 6 | left-to-right
    && | logical and | 5 | left-to-right
    \|\| | logical or | 4 | left-to-right
  • 177.
    ?: | conditional | 3 | right-to-left
    = += -= /= *= %= <<= >>= &= ^= \|= | assignment | 2 | right-to-left
    , | comma | 1 | left-to-right
  • 178. Infix (user) and Postfix (compiler)
    Infix | Postfix
    2+3*4 | 234*+
    a*b+5 | ab*5+
    (1+2)*7 | 12+7*
    a*b/c | ab*c/
    (a/(b-c+d))*(e-a)*c | abc-d+/ea-*c*
    a/b-c+d*e-a*c | ab/c-de*+ac*-
    Postfix: no parentheses, no precedence
  • 179. Evaluating the postfix expression 6 2 / 3 - 4 2 * +
    Token | Stack [0] | [1] | [2] | Top
    6 | 6 | | | 0
    2 | 6 | 2 | | 1
    / | 6/2 | | | 0
    3 | 6/2 | 3 | | 1
    - | 6/2-3 | | | 0
    4 | 6/2-3 | 4 | | 1
    2 | 6/2-3 | 4 | 2 | 2
    * | 6/2-3 | 4*2 | | 1
    + | 6/2-3+4*2 | | | 0
  • 180. Infix to Postfix
        #define MAX_STACK_SIZE 100 /* maximum stack size */
        #define MAX_EXPR_SIZE 100  /* max size of expression */
        typedef enum {lparen, rparen, plus, minus, times, divide, mod, eos, operand} precedence;
        int stack[MAX_STACK_SIZE]; /* global stack */
        char expr[MAX_EXPR_SIZE];  /* input string */
    Assumptions: operators: +, -, *, /, %; operands: single-digit integers
  • 181. Evaluation of Postfix Expressions
        int eval(void)
        {
            /* evaluate a postfix expression, expr, maintained as a global variable;
               '\0' is the end of the expression. The stack is a global variable.
               get_token is used to return the token type and the character symbol.
               Operands are assumed to be single character digits */
            precedence token;
            char symbol;
            int op1, op2;
            int n = 0;     /* counter for the expression string */
            int top = -1;
            token = get_token(&symbol, &n);
            while (token != eos) {
                if (token == operand)
                    push(&top, symbol - '0');   /* stack insert */
  • 182.
                else {
                    /* remove two operands, perform operation, and return result to the stack */
                    op2 = pop(&top);            /* stack delete */
                    op1 = pop(&top);
                    switch (token) {
                    case plus:   push(&top, op1 + op2); break;
                    case minus:  push(&top, op1 - op2); break;
                    case times:  push(&top, op1 * op2); break;
                    case divide: push(&top, op1 / op2); break;
                    case mod:    push(&top, op1 % op2);
                    }
                }
                token = get_token(&symbol, &n);
            }
            return pop(&top);   /* return result */
        }
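The eval listing depends on a global expression, a global stack and the get_token helper. For experimenting with the idea in isolation, a compact sketch is given below. It is an assumption-laden variant rather than the deck's code: `eval_postfix` takes the expression as an argument, keeps a local stack, reads characters directly, and returns -1 for malformed input or division by zero instead of trusting the input.

```c
#include <ctype.h>

#define MAX_STACK_SIZE 100

/* Evaluate a postfix string made of single-digit operands and the
   operators + - * / %. Blanks are skipped.
   Stores the value in *result and returns 0, or returns -1 on error. */
int eval_postfix(const char *expr, int *result)
{
    int stack[MAX_STACK_SIZE];
    int top = -1;

    for (const char *p = expr; *p != '\0'; p++) {
        if (isspace((unsigned char)*p))
            continue;
        if (isdigit((unsigned char)*p)) {
            if (top >= MAX_STACK_SIZE - 1)
                return -1;                 /* stack full */
            stack[++top] = *p - '0';       /* push operand */
        } else {
            int op1, op2, r;
            if (top < 1)
                return -1;                 /* operator needs two operands */
            op2 = stack[top--];            /* right operand is on top */
            op1 = stack[top--];
            switch (*p) {
            case '+': r = op1 + op2; break;
            case '-': r = op1 - op2; break;
            case '*': r = op1 * op2; break;
            case '/': if (op2 == 0) return -1; r = op1 / op2; break;
            case '%': if (op2 == 0) return -1; r = op1 % op2; break;
            default:  return -1;           /* unknown symbol */
            }
            stack[++top] = r;              /* push the partial result */
        }
    }
    if (top != 0)
        return -1;                         /* empty input or leftover operands */
    *result = stack[0];
    return 0;
}
```

For the expression traced on slide 179, `eval_postfix("62/3-42*+", &v)` leaves 8 in `v`: 6/2 = 3, 3 - 3 = 0, 4*2 = 8, 0 + 8 = 8.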
  • 183.
        precedence get_token(char *symbol, int *n)
        {
            /* get the next token. symbol is the character representation, which is
               returned through the pointer; the token is represented by its
               enumerated value, which is returned as the function result */
            *symbol = expr[(*n)++];
            switch (*symbol) {
            case '(': return lparen;
            case ')': return rparen;
            case '+': return plus;
            case '-': return minus;
  • 184.
            case '/': return divide;
            case '*': return times;
            case '%': return mod;
            case '\0': return eos;
            default: return operand;  /* no error checking, default is operand */
            }
        }
  • 185. Infix to Postfix Conversion (Intuitive Algorithm)
    (1) Fully parenthesize the expression: a / b - c + d * e - a * c --> ((((a / b) - c) + (d * e)) - (a * c))
    (2) All operators replace their corresponding right parentheses.
    (3) Delete all parentheses: ab/c-de*+ac*-
    This takes two passes over the expression.
  • 186. Converting a + b * c (the precedence of * is higher than that of +)
    Token | Stack [0] | [1] | [2] | Top | Output
    a | | | | -1 | a
    + | + | | | 0 | a
    b | + | | | 0 | ab
    * | + | * | | 1 | ab
    c | + | * | | 1 | abc
    eos | | | | -1 | abc*+
    The orders of operands in infix and postfix are the same.
  • 187. Converting a *1 (b + c) *2 d
    Token | Stack [0] | [1] | [2] | Top | Output
    a | | | | -1 | a
    *1 | *1 | | | 0 | a
    ( | *1 | ( | | 1 | a
    b | *1 | ( | | 1 | ab
    + | *1 | ( | + | 2 | ab
    c | *1 | ( | + | 2 | abc
    ) | *1 | | | 0 | abc+
    *2 | *2 | | | 0 | abc+*1
    d | *2 | | | 0 | abc+*1d
    eos | *2 | | | 0 | abc+*1d*2
    The token ) is matched with (; *1 and *2 have equal precedence, so *1 is output before *2 is stacked.
  • 188. Rules
    (1) Operators are taken out of the stack as long as their in-stack precedence is higher than or equal to the incoming precedence of the new operator.
    (2) ( has low in-stack precedence, and high incoming precedence.
        | ( | ) | + | - | * | / | % | eos
    isp | 0 | 19 | 12 | 12 | 13 | 13 | 13 | 0
    icp | 20 | 19 | 12 | 12 | 13 | 13 | 13 | 0
  • 189. isp: in-stack precedence, icp: incoming precedence
        precedence stack[MAX_STACK_SIZE];
        /* isp and icp arrays -- index is value of precedence
           lparen, rparen, plus, minus, times, divide, mod, eos */
        static int isp[] = {0, 19, 12, 12, 13, 13, 13, 0};
        static int icp[] = {20, 19, 12, 12, 13, 13, 13, 0};
  • 190. Infix to Postfix
        void postfix(void)
        {
            /* output the postfix of the expression. The expression string and the stack are global */
            char symbol;
            precedence token;
            int n = 0;
            int top = 0;               /* place eos on stack */
            stack[0] = eos;
            for (token = get_token(&symbol, &n); token != eos; token = get_token(&symbol, &n)) {
                if (token == operand)
                    printf("%c", symbol);
                else if (token == rparen) {
  • 191. Infix to Postfix (cont’d)
                    /* unstack tokens until left parenthesis */
                    while (stack[top] != lparen)
                        print_token(pop(&top));
                    pop(&top);         /* discard the left parenthesis */
                }
                else {
                    /* remove and print symbols whose isp is greater than or equal to the current token's icp */
                    while (isp[stack[top]] >= icp[token])
                        print_token(pop(&top));
                    push(&top, token);
                }
            }
            while ((token = pop(&top)) != eos)
                print_token(token);
            printf("\n");
        }
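The conversion rules of slide 188 can also be tried out without the global stack and the print_token helper, which the slides do not show. The sketch below is an assumption-based variant: `infix_to_postfix` writes into a caller-supplied buffer, accepts single-character operands (letters or digits), and treats a left parenthesis on the stack as a barrier instead of consulting the isp and icp tables. The precedence values 12 and 13 are taken from the table.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_STACK_SIZE 100

/* Precedence of a binary operator, or -1 for any other character. */
static int precedence_of(char op)
{
    switch (op) {
    case '+': case '-':           return 12;
    case '*': case '/': case '%': return 13;
    default:                      return -1;
    }
}

/* Convert infix to postfix. Returns 0 on success, -1 on malformed input
   or if out (of size out_size) is too small. */
int infix_to_postfix(const char *infix, char *out, size_t out_size)
{
    char stack[MAX_STACK_SIZE];
    int top = -1;
    size_t n = 0;

    if (out_size == 0)
        return -1;
    for (const char *p = infix; *p != '\0'; p++) {
        char c = *p;
        if (isspace((unsigned char)c))
            continue;
        if (isalnum((unsigned char)c)) {          /* operand: straight to output */
            if (n + 1 >= out_size) return -1;
            out[n++] = c;
        } else if (c == '(') {
            if (top >= MAX_STACK_SIZE - 1) return -1;
            stack[++top] = c;
        } else if (c == ')') {                    /* unstack until the matching ( */
            while (top >= 0 && stack[top] != '(') {
                if (n + 1 >= out_size) return -1;
                out[n++] = stack[top--];
            }
            if (top < 0) return -1;               /* no matching ( */
            top--;                                /* discard the ( */
        } else if (precedence_of(c) > 0) {
            /* pop operators of higher or equal precedence (left-to-right) */
            while (top >= 0 && stack[top] != '(' &&
                   precedence_of(stack[top]) >= precedence_of(c)) {
                if (n + 1 >= out_size) return -1;
                out[n++] = stack[top--];
            }
            if (top >= MAX_STACK_SIZE - 1) return -1;
            stack[++top] = c;
        } else {
            return -1;                            /* unknown symbol */
        }
    }
    while (top >= 0) {                            /* flush remaining operators */
        if (stack[top] == '(') return -1;         /* unmatched ( */
        if (n + 1 >= out_size) return -1;
        out[n++] = stack[top--];
    }
    out[n] = '\0';
    return 0;
}
```

Given "a/b-c+d*e-a*c" the function produces "ab/c-de*+ac*-", and "a*(b+c)*d" becomes "abc+*d*", in line with the traces on slides 186 and 187.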
  • 192. The British Constitution (tree diagram) • Crown • Church of England • Cabinet • House of Commons • House of Lords • Supreme Court • Ministers • County Council • Metropolitan police • County Borough Council • Rural District Council
  • 193. More Trees Examples • Unix / Windows file structure
  • 194. Definition of Tree • A tree is a finite set of one or more nodes such that: • There is a specially designated node called the root. • The remaining nodes are partitioned into n >= 0 disjoint sets T1, ..., Tn, where each of these sets is a tree. • We call T1, ..., Tn the subtrees of the root.
  • 195. Level and Depth
    Node   | A | B | C | D | E | F | G | H | I | J | K | L | M
    Degree | 3 | 2 | 1 | 3 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0
    Level  | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4
    Terms shown in the figure: node (13 nodes), degree of a node, leaf (terminal), nonterminal, parent, children, sibling, degree of a tree (3), ancestor, level of a node, height of a tree (4)
  • 196. Terminology • The degree of a node is the number of subtrees of the node. • The degree of A is 3; the degree of C is 1. • The node with degree 0 is a leaf or terminal node. • A node that has subtrees is the parent of the roots of the subtrees. • The roots of these subtrees are the children of the node. • Children of the same parent are siblings. • The ancestors of a node are all the nodes along the path from the root to the node.
  • 197. Tree Properties • Figure: a tree with the nodes A to I • Property / Value table to fill in: Number of nodes, Height, Root Node, Leaves, Interior nodes, Number of levels, Ancestors of H, Descendants of B, Siblings of E, Right subtree
  • 198. Representation of Trees • List Representation: ( A ( B ( E ( K, L ), F ), C ( G ), D ( H ( M ), I, J ) ) ) • The root comes first, followed by a list of sub-trees • Node layout: data | link 1 | link 2 | ... | link n • How many link fields are needed in such a representation?
  • 199. A Tree Node • Every tree node: • object – useful information • children – pointers to its children nodes
  • 200. Left Child - Right Sibling • Figure: the tree with nodes A to M drawn with left-child and right-sibling links • Node layout: data | left child | right sibling
  • 201. Tree ADT • Objects: any type of objects can be stored in a tree • Methods: • accessor methods • root() – returns the root of the tree • parent(p) – returns the parent of a node • children(p) – returns the children of a node • query methods • size() – returns the number of nodes in the tree • isEmpty() – returns true if the tree is empty • elements() – returns all elements • isRoot(p), isInternal(p), isExternal(p)
  • 202. Tree Implementation
        typedef struct tnode {
            int key;
            struct tnode* lchild;
            struct tnode* sibling;
        } *ptnode;
    - Create a tree with three nodes (one root & two children)
    - Insert a new node (in tree with root R, as a new child at level L)
    - Delete a node (in tree with root R, the first child at level L)
  • 203. Tree Traversal • Two main methods: • Preorder • Postorder • Recursive definition • PREorder: • visit the root • traverse in preorder the children (subtrees) • POSTorder • traverse in postorder the children (subtrees) • visit the root 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 203
  • 204. Preorder • preorder traversal Algorithm preOrder(v) “visit” node v for each child w of v do recursively perform preOrder(w) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 204
  • 205. Postorder • postorder traversal Algorithm postOrder(v) for each child w of v do recursively perform postOrder(w) “visit” node v • du (disk usage) command in Unix 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 205
  • 206. Preorder Implementation void preorder(ptnode t) { ptnode ptr; if (NULL == t) return; /* empty tree */ display(t->key); for(ptr = t->lchild; NULL != ptr; ptr = ptr->sibling) { preorder(ptr); } } 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 206
  • 207. Postorder Implementation void postorder(ptnode t) { ptnode ptr; if (NULL == t) return; /* empty tree */ for(ptr = t->lchild; NULL != ptr; ptr = ptr->sibling) { postorder(ptr); } display(t->key); } 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 207
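The reference to `du` on the postorder slide suggests why postorder is useful: a node's total can only be reported after all of its children have been processed. A small sketch, assuming each `key` stands for the size stored at that node (the function name `subtreeTotal` is hypothetical):

```c
#include <stddef.h>

typedef struct tnode {
    int key;
    struct tnode *lchild;
    struct tnode *sibling;
} *ptnode;

/* Postorder sum: total of the keys in the subtree rooted at t (0 for an empty tree). */
long subtreeTotal(ptnode t) {
    if (!t) return 0;
    long total = 0;
    for (ptnode c = t->lchild; c != NULL; c = c->sibling)
        total += subtreeTotal(c);   /* children first */
    return total + t->key;          /* then "visit" the node itself */
}
```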
  • 208. Binary Trees A special class of trees: max degree for each node is 2 Recursive definition: A binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint binary trees called the left subtree and the right subtree. Any tree can be transformed into binary tree. by left child-right sibling representation 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 208
  • 209. Example J IM H L A B C D E F GK 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 209
  • 210. ADT Binary Tree objects: a finite set of nodes either empty or consisting of a root node, left BinaryTree, and right BinaryTree. method: for all bt, bt1, bt2 ∈ BinTree, item ∈ element BinTree create()::= creates an empty binary tree Boolean isEmpty(bt)::= if (bt==empty binary tree) return TRUE else return FALSE 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 210
  • 211. BinTree makeBT(bt1, item, bt2)::= return a binary tree whose left subtree is bt1, whose right subtree is bt2, and whose root node contains the data item Bintree leftChild(bt)::= if (IsEmpty(bt)) return error else return the left subtree of bt element data(bt)::= if (IsEmpty(bt)) return error else return the data in the root node of bt Bintree rightChild(bt)::= if (IsEmpty(bt)) return error else return the right subtree of bt 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 211
  • 212. Samples of Trees A B A B A B C GE I D H F Complete Binary Tree Skewed Binary Tree E C D 1 2 3 4 5 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 212
  • 213. Maximum Number of Nodes in BT The maximum number of nodes on level i of a binary tree is $2^{i-1}$, i>=1. The maximum number of nodes in a binary tree of depth k is $2^k - 1$, k>=1. Prove by induction: $\sum_{i=1}^{k} 2^{i-1} = 2^k - 1$ 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 213
  • 214. Full BT vs. Complete BT A full binary tree of depth k is a binary tree of depth k having $2^k - 1$ nodes, k>=0. A binary tree with n nodes and depth k is complete iff its nodes correspond to the nodes numbered from 1 to n in the full binary tree of depth k. A B C GE I D H F A B C GE K D J F IH ONML Full binary tree of depth 4; Complete binary tree 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 215
  • 215. Binary Tree Representations If a complete binary tree with n nodes (depth = $\lfloor \log_2 n \rfloor + 1$) is represented sequentially, then for any node with index i, 1<=i<=n, we have: parent(i) is at $\lfloor i/2 \rfloor$ if i!=1. If i=1, i is at the root and has no parent. leftChild(i) is at 2i if 2i<=n. If 2i>n, then i has no left child. rightChild(i) is at 2i+1 if 2i+1<=n. If 2i+1>n, then i has no right child. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 216
  • 216. Sequential Representation A B -- C -- -- -- D -- . E [1] [2] [3] [4] [5] [6] [7] [8] [9] . [16] [1] [2] [3] [4] [5] [6] [7] [8] [9] A B C D E F G H I A B E C D A B C GE I D H F (1) waste space (2) insertion/deletion problem 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 217
  • 217. Space Overhead (1) From the Full Binary Tree Theorem: • Half of the pointers are null. If leaves store only data, then overhead depends on whether the tree is full. Ex: All nodes the same, with two pointers to children: • Total space required is (2p + d)n • Overhead: 2pn • If p = d, this means 2p/(2p + d) = 2/3 overhead. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 218
  • 218. Space Overhead (2) Eliminate pointers from the leaf nodes: $\frac{(n/2)(2p)}{(n/2)(2p) + dn} = \frac{p}{p + d}$ This is 1/2 if p = d. If data is stored only at the leaves, the overhead is 2p/(2p + d), which is 2/3 if p = d. Note that some method is needed to distinguish leaves from internal nodes. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 219
  • 219. Array Implementation (1) Position 0 1 2 3 4 5 6 7 8 9 10 11 Parent -- 0 0 1 1 2 2 3 3 4 4 5 Left Child 1 3 5 7 9 11 -- -- -- -- -- -- Right Child 2 4 6 8 10 -- -- -- -- -- -- -- Left Sibling -- -- 1 -- 3 -- 5 -- 7 -- 9 -- Right Sibling -- 2 -- 4 -- 6 -- 8 -- 10 -- -- 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 220
  • 220. Array Implementation (1) Parent (r) = Leftchild(r) = Rightchild(r) = Leftsibling(r) = Rightsibling(r) = 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 221
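One way to fill in the blanks, checked against the 12-position table on the previous slide, is sketched below. Positions start at 0 here, unlike the 1-based formulas given earlier for the sequential representation; the function names and the use of -1 for "--" are assumptions made for this sketch.

```c
/* Index arithmetic for a complete binary tree of n nodes stored at positions 0..n-1.
   Each function returns -1 where the table shows "--". */
int parentOf(int r)              { return r > 0 ? (r - 1) / 2 : -1; }
int leftChildOf(int r, int n)    { return 2 * r + 1 < n ? 2 * r + 1 : -1; }
int rightChildOf(int r, int n)   { return 2 * r + 2 < n ? 2 * r + 2 : -1; }
/* Even positions (other than the root) are right children, so they have a left sibling. */
int leftSiblingOf(int r)         { return (r > 0 && r % 2 == 0) ? r - 1 : -1; }
/* Odd positions are left children; the right sibling exists only if r + 1 is in range. */
int rightSiblingOf(int r, int n) { return (r % 2 == 1 && r + 1 < n) ? r + 1 : -1; }
```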
  • 221. Linked Representation typedef struct tnode *ptnode; struct tnode { int data; ptnode left, right; }; Node layout: left | data | right 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 222
  • 222. Binary Tree Traversals Let L, V, and R stand for moving left, visiting the node, and moving right. There are six possible combinations of traversal: LVR, LRV, VLR, VRL, RVL, RLV. If we adopt the convention that we traverse left before right, only 3 traversals remain: LVR, LRV, VLR (inorder, postorder, preorder). 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 223
  • 223. Arithmetic Expression Using BT + * A * / E D C B inorder traversal A / B * C * D + E infix expression preorder traversal + * * / A B C D E prefix expression postorder traversal A B / C * D * E + postfix expression level order traversal + * E * D / C A B 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 224
  • 224. Inorder Traversal (recursive version) void inorder(ptnode ptr) /* inorder tree traversal */ { if (ptr) { inorder(ptr->left); printf("%d", ptr->data); inorder(ptr->right); } } A / B * C * D + E 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 225
  • 225. Preorder Traversal (recursive version) void preorder(ptnode ptr) /* preorder tree traversal */ { if (ptr) { printf("%d", ptr->data); preorder(ptr->left); preorder(ptr->right); } } + * * / A B C D E 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 226
  • 226. Postorder Traversal (recursive version) void postorder(ptnode ptr) /* postorder tree traversal */ { if (ptr) { postorder(ptr->left); postorder(ptr->right); printf("%d", ptr->data); } } A B / C * D * E + 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 227
  • 227. Level Order Traversal (using queue) void levelOrder(ptnode ptr) /* level order tree traversal */ { int front = 0, rear = 0; ptnode queue[MAX_QUEUE_SIZE]; if (!ptr) return; /* empty tree */ enqueue(front, &rear, ptr); for (;;) { ptr = dequeue(&front, rear); 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 228
  • 228. if (ptr) { printf("%d", ptr->data); if (ptr->left) enqueue(front, &rear, ptr->left); if (ptr->right) enqueue(front, &rear, ptr->right); } else break; } } + * E * D / C A B 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 229
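The slide code relies on `enqueue` and `dequeue` helpers that are not shown. A self-contained sketch of the same idea, with the queue inlined as a plain array, might look like this. It is an illustration rather than the deck's implementation: it writes keys into a caller-supplied array `out[]` instead of printing them, and it assumes the tree has at most `MAX_QUEUE_SIZE` nodes.

```c
#include <stddef.h>

typedef struct tnode *ptnode;
struct tnode { int data; ptnode left, right; };

#define MAX_QUEUE_SIZE 100

/* Writes the keys in level order into out[] and returns how many were written.
   Assumes at most MAX_QUEUE_SIZE nodes; any nodes beyond that are not visited. */
int levelOrder(ptnode root, int out[]) {
    ptnode queue[MAX_QUEUE_SIZE];
    int front = 0, rear = 0, count = 0;
    if (!root) return 0;                 /* empty tree */
    queue[rear++] = root;
    while (front < rear) {               /* queue not empty */
        ptnode p = queue[front++];       /* dequeue */
        out[count++] = p->data;
        if (p->left && rear < MAX_QUEUE_SIZE) queue[rear++] = p->left;
        if (p->right && rear < MAX_QUEUE_SIZE) queue[rear++] = p->right;
    }
    return count;
}
```

Each node enters the queue exactly once, so the traversal takes O(n) time.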
  • 229. Euler Tour Traversal • generic traversal of a binary tree • the preorder, inorder, and postorder traversals are special cases of the Euler tour traversal • “walk around” the tree and visit each node three times: • on the left • from below • on the right 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 230
  • 230. Euler Tour Traversal (cont’d) eulerTour(node v) { perform action for visiting node on the left; if v is internal then eulerTour(v->left); perform action for visiting node from below; if v is internal then eulerTour(v->right); perform action for visiting node on the right; } 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 231
  • 231. Euler Tour Traversal (cont’d) • preorder traversal = Euler Tour with a “visit” only on the left • inorder = ? • postorder = ? • Other applications: compute number of descendants for each node v: • counter = 0 • increment counter each time node is visited on the left • #descendants = counter when node is visited on the right – counter when node is visited on the left + 1 • Running time for Euler Tour? 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 232
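The descendant-counting application can be sketched as below. Two assumptions differ from the slide's pseudocode: empty subtrees are represented by NULL pointers rather than external nodes, and the count stored in the hypothetical `size` field includes the node itself, which is what the "+ 1" in the formula produces.

```c
#include <stddef.h>

typedef struct enode {
    int data;
    int size;                  /* filled in by the tour: nodes in this subtree, itself included */
    struct enode *left, *right;
} enode;

/* counter points to the running number of nodes visited "on the left" so far. */
void eulerTour(enode *v, int *counter) {
    if (!v) return;
    int onLeft = ++(*counter);           /* visit on the left */
    eulerTour(v->left, counter);
    /* visit from below: nothing to do for this application */
    eulerTour(v->right, counter);
    v->size = *counter - onLeft + 1;     /* visit on the right */
}
```

Every node is touched a constant number of times, so the tour runs in O(n) time.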
  • 232. Application: Evaluation of Expressions + * A * / E D C B inorder traversal A / B * C * D + E infix expression preorder traversal + * * / A B C D E prefix expression postorder traversal A B / C * D * E + postfix expression level order traversal + * E * D / C A B 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 233
  • 233. Inorder Traversal (recursive version) void inorder(ptnode ptr) /* inorder tree traversal */ { if (ptr) { inorder(ptr->left); printf("%d", ptr->data); inorder(ptr->right); } } A / B * C * D + E 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 234
  • 234. Preorder Traversal (recursive version) void preorder(ptnode ptr) /* preorder tree traversal */ { if (ptr) { printf("%d", ptr->data); preorder(ptr->left); preorder(ptr->right); } } + * * / A B C D E 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 235
  • 235. Postorder Traversal (recursive version) void postorder(ptnode ptr) /* postorder tree traversal */ { if (ptr) { postorder(ptr->left); postorder(ptr->right); printf("%d", ptr->data); } } A B / C * D * E + 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 236
  • 236. Application: Propositional Calculus Expression • A variable is an expression. • If x and y are expressions, then ¬x, x ∧ y, x ∨ y are expressions. • Parentheses can be used to alter the normal order of evaluation (¬ > ∧ > ∨). • Example: x1 ∨ (x2 ∧ ¬x3) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 237
  • 237. Propositional Calculus Expression [Figure: expression tree over x1, x2, x3] (x1 ∧ ¬x2) ∨ (¬x1 ∧ x3) ∨ ¬x3 postorder traversal (postfix evaluation) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 238
  • 238. Node Structure left data value right typedef enum {not, and, or, true, false } logical; typedef struct tnode *ptnode; struct tnode { logical data; short int value; ptnode right, left; }; 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 239
  • 239. Postorder Eval void post_order_eval(ptnode node) { /* modified post order traversal to evaluate a propositional calculus tree */ if (node) { post_order_eval(node->left); post_order_eval(node->right); switch(node->data) { case not: node->value = !node->right->value; break; 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 240
  • 240. Postorder Eval (cont’d) case and: node->value = node->right->value && node->left->value; break; case or: node->value = node->right->value || node->left->value; break; case true: node->value = TRUE; break; case false: node->value = FALSE; } } } 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 241
  • 241. A Taxonomy of Trees • General Trees – any number of children / node • Binary Trees – max 2 children / node • Heaps – parent < (>) children • Binary Search Trees ©RohitBirlaDataStructure RevisionTutorial 242 15-Oct-2011
  • 242. Binary Trees • Binary search tree • Every element has a unique key. • The keys in a nonempty left subtree (right subtree) are smaller (larger) than the key in the root of subtree. • The left and right subtrees are also binary search trees. ©RohitBirlaDataStructure RevisionTutorial 243 15-Oct-2011
  • 243. Binary Search Trees • Binary Search Trees (BST) are a type of Binary Trees with a special organization of data. • This data organization leads to O(log n) complexity for searches, insertions and deletions in certain types of the BST (balanced trees). • O(h) in general ©RohitBirlaDataStructure RevisionTutorial 244 15-Oct-2011
  • 244. 34 41 56 63 72 89 95 0 1 2 3 4 5 6 34 41 56 0 1 2 72 89 95 4 5 6 34 56 0 2 72 95 4 6 Binary Search algorithm of an array of sorted items reduces the search space by one half after each comparison Binary Search Algorithm ©RohitBirlaDataStructure RevisionTutorial 245 15-Oct-2011
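The halving shown in the figure can be written as a short loop. This is a generic sketch of binary search on a sorted `int` array; the function name and the -1 "not found" convention are choices made here, not taken from the slides.

```c
/* Returns the index of target in the sorted array a[0..n-1], or -1 if it is absent. */
int binarySearch(const int a[], int n, int target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* written this way to avoid overflow of lo + hi */
        if (a[mid] == target) return mid;
        if (a[mid] < target) lo = mid + 1;   /* discard the left half */
        else hi = mid - 1;                   /* discard the right half */
    }
    return -1;
}
```

On the slide's array {34, 41, 56, 63, 72, 89, 95}, the first probe lands on index 3 (value 63), which is also the root of the BST shown on the next slide.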
  • 245. 63 41 89 34 56 72 95 • the values in all nodes in the left subtree of a node are less than the node value • the values in all nodes in the right subtree of a node are greater than the node values Organization Rule for BST ©RohitBirlaDataStructure RevisionTutorial 246 15-Oct-2011
  • 246. Binary Tree typedef struct tnode *ptnode; struct tnode { short int key; ptnode right, left; }; ©RohitBirlaDataStructure RevisionTutorial 247 15-Oct-2011
  • 247. Searching in the BST method search(key) • implements the binary search based on comparison of the items in the tree • the items in the BST must be comparable (e.g integers, string, etc.) The search starts at the root. It probes down, comparing the values in each node with the target, till it finds the first item equal to the target. Returns this item or null if there is none. BST Operations: Search ©RohitBirlaDataStructure RevisionTutorial 248 15-Oct-2011
  • 248. if the tree is empty return NULL else if the item in the node equals the target return the node value else if the item in the node is greater than the target return the result of searching the left subtree else if the item in the node is smaller than the target return the result of searching the right subtree Search in BST - Pseudocode ©RohitBirlaDataStructure RevisionTutorial 249 15-Oct-2011
  • 249. Search in a BST: C code ptnode search(ptnode root, int key) { /* return a pointer to the node that contains key. If there is no such node, return NULL */ if (!root) return NULL; if (key == root->key) return root; if (key < root->key) return search(root->left,key); return search(root->right,key); } ©RohitBirlaDataStructure RevisionTutorial 250 15-Oct-2011
  • 250. method insert(key) • places a new item near the frontier of the BST while retaining its organization of data • starting at the root, it probes down the tree until it finds a node whose left or right pointer is empty and is a logical place for the new value • uses a binary search to locate the insertion point • is based on comparisons of the new item and values of nodes in the BST • Elements in nodes must be comparable! BST Operations: Insertion ©RohitBirlaDataStructure RevisionTutorial 251 15-Oct-2011
  • 251. 9 7 5 4 6 8 Case 1: The Tree is Empty Set the root to a new node containing the item Case 2: The Tree is Not Empty Call a recursive helper method to insert the item 10 10 > 7 10 > 9 10 ©RohitBirlaDataStructure RevisionTutorial 252 15-Oct-2011
  • 252. if tree is empty create a root node with the new key else compare key with the top node if key = node key replace the node with the new value else if key > node key compare key with the right subtree: if subtree is empty create a leaf node else add key in right subtree else key < node key compare key with the left subtree: if the subtree is empty create a leaf node else add key to the left subtree Insertion in BST - Pseudocode ©RohitBirlaDataStructure RevisionTutorial 253 15-Oct-2011
  • 253. Insertion into a BST: C code void insert (ptnode *node, int key) { /* modifiedSearch is an assumed helper, not shown on the slides: it returns NULL if the tree is empty or key is already present, otherwise the last node examined during the search */ ptnode ptr, temp = modifiedSearch(*node, key); if (temp || !(*node)) { ptr = (ptnode) malloc(sizeof(*ptr)); if (IS_FULL(ptr)) { fprintf(stderr, "The memory is full\n"); exit(1); } ptr->key = key; ptr->left = ptr->right = NULL; if (*node) { if (key < temp->key) temp->left = ptr; else temp->right = ptr; } else *node = ptr; } } ©RohitBirlaDataStructure RevisionTutorial 254 15-Oct-2011
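A version that needs no separate search helper can be sketched by walking a pointer to the link that will receive the new node. This is one possible formulation, not the deck's code: duplicates are rejected (the pseudocode slide replaces the value instead), and a return value reports whether a node was added.

```c
#include <stdlib.h>

typedef struct tnode *ptnode;
struct tnode { int key; ptnode left, right; };

/* Inserts key below *root; returns 1 if a node was added,
   0 if the key was already present or memory ran out. */
int insert(ptnode *root, int key) {
    ptnode *link = root;                 /* the pointer that will receive the new node */
    while (*link) {
        if (key == (*link)->key) return 0;        /* duplicate: leave the tree unchanged */
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    ptnode p = malloc(sizeof *p);
    if (!p) return 0;
    p->key = key;
    p->left = p->right = NULL;
    *link = p;                           /* also covers the empty-tree case */
    return 1;
}
```

Because `link` starts at `root` itself, the empty tree (Case 1 on the earlier slide) and the non-empty tree (Case 2) are handled by the same assignment.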
  • 254. • The order of supplying the data determines where it is placed in the BST, which determines the shape of the BST • Create BSTs from the same set of data presented each time in a different order: a) 17 4 14 19 15 7 9 3 16 10 b) 9 10 17 4 3 7 14 16 15 19 c) 19 17 16 15 14 10 9 7 4 3 can you guess this shape? BST Shapes ©RohitBirlaDataStructure RevisionTutorial 255 15-Oct-2011
  • 255. • removes a specified item from the BST and adjusts the tree • uses a binary search to locate the target item: • starting at the root, it probes down the tree until it finds the target or reaches a leaf node (target not in the tree) • removal of a node must not leave a ‘gap’ in the tree BST Operations: Removal ©RohitBirlaDataStructure RevisionTutorial 256 15-Oct-2011
  • 256. method remove (key) I if the tree is empty return false II Attempt to locate the node containing the target using the binary search algorithm if the target is not found return false else the target is found, so remove its node: Case 1: if the node has 2 empty subtrees replace the link in the parent with null Case 2: if the node has a left and a right subtree - replace the node's value with the max value in the left subtree - delete the max node in the left subtree Removal in BST - Pseudocode ©RohitBirlaDataStructure RevisionTutorial 257 15-Oct-2011
  • 257. Case 3: if the node has no left child - link the parent of the node - to the right (non-empty) subtree Case 4: if the node has no right child - link the parent of the target - to the left (non-empty) subtree Removal in BST - Pseudocode ©RohitBirlaDataStructure RevisionTutorial 258 15-Oct-2011
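The four cases can be sketched in C as follows. Cases 1, 3, and 4 collapse into a single branch, because linking the parent to "whichever subtree is non-empty" yields NULL when both are empty. The names `mkNode` and `removeKey` are illustrative, and the sketch assumes every node was allocated with `malloc`.

```c
#include <stdlib.h>

typedef struct tnode *ptnode;
struct tnode { int key; ptnode left, right; };

/* Convenience constructor for building small trees; returns NULL on failure. */
ptnode mkNode(int key, ptnode left, ptnode right) {
    ptnode p = malloc(sizeof *p);
    if (p) { p->key = key; p->left = left; p->right = right; }
    return p;
}

/* Removes key from the tree at *root; returns 1 if removed, 0 if not found. */
int removeKey(ptnode *root, int key) {
    ptnode *link = root;                          /* link from the parent to the cursor */
    while (*link && (*link)->key != key)
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    if (!*link) return 0;                         /* empty tree or target absent */

    ptnode target = *link;
    if (target->left && target->right) {          /* Case 2: two subtrees */
        ptnode *maxLink = &target->left;
        while ((*maxLink)->right) maxLink = &(*maxLink)->right;
        ptnode max = *maxLink;
        target->key = max->key;                   /* copy the max of the left subtree */
        *maxLink = max->left;                     /* unlink the max node */
        free(max);
    } else {                                      /* Cases 1, 3, 4: at most one subtree */
        *link = target->left ? target->left : target->right;
        free(target);
    }
    return 1;
}
```

The minimum of the right subtree would work equally well as the replacement in Case 2.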
  • 258. 9 7 5 64 8 10 9 7 5 6 8 10 Case 1: removing a node with 2 EMPTY SUBTREES parent cursor Removal in BST: Example Removing 4 replace the link in the parent with null ©RohitBirlaDataStructure RevisionTutorial 259 15-Oct-2011
  • 259. Case 2: removing a node with 2 SUBTREES 9 7 5 6 8 10 9 6 5 8 10 cursor cursor - replace the node's value with the max value in the left subtree - delete the max node in the left subtree 44 Removing 7 Removal in BST: Example What other element can be used as replacement? ©RohitBirlaDataStructure RevisionTutorial 260 15-Oct-2011
  • 260. 9 7 5 6 8 10 9 7 5 6 8 10 cursor cursor parent parent the node has no left child: link the parent of the node to the right (non-empty) subtree Case 3: removing a node with 1 EMPTY SUBTREE Removal in BST: Example ©RohitBirlaDataStructure RevisionTutorial 261 15-Oct-2011
  • 261. 9 7 5 8 10 9 7 5 8 10 cursor cursor parent parent the node has no right child: link the parent of the node to the left (non-empty) subtree Case 4: removing a node with 1 EMPTY SUBTREE Removing 5 4 4 Removal in BST: Example ©RohitBirlaDataStructure RevisionTutorial 262 15-Oct-2011
  • 262. The complexity of operations get, insert and remove in BST is O(h) , where h is the height. O(log n) when the tree is balanced. The updating operations cause the tree to become unbalanced. The tree can degenerate to a linear shape and the operations will become O (n) Analysis of BST Operations ©RohitBirlaDataStructure RevisionTutorial 263 15-Oct-2011
  • 263. BST tree = new BST(); tree.insert ("E"); tree.insert ("C"); tree.insert ("D"); tree.insert ("A"); tree.insert ("H"); tree.insert ("F"); tree.insert ("K"); >>>> Items in advantageous order: K H F E D C A Output: Best Case ©RohitBirlaDataStructure RevisionTutorial 264 15-Oct-2011
  • 264. BST tree = new BST(); for (int i = 1; i <= 8; i++) tree.insert (i); >>>> Items in worst order: 8 7 6 5 4 3 2 1 Output: Worst Case ©RohitBirlaDataStructure RevisionTutorial 265 15-Oct-2011
  • 265. tree = new BST (); for (int i = 1; i <= 8; i++) tree.insert(random()); >>>> Items in random order: X U P O H F B Output: Random Case ©RohitBirlaDataStructure RevisionTutorial 266 15-Oct-2011
  • 266. Applications for BST • Sorting with binary search trees • Input: unsorted array • Output: sorted array • Algorithm ? • Running time ? ©RohitBirlaDataStructure RevisionTutorial 267 15-Oct-2011
  • 267. Better Search Trees Prevent the degeneration of the BST : • A BST can be set up to maintain balance during updating operations (insertions and removals) • Types of ST which maintain the optimal performance: • splay trees • AVL trees • 2-4 Trees • Red-Black trees • B-trees ©RohitBirlaDataStructure RevisionTutorial 268 15-Oct-2011
  • 268. Trees: A Review (again? ) • General trees • one parent, N children • Binary tree • ISA General tree • + max 2 children • Binary search tree • ISA Binary tree • + left subtree < parent < right subtree • AVL tree • ISA Binary search tree • + | height left subtree – height right subtree | ≤ 1 ©RohitBirlaDataStructure RevisionTutorial 269 15-Oct-2011
  • 269. Trees: A Review (cont’d) • Multi-way search tree • ISA General tree • + Each node has K keys and K+1 children • + All keys in child K < key K < all keys in child K+1 • 2-4 Tree • ISA Multi-way search tree • + All nodes have at most 3 keys / 4 children • + All leaves are at the same level • B-Tree • ISA Multi-way search tree • + All nodes have at least T keys, at most 2T(+1) keys • + All leaves are at the same level ©RohitBirlaDataStructure RevisionTutorial 270 15-Oct-2011
  • 270. Tree Applications • Data Compression • Huffman tree • Automatic Learning • Decision trees ©RohitBirlaDataStructure RevisionTutorial 271 15-Oct-2011
  • 271. Huffman code • Very often used for text compression • Do you know how gzip or winzip works? • → Compression methods • ASCII code uses codes of equal length for all letters → how many codes? • Today’s alternative to ASCII? • Idea behind Huffman code: use shorter length codes for letters that are more frequent ©RohitBirlaDataStructure RevisionTutorial 272 15-Oct-2011
  • 272. Huffman Code • Build a list of letters and frequencies “have a great day today” • Build a Huffman Tree bottom up, by grouping letters with smaller occurrence frequencies ©RohitBirlaDataStructure RevisionTutorial 273 15-Oct-2011
  • 273. Huffman Codes • Write the Huffman codes for the strings • “abracadabra” • “Veni Vidi Vici” ©RohitBirlaDataStructure RevisionTutorial 274 15-Oct-2011
  • 274. Huffman Code • Running time? • Suppose N letters in input string, with L unique letters • What is the most important factor for obtaining highest compression? • Compare: [assume a text with a total of 1000 characters] • I. Three different characters, each occurring the same number of times • II. 20 different characters, 19 of them occurring only once, and the 20th occurring the rest of the time ©RohitBirlaDataStructure RevisionTutorial 275 15-Oct-2011
  • 275. Huffman Coding Trees ASCII codes: 8 bits per character. • Fixed-length coding. Can take advantage of relative frequency of letters to save space. • Variable-length coding Build the tree with minimum external path weight. Z K F C U D L E 2 7 24 32 37 42 42 120 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 276
  • 276. Huffman Tree Construction (1) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 277
  • 277. Huffman Tree Construction (2) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 278
  • 278. Assigning Codes Letter Freq Code Bits C 32 D 42 E 120 F 24 K 7 L 42 U 37 Z 2 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 279
  • 279. Coding and Decoding A set of codes is said to meet the prefix property if no code in the set is the prefix of another. Code for DEED: Decode 1011001110111101: Expected cost per letter: 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 280
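The "expected cost per letter" can be computed without drawing the tree: each time the two smallest weights are merged, every letter beneath the new node gains one bit, so the sum of all merged weights equals the total number of bits. The sketch below uses a simple quadratic scan rather than a priority queue; the function name and the 64-entry limit are assumptions made for illustration.

```c
/* Returns the total weighted path length (sum over letters of freq * code length)
   of a Huffman tree for the n frequencies in freq[]. Assumes 1 <= n <= 64. */
long huffmanCost(const long freq[], int n) {
    long w[64];
    long cost = 0;
    for (int i = 0; i < n; i++) w[i] = freq[i];
    while (n > 1) {
        /* find the indices of the two smallest weights, with w[a] <= w[b] */
        int a = 0, b = 1;
        if (w[b] < w[a]) { a = 1; b = 0; }
        for (int i = 2; i < n; i++) {
            if (w[i] < w[a]) { b = a; a = i; }
            else if (w[i] < w[b]) { b = i; }
        }
        long merged = w[a] + w[b];
        cost += merged;            /* every leaf below the new node gets one bit longer */
        w[a] = merged;             /* the merged subtree takes slot a ... */
        w[b] = w[n - 1];           /* ... and the last entry fills slot b */
        n--;
    }
    return cost;
}
```

For the frequencies on the earlier slide (Z 2, K 7, F 24, C 32, U 37, D 42, L 42, E 120), the merges produce 9, 33, 65, 79, 107, 186, and 306, for a total of 785 bits over 306 letters, or about 2.57 bits per letter.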
  • 280. One More Application • Heuristic Search • Decision Trees • Given a set of examples, with an associated decision (e.g. good/bad, +/-, pass/fail, caseI/caseII/caseIII, etc.) • Attempt to take (automatically) a decision when a new example is presented • Predict the behavior in new cases! ©RohitBirlaDataStructure RevisionTutorial 281 15-Oct-2011
  • 281. Data Records Name A B C D E F G 1. Jeffrey B. 1 0 1 0 1 0 1 - 2. Paul S. 0 1 1 0 0 0 1 - 3. Daniel C. 0 0 1 0 0 0 0 - 4. Gregory P. 1 0 1 0 1 0 0 - 5. Michael N. 0 0 1 1 0 0 0 - 6. Corinne N. 1 1 1 0 1 0 1 + 7. Mariyam M. 0 1 0 1 0 0 1 + 8. Stephany D. 1 1 1 1 1 1 1 + 9. Mary D. 1 1 1 1 1 1 1 + 10. Jamie F. 1 1 1 0 0 1 1 + ©RohitBirlaDataStructure RevisionTutorial 282 15-Oct-2011
  • 282. Fields in the Record A: First name ends in a vowel? B: Neat handwriting? C: Middle name listed? D: Senior? E: Got extra-extra credit? F: Google brings up home page? G: Google brings up reference? ©RohitBirlaDataStructure RevisionTutorial 283 15-Oct-2011
  • 283. Build a Classification Tree Internal nodes: features Leaves: classification F A D A 0 1 8,9 2,3,7 1,4,5,6 10 Error: 30% ©RohitBirlaDataStructure RevisionTutorial 284 15-Oct-2011
  • 284. Different Search Problem Given a set of data records with their classifications, pick a decision tree: search problem! Challenges: • Scoring function? • Large space of trees. What’s a good tree? • Low error on given set of records • Small ©RohitBirlaDataStructure RevisionTutorial 285 15-Oct-2011
  • 285. “Perfect” Decision Tree C E B 0 1 F middle name? EEC? Neat?Google? Training set Error: 0% (can always do this?) 0 0 0 1 1 1 ©RohitBirlaDataStructure RevisionTutorial 286 15-Oct-2011
  • 286. Search For a Classification • Classify new records New1. Mike M. 1 0 1 1 0 0 1 ? New2. Jerry K. 0 1 0 1 0 0 0 ? ©RohitBirlaDataStructure RevisionTutorial 287 15-Oct-2011
  • 287. Heaps • A heap is a binary tree T that stores key-element pairs at its internal nodes • It satisfies two properties: • MinHeap: key(parent) ≤ key(child) • [OR MaxHeap: key(parent) ≥ key(child)] • all levels are full, except the last one, which is left-filled 4 6 207 811 5 9 1214 15 2516 ©RohitBirlaDataStructure RevisionTutorial 288 15-Oct-2011
  • 288. What are Heaps Useful for? • To implement priority queues • Priority queue = a queue where all elements have a “priority” associated with them • Remove in a priority queue removes the element with the smallest priority • insert • removeMin ©RohitBirlaDataStructure RevisionTutorial 289 15-Oct-2011
  • 289. Heap or Not a Heap? ©RohitBirlaDataStructure RevisionTutorial 290 15-Oct-2011
  • 290. Heap Properties • A heap T storing n keys has height h = ⌈log(n + 1)⌉, which is O(log n) 4 6 207 811 5 9 1214 15 2516 ©RohitBirlaDataStructure RevisionTutorial 291 15-Oct-2011
  • 291. ADT for Min Heap objects: n > 0 elements organized in a binary tree so that the value in each node is no larger than those in its children method: Heap Create(MAX_SIZE)::= create an empty heap that can hold a maximum of max_size elements Boolean HeapFull(heap, n)::= if (n==max_size) return TRUE else return FALSE Heap Insert(heap, item, n)::= if (!HeapFull(heap,n)) insert item into heap and return the resulting heap else return error Boolean HeapEmpty(heap, n)::= if (n>0) return FALSE else return TRUE Element Delete(heap,n)::= if (!HeapEmpty(heap,n)) return one instance of the smallest element in the heap and remove it from the heap else return error ©RohitBirlaDataStructure RevisionTutorial 292 15-Oct-2011
  • 292. Heap Insertion • Insert 6 ©RohitBirlaDataStructure RevisionTutorial 293 15-Oct-2011
  • 293. Heap Insertion • Add key in next available position ©RohitBirlaDataStructure RevisionTutorial 294 15-Oct-2011
  • 294. Heap Insertion • Begin upheap ©RohitBirlaDataStructure RevisionTutorial 295 15-Oct-2011
  • 295. Heap Insertion ©RohitBirlaDataStructure RevisionTutorial 296 15-Oct-2011
  • 296. Heap Insertion • Terminate upheap when • reach root • key child is greater than key parent ©RohitBirlaDataStructure RevisionTutorial 297 15-Oct-2011
  • 297. Heap Removal • Remove element from priority queues? removeMin( ) ©RohitBirlaDataStructure RevisionTutorial 298 15-Oct-2011
  • 298. Heap Removal • Begin downheap ©RohitBirlaDataStructure RevisionTutorial 299 15-Oct-2011
  • 299. Heap Removal ©RohitBirlaDataStructure RevisionTutorial 300 15-Oct-2011
  • 300. Heap Removal ©RohitBirlaDataStructure RevisionTutorial 301 15-Oct-2011
  • 301. Heap Removal • Terminate downheap when • reach leaf level • key parent is less than or equal to the keys of its children ©RohitBirlaDataStructure RevisionTutorial 302 15-Oct-2011
  • 302. Building a Heap • build (n + 1)/2 trivial one-element heaps • build three-element heaps on top of them ©RohitBirlaDataStructure RevisionTutorial 303 15-Oct-2011
  • 303. Building a Heap • downheap to preserve the order property • now form seven-element heaps ©RohitBirlaDataStructure RevisionTutorial 304 15-Oct-2011
  • 304. Building a Heap ©RohitBirlaDataStructure RevisionTutorial 305 15-Oct-2011
  • 305. Building a Heap ©RohitBirlaDataStructure RevisionTutorial 306 15-Oct-2011
  • 306. Heap Implementation • Using arrays • Parent = k ; Children = 2k , 2k+1 • Why is it efficient? [4] 6 12 7 1918 9 6 9 7 10 30 31 [1] [2] [3] [5] [6] [1] [2] [3] [4] [1] [2] ©RohitBirlaDataStructure RevisionTutorial 307 15-Oct-2011
  • 307. Insertion into a Heap void insertHeap(element item, int *n) { /* max-heap version: larger keys move toward the root */ int i; if (HEAP_FULL(*n)) { fprintf(stderr, "The heap is full.\n"); exit(1); } i = ++(*n); while ((i!=1)&&(item.key>heap[i/2].key)) { heap[i] = heap[i/2]; i /= 2; } heap[i]= item; } $2^k - 1 = n \Rightarrow k = \log_2(n+1)$, so insertion takes $O(\log_2 n)$ ©RohitBirlaDataStructure RevisionTutorial 308 15-Oct-2011
  • 308. Deletion from a Heap element deleteHeap(int *n) { int parent, child; element item, temp; if (HEAP_EMPTY(*n)) { fprintf(stderr, "The heap is empty\n"); exit(1); } /* save value of the element with the highest key */ item = heap[1]; /* use last element in heap to adjust heap */ temp = heap[(*n)--]; parent = 1; child = 2; ©RohitBirlaDataStructure RevisionTutorial 309 15-Oct-2011
  • 309. while (child <= *n) { /* find the larger child of the current parent */ if ((child < *n)&& (heap[child].key<heap[child+1].key)) child++; if (temp.key >= heap[child].key) break; /* move to the next lower level */ heap[parent] = heap[child]; parent = child; child *= 2; } heap[parent] = temp; return item; } Deletion from a Heap (cont’d) ©RohitBirlaDataStructure RevisionTutorial 310 15-Oct-2011
  • 310. Heap Sorting • Step 1: Build a heap • Step 2: removeMin( ) • Running time? ©RohitBirlaDataStructure RevisionTutorial 311 15-Oct-2011
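The two steps can be sketched with a min-heap stored in a plain `int` array. Note the assumptions: indices start at 0 here (children at 2i+1 and 2i+2, unlike the 1-based 2k and 2k+1 on the implementation slide), the sorted result goes to a separate array `out[]`, and the input array is consumed as the heap.

```c
/* Restore the min-heap order below position i in h[0..n-1]. */
static void downheap(int h[], int n, int i) {
    for (;;) {
        int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && h[l] < h[smallest]) smallest = l;
        if (r < n && h[r] < h[smallest]) smallest = r;
        if (smallest == i) return;       /* parent <= both children: done */
        int t = h[i]; h[i] = h[smallest]; h[smallest] = t;
        i = smallest;
    }
}

/* Sorts a[0..n-1] into ascending order in out[0..n-1].
   a[] is used as the heap and its contents are destroyed. */
void heapSort(int a[], int n, int out[]) {
    for (int i = n / 2 - 1; i >= 0; i--)    /* Step 1: bottom-up build, O(n) */
        downheap(a, n, i);
    for (int k = 0; k < n; k++) {           /* Step 2: n removeMin calls, O(log n) each */
        out[k] = a[0];                      /* the minimum is at the root */
        a[0] = a[n - 1 - k];                /* move the last key to the root */
        downheap(a, n - 1 - k, 0);
    }
}
```

The total running time is O(n log n), dominated by the n removals.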
  • 311. Sorting with BST • Use binary search trees for sorting • Start with unsorted sequence • Insert all elements in a BST • Traverse the tree…. how ? • Running time? 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 312
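One answer to "how?" is an inorder traversal, which visits the keys of a BST in ascending order. The sketch below is an illustration under a few assumptions: nodes come from a single `malloc`'d pool so they can be freed in one call, duplicates are sent to the right subtree, and the helper names are hypothetical.

```c
#include <stdlib.h>

typedef struct tnode *ptnode;
struct tnode { int key; ptnode left, right; };

/* Links node p into the tree rooted at root and returns the (possibly new) root. */
static ptnode insertNode(ptnode root, ptnode p) {
    if (!root) return p;
    if (p->key < root->key) root->left = insertNode(root->left, p);
    else root->right = insertNode(root->right, p);   /* duplicates go right */
    return root;
}

/* Inorder traversal that writes keys into a[] starting at index next;
   returns the index after the last key written. */
static int inorderFill(ptnode t, int a[], int next) {
    if (!t) return next;
    next = inorderFill(t->left, a, next);
    a[next++] = t->key;
    return inorderFill(t->right, a, next);
}

/* Sorts a[0..n-1] in ascending order; returns 0 on success, -1 if allocation fails. */
int bstSort(int a[], int n) {
    if (n <= 0) return 0;
    ptnode pool = malloc((size_t)n * sizeof *pool);
    if (!pool) return -1;
    ptnode root = NULL;
    for (int i = 0; i < n; i++) {
        pool[i].key = a[i];
        pool[i].left = pool[i].right = NULL;
        root = insertNode(root, &pool[i]);
    }
    inorderFill(root, a, 0);
    free(pool);
    return 0;
}
```

The n insertions cost O(n log n) when the tree stays reasonably balanced, but O(n²) when the input is already sorted and the tree degenerates into a chain.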
  • 312. Prevent the degeneration of the BST : • A BST can be set up to maintain balance during updating operations (insertions and removals) • Types of BST which maintain the optimal performance: • splay trees • AVL trees • Red-Black trees • B-trees Better Binary Search Trees 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 313
  • 313. AVL Trees • Balanced binary search trees • An AVL Tree is a binary search tree such that for every internal node v of T, the heights of the children of v can differ by at most 1. 88 44 17 78 32 50 48 62 2 4 1 1 2 3 1 1 15-Oct-2011 ©Rohit Birla Data Structure Revision Tutorial 314
  • 314. Height of an AVL Tree • Proposition: The height of an AVL tree T storing n keys is O(log n). • Justification: The easiest way to approach this problem is to find n(h): the minimum number of internal nodes of an AVL tree of height h. • n(1) = 1 and n(2) = 2 • for h ≥ 3, an AVL tree of height h with the minimum number of nodes contains the root node, one AVL subtree of height h-1 and the other AVL subtree of height h-2. • → n(h) = 1 + n(h-1) + n(h-2) • given n(h-1) > n(h-2) → n(h) > 2n(h-2) n(h) > 2n(h-2) n(h) > 4n(h-4) … n(h) > $2^i n(h-2i)$ • pick i = h/2 – 1 → n(h) ≥ $2^{h/2-1}$ • it follows that h < 2 log n(h) + 2 • → height of an AVL tree is O(log n) 15-Oct-2011 ©Rohit Birla Data Structure Revision Tutorial 315
  • 315. Insertion • A binary search tree T is called balanced if for every node v, the heights of v’s children differ by at most one. • Inserting a node into an AVL tree involves performing an expandExternal(w) on T, which changes the heights of some of the nodes in T. • If an insertion causes T to become unbalanced, we travel up the tree from the newly created node until we find the first node x such that its grandparent z is an unbalanced node. • Since z became unbalanced by an insertion in the subtree rooted at its child y, height(y) = height(sibling(y)) + 2 • Need to rebalance... 15-Oct-2011 ©Rohit Birla Data Structure Revision Tutorial 316
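The recurrence for n(h) is easy to tabulate, which makes the exponential growth behind the O(log n) bound visible. A small sketch (the function name `minAvlNodes` is an assumption):

```c
/* n(h): minimum number of internal nodes in an AVL tree of height h, for h >= 1. */
long minAvlNodes(int h) {
    long prev = 1, cur = 2;          /* n(1) = 1, n(2) = 2 */
    if (h <= 1) return 1;
    for (int i = 3; i <= h; i++) {
        long next = 1 + cur + prev;  /* n(h) = 1 + n(h-1) + n(h-2) */
        prev = cur;
        cur = next;
    }
    return cur;
}
```

The first values are 1, 2, 4, 7, 12, 20, 33, 54, 88, 143 for h = 1 to 10, so an AVL tree of height 10 already needs at least 143 keys.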
  • 316. Insertion: Rebalancing • To rebalance the subtree rooted at z, we must perform a restructuring • we rename x, y, and z to a, b, and c based on the order of the nodes in an in-order traversal. • z is replaced by b, whose children are now a and c whose children, in turn, consist of the four other subtrees formerly children of x, y, and z. 15-Oct-2011 ©Rohit Birla Data Structure Revision Tutorial 317
  • 317. Insertion (cont’d) [Figure: the example AVL tree after inserting 54, first unbalanced and then balanced after restructuring; nodes x, y, z, subtrees T0–T3 and in-order ranks 1–7 are marked]
  • 318. Restructuring • The four ways to rotate nodes in an AVL tree, graphically represented • Single rotations: [Figure: the two single-rotation cases, (a = z, b = y, c = x) and its mirror image (a = x, b = y, c = z), with subtrees T0–T3]
  • 319. Restructuring (cont’d) • Double rotations: [Figure: the two double-rotation cases, (a = z, b = x, c = y) and its mirror image (a = y, b = x, c = z), with subtrees T0–T3]
  • 320. Restructure Algorithm • Algorithm restructure(x): • Input: A node x of a binary search tree T that has both a parent y and a grandparent z • Output: Tree T restructured by a rotation (either single or double) involving nodes x, y, and z. • 1. Let (a, b, c) be an inorder listing of the nodes x, y, and z, and let (T0, T1, T2, T3) be an inorder listing of the four subtrees of x, y, and z not rooted at x, y, or z. • 2. Replace the subtree rooted at z with a new subtree rooted at b. • 3. Let a be the left child of b and let T0, T1 be the left and right subtrees of a, respectively. • 4. Let c be the right child of b and let T2, T3 be the left and right subtrees of c, respectively.
  • 321. Cut/Link Restructure Algorithm • Let’s go into a little more detail on this algorithm... • Any tree that needs to be balanced can be grouped into 7 parts: x, y, z, and the 4 trees anchored at the children of those nodes (T0–T3) [Figure: example tree with keys 44, 17, 62, 50, 48, 54, 78, 88 and subtrees T0–T3 marked]
  • 322. Cut/Link Restructure Algorithm • Make a new tree which is balanced and put the 7 parts from the old tree into the new tree so that the numbering is still correct when we do an in-order traversal of the new tree. • This works regardless of how the tree is originally unbalanced. • Let’s see how it works! [Figure: the same example tree]
  • 323. Cut/Link Restructure Algorithm • Number the 7 parts by doing an in-order traversal. (Note that x, y, and z are now renamed based upon their order within the traversal.) [Figure: the example tree with z = a, y = b, x = c and the parts numbered 1–7]
  • 324. Cut/Link Restructure Algorithm • Now create an array, numbered 1 to 7 (the 0th element can be ignored with minimal waste of space) • Cut() the 4 T trees and place them in their inorder rank in the array [Array: T0, T1, T2, T3 stored at ranks 1, 3, 5, 7]
  • 325. Cut/Link Restructure Algorithm • Now cut x, y, and z in that order (child, parent, grandparent) and place them in their inorder rank in the array. [Array: ranks 1–7 hold T0, a = 44, T1, b = 62, T2, c = 78, T3] • Now we can re-link these subtrees to the main tree. • Link in rank 4 (b) where the subtree’s root formerly
  • 326. Cut/Link Restructure Algorithm • Link in ranks 2 (a) and 6 (c) as 4’s children. [Figure: 62 (b, rank 4) with children 44 (a, rank 2) and 78 (c, rank 6)]
  • 327. Cut/Link Restructure Algorithm • Finally, link in ranks 1, 3, 5, and 7 as the children of 2 and 6. [Figure: 62 at the root with children 44 and 78; T0 (17) and T1 (50, 48, 54) under 44; T2 and T3 (88) under 78] • Now you have a balanced tree!
  • 328. Cut/Link Restructure Algorithm • This algorithm for restructuring has the exact same effect as using the four rotation cases discussed earlier. • Advantages: no case analysis, more elegant • Disadvantage: can be more code to write • Same time complexity
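The cut/link idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the slides' own code: the names `Node`, `inorder_parts`, `link` and `restructure` are hypothetical, a Python list plays the role of the 7-slot array, and the example tree's shape is inferred from the flattened figures (z = 44, y = 62, x = 78). The in-order listing of the 7 parts is collected by walking only through x, y and z and treating every other subtree as an opaque unit, so no rotation cases need to be distinguished.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.parent = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def inorder_parts(node, trio, parts):
    """Append the 7 parts (T0, a, T1, b, T2, c, T3) in in-order."""
    if node is not None and any(node is t for t in trio):
        inorder_parts(node.left, trio, parts)
        parts.append(node)
        inorder_parts(node.right, trio, parts)
    else:
        parts.append(node)  # a whole subtree (possibly empty), kept intact


def link(parent, left, right):
    parent.left, parent.right = left, right
    for child in (left, right):
        if child is not None:
            child.parent = parent


def restructure(x):
    """Restructure x, its parent y and grandparent z; return the new subtree root b."""
    y = x.parent
    z = y.parent
    top = z.parent  # node the subtree hangs from (None if z is the root)
    z_was_left = top is not None and top.left is z
    parts = []
    inorder_parts(z, (x, y, z), parts)
    t0, a, t1, b, t2, c, t3 = parts
    link(a, t0, t1)   # ranks 1 and 3 become the children of rank 2
    link(c, t2, t3)   # ranks 5 and 7 become the children of rank 6
    link(b, a, c)     # ranks 2 and 6 become the children of rank 4
    b.parent = top
    if top is not None:
        if z_was_left:
            top.left = b
        else:
            top.right = b
    return b


# Example tree (shape assumed from the slides): 44 is z, 62 is y, 78 is x.
n78 = Node(78, None, Node(88))
n62 = Node(62, Node(50, Node(48), Node(54)), n78)
n44 = Node(44, Node(17), n62)
root = restructure(n78)
```

After the call, `root` is the node with key 62, with 44 and 78 as its children, matching the final picture on the slides.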
  • 329. Removal • We can easily see that performing a removeAboveExternal(w) can cause T to become unbalanced. • Let z be the first unbalanced node encountered while traveling up the tree from w. Also, let y be the child of z with the larger height, and let x be the child of y with the larger height. • We can perform operation restructure(x) to restore balance at the subtree rooted at z. • As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached 15-Oct-2011 ©Rohit Birla Data Structure Revision Tutorial 330
  • 330. Removal (cont’d) • Example of deletion from an AVL tree: [Figure: the example AVL tree with key 32 removed; node heights, the unbalanced node z, its child y, grandchild x and subtrees T0–T3 are marked]
  • 331. Removal (cont’d) • Example of deletion from an AVL tree: [Figure: the same example continued, showing the tree around z, y, x and subtrees T0–T3 with updated node heights]
  • 332. Indexing Goals: • Store large files • Support multiple search keys • Support efficient insert, delete, and range queries 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 333
  • 333. Terms(1) Entry sequenced file: Order records by time of insertion. • Search with sequential search Index file: Organized, stores pointers to actual records. • Could be organized with a tree or other data structure. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 334
  • 334. Terms(2) Primary Key: A unique identifier for records. May be inconvenient for search. Secondary Key: An alternate search key, often not unique for each record. Often used for search key. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 335
  • 335. Linear Indexing Linear index: Index file organized as a simple sequence of key/record pointer pairs with key values are in sorted order. Linear indexing is good for searching variable-length records. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 336
  • 336. Linear Indexing (2) If the index is too large to fit in main memory, a second-level index might be used. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 337
  • 337. Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: • Insert/delete • Multiple search keys (multiple indices) • Key range search 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 338
  • 338. Tree Indexing (2) Difficulties when storing tree index on disk: • Tree must be balanced. • Each path from root to leaf should cover few disk pages. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 339
  • 339. 2-3 Tree (1) A 2-3 Tree has the following properties: 1. A node contains one or two keys 2. Every internal node has either two children (if it contains one key) or three children (if it contains two keys). 3. All leaves are at the same level in the tree, so the tree is always height balanced. The 2-3 Tree has a search tree property analogous to the BST. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 340
  • 340. 2-3 Tree (2) The advantage of the 2-3 Tree over the BST is that it can be updated at low cost. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 341
  • 341. 2-3 Tree Insertion (1) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 342
  • 342. 2-3 Tree Insertion (2) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 343
  • 343. 2-3 Tree Insertion (3) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 344
  • 344. B-Trees (1) The B-Tree is an extension of the 2-3 Tree. The B-Tree is now the standard file organization for applications requiring insertion, deletion, and key range searches. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 345
  • 345. B-Trees (2) 1. B-Trees are always balanced. 2. B-Trees keep similar-valued records together on a disk page, which takes advantage of locality of reference. 3. B-Trees guarantee that every node in the tree will be full at least to a certain minimum percentage. This improves space efficiency while reducing the typical number of disk fetches necessary during a search or update operation. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 346
  • 346. B-Tree Definition A B-Tree of order m has these properties: • The root is either a leaf or has at least two children. • Each node, except for the root and the leaves, has between ⌈m/2⌉ and m children. • All leaves are at the same level in the tree, so the tree is always height balanced. The size of a B-Tree node is usually selected to match the size of a disk block. • A B-Tree node could have hundreds of children.
  • 347. B-Tree Search (1) Search in a B-Tree is a generalization of search in a 2-3 Tree. 1. Do binary search on keys in current node. If search key is found, then return record. If current node is a leaf node and key is not found, then report an unsuccessful search. 2. Otherwise, follow the proper branch and repeat the process. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 348
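The two search steps above can be sketched as follows. This is a minimal in-memory illustration; the class `BTreeNode`, its fields and the function `btree_search` are assumptions made for the example, and a real B-Tree would read each node from a disk block.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, records, children=None):
        self.keys = keys                # sorted keys stored in this node
        self.records = records          # records[i] belongs to keys[i]
        self.children = children or []  # empty for a leaf, else len(keys) + 1 subtrees


def btree_search(node, key):
    """Return the record stored under key, or None if the search is unsuccessful."""
    while True:
        i = bisect_left(node.keys, key)  # step 1: binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return node.records[i]
        if not node.children:            # leaf reached without finding the key
            return None
        node = node.children[i]          # step 2: follow the proper branch


leaves = [
    BTreeNode([5, 10], ["r5", "r10"]),
    BTreeNode([25, 30], ["r25", "r30"]),
    BTreeNode([45, 50], ["r45", "r50"]),
]
btree_root = BTreeNode([20, 40], ["r20", "r40"], leaves)
```

With this toy tree, `btree_search(btree_root, 30)` descends into the middle leaf, while a search for 33 ends at that leaf and reports an unsuccessful search by returning `None`.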
  • 348. B+-Trees The most commonly implemented form of the B-Tree is the B+-Tree. Internal nodes of the B+-Tree do not store records, only key values to guide the search. Leaf nodes store records or pointers to records. A leaf node may store more or fewer records than an internal node stores keys.
  • 349. B+-Tree Example 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 350
  • 350. B+-Tree Insertion 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 351
  • 351. B+-Tree Deletion (1) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 352
  • 352. B+-Tree Deletion (2) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 353
  • 353. B+-Tree Deletion (3) 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 354
  • 354. B-Tree Space Analysis (1) B+-Tree nodes are always at least half full. The B*-Tree splits two pages into three, and combines three pages into two. In this way, nodes are always at least about 2/3 full. Asymptotic cost of search, insertion, and deletion of nodes from B-Trees is Θ(log n). • Base of the log is the (average) branching factor of the tree.
  • 355. B-Tree Space Analysis (2) Example: Consider a B+-Tree of order 100 with leaf nodes containing 100 records. 1 level B+-tree: 2 level B+-tree: 3 level B+-tree: 4 level B+-tree: Ways to reduce the number of disk fetches: • Keep the upper levels in memory. • Manage B+-Tree pages with a buffer pool. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 356
  • 356. Graph Terminology ©RohitBirlaDataStructure RevisionTutorial 357 15-Oct-2011
  • 357. Paths and cycles • A path is a sequence of nodes v_1, v_2, …, v_N such that (v_i, v_{i+1}) ∈ E for 0 < i < N • The length of the path is N-1. • Simple path: all v_i are distinct, 0 < i < N • A cycle is a path such that v_1 = v_N • An acyclic graph has no cycles
  • 358. Cycles PIT BOS JFK DTW LAX SFO ©RohitBirlaDataStructure RevisionTutorial 359 15-Oct-2011
  • 359. More useful definitions • In a directed graph: • The indegree of a node v is the number of distinct edges (w,v) ∈ E. • The outdegree of a node v is the number of distinct edges (v,w) ∈ E. • A node with indegree 0 is a root.
  • 360. Trees are graphs • A dag is a directed acyclic graph. • A tree is a connected acyclic undirected graph. • A forest is an acyclic undirected graph (not necessarily connected), i.e., each connected component is a tree. ©RohitBirlaDataStructure RevisionTutorial 361 15-Oct-2011
  • 361. Example DAG Watch Socks Shoes Undershorts Pants Belt Tie Shirt Jacket a DAG implies an ordering on events ©RohitBirlaDataStructure RevisionTutorial 362 15-Oct-2011
  • 362. Example DAG Watch Socks Shoes Undershorts Pants Belt Tie Shirt Jacket In a complex DAG, it can be hard to find a schedule that obeys all the constraints. ©RohitBirlaDataStructure RevisionTutorial 363 15-Oct-2011
  • 363. Topological Sort ©RohitBirlaDataStructure RevisionTutorial 364 15-Oct-2011
  • 364. Topological Sort • For a directed acyclic graph G = (V,E) • A topological sort is an ordering of all of G’s vertices v1, v2, …, vn such that... Formally: for every edge (vi,vk) in E, i<k. Visually: all arrows are pointing to the right ©RohitBirlaDataStructure RevisionTutorial 365 15-Oct-2011
  • 365. Topological sort • There are often many possible topological sorts of a given DAG • Topological orders for this DAG : • 1,2,5,4,3,6,7 • 2,1,5,4,7,3,6 • 2,5,1,4,7,3,6 • Etc. • Each topological order is a feasible schedule. 1 4 76 3 5 2 ©RohitBirlaDataStructure RevisionTutorial 366 15-Oct-2011
  • 366. Topological Sorts for Cyclic Graphs? Impossible! 1 2 3 • If v and w are two vertices on a cycle, there exist paths from v to w and from w to v. • Any ordering will contradict one of these paths ©RohitBirlaDataStructure RevisionTutorial 367 15-Oct-2011
  • 367. Topological sort algorithm • Algorithm • Assume indegree is stored with each node. • Repeat until no nodes remain: • Choose a root and output it. • Remove the root and all its edges. • Performance • O(V^2 + E), if linear search is used to find a root.
  • 368. Better topological sort • Algorithm: • Scan all nodes, pushing roots onto a stack. • Repeat until stack is empty: • Pop a root r from the stack and output it. • For all nodes n such that (r,n) is an edge, decrement n’s indegree. If it becomes 0, push n onto the stack. • O(V + E), which is still O(V^2) in the worst case (a dense graph can have on the order of V^2 edges), but better for sparse graphs. • Q: Why is this algorithm correct?
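The stack-based algorithm above could look like the following sketch. The graph representation (a dict mapping every node to a list of its successors) and the function name `topological_sort` are assumptions for illustration, and the edges of the example DAG are guessed, since the slides only list the item names.

```python
def topological_sort(graph):
    """graph: dict mapping each node to a list of its successors."""
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1

    stack = [v for v in graph if indegree[v] == 0]  # all current roots
    order = []
    while stack:
        r = stack.pop()
        order.append(r)
        for n in graph[r]:          # "remove" the edges leaving r
            indegree[n] -= 1
            if indegree[n] == 0:
                stack.append(n)

    if len(order) != len(graph):    # some nodes never became roots
        raise ValueError("graph has a cycle")
    return order


# Hypothetical dressing constraints (edges are made up for the example).
dag = {
    "undershorts": ["pants"],
    "pants": ["belt", "shoes"],
    "socks": ["shoes"],
    "shirt": ["tie", "belt"],
    "tie": ["jacket"],
    "belt": ["jacket"],
    "shoes": [],
    "jacket": [],
    "watch": [],
}
order = topological_sort(dag)
```

Each node is pushed and popped once and each edge is examined once, which is where the O(V + E) bound comes from. The final length check is an addition that reports a cycle instead of silently returning a partial order.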
  • 369. Correctness • Clearly any ordering produced by this algorithm is a topological order But... • Does every DAG have a topological order, and if so, is this algorithm guaranteed to find one? ©RohitBirlaDataStructure RevisionTutorial 370 15-Oct-2011
  • 370. Quiz Break ©RohitBirlaDataStructure RevisionTutorial 371 15-Oct-2011
  • 371. Quiz • Prove: • This algorithm never gets stuck, i.e. if there are unvisited nodes then at least one of them has an indegree of zero. • Hint: • Prove that if at any point there are unseen vertices but none of them have an indegree of 0, a cycle must exist, contradicting our assumption of a DAG. ©RohitBirlaDataStructure RevisionTutorial 372 15-Oct-2011
  • 372. Graph Traversals ©RohitBirlaDataStructure RevisionTutorial 373 15-Oct-2011
  • 373. Graph Traversals •Both take time: O(V+E) ©RohitBirlaDataStructure RevisionTutorial 374 15-Oct-2011
  • 374. Use of a stack • It is very common to use a stack to keep track of: • nodes to be visited next, or • nodes that we have already visited. • Typically, use of a stack leads to a depth-first visit order. • Depth-first visit order is “aggressive” in the sense that it examines complete paths. ©RohitBirlaDataStructure RevisionTutorial 375 15-Oct-2011
  • 375. Topological Sort as DFS • do a DFS of graph G • as each vertex v is “finished” (all of its children processed), insert it onto the front of a linked list • return the linked list of vertices • why is this correct?
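A recursive sketch of the DFS-based variant is shown below. It assumes the input is a DAG given as a dict of successor lists; the name `dfs_topological_sort` is illustrative, and a `collections.deque` stands in for the linked list because it supports cheap insertion at the front.

```python
from collections import deque


def dfs_topological_sort(graph):
    """graph: dict mapping each node to a list of its successors (must be a DAG)."""
    visited = set()
    order = deque()  # plays the role of the linked list

    def visit(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                visit(w)
        order.appendleft(v)  # v is finished: insert it at the front

    for v in graph:
        if v not in visited:
            visit(v)
    return list(order)


small_dag = {1: [4], 2: [4, 5], 3: [6], 4: [3, 6, 7], 5: [4, 7], 6: [], 7: [6]}
dfs_order = dfs_topological_sort(small_dag)
```

A vertex is placed in front of everything that finished before it, and all of its descendants finish before it does, so every edge ends up pointing from left to right in the returned list.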
  • 376. Use of a queue • It is very common to use a queue to keep track of: • nodes to be visited next, or • nodes that we have already visited. • Typically, use of a queue leads to a breadth-first visit order. • Breadth-first visit order is “cautious” in the sense that it examines every path of length i before going on to paths of length i+1. ©RohitBirlaDataStructure RevisionTutorial 377 15-Oct-2011
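The contrast between the stack and the queue can be seen in one small sketch, where the only difference between the two traversals is which end of the frontier the next node is taken from. The function name `traverse` and the example graph are assumptions for illustration.

```python
from collections import deque


def traverse(graph, start, use_queue):
    """Visit nodes reachable from start; queue gives BFS order, stack gives DFS order."""
    frontier = deque([start])
    seen = set()
    order = []
    while frontier:
        # Queue: take the oldest entry. Stack: take the newest entry.
        v = frontier.popleft() if use_queue else frontier.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        for w in graph[v]:
            if w not in seen:
                frontier.append(w)
    return order


diamond = {1: [2, 3], 2: [4], 3: [4], 4: []}
bfs = traverse(diamond, 1, use_queue=True)   # [1, 2, 3, 4]
dfs = traverse(diamond, 1, use_queue=False)  # [1, 3, 4, 2]
```

On the small diamond graph the queue visits both neighbours of node 1 before node 4, while the stack follows one branch all the way to node 4 before returning to the other neighbour.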
  • 377. Graph Searching ??? • Graph as state space (node = state, edge = action) • For example, game trees, mazes, ... • BFS and DFS each search the state space for a best move. If the search is exhaustive they will find the same solution, but if there is a time limit and the search space is large... • DFS explores a few possible moves, looking at the effects far in the future • BFS explores many solutions but only sees effects in the near future (often finds shorter solutions) ©RohitBirlaDataStructure RevisionTutorial 378 15-Oct-2011
  • 378. Minimum Spanning Trees ©RohitBirlaDataStructure RevisionTutorial 379 15-Oct-2011
  • 379. Problem: Laying Telephone Wire Central office ©RohitBirlaDataStructure RevisionTutorial 380 15-Oct-2011
  • 380. Wiring: Naïve Approach Central office Expensive! ©RohitBirlaDataStructure RevisionTutorial 381 15-Oct-2011
  • 381. Wiring: Better Approach Central office Minimize the total length of wire connecting the customers ©RohitBirlaDataStructure RevisionTutorial 382 15-Oct-2011
  • 382. Minimum Spanning Tree (MST) (see Weiss, Section 24.2.2) • A minimum spanning tree is a subgraph of an undirected weighted graph G, such that • it is a tree (i.e., it is acyclic) • it covers all the vertices V – contains |V| - 1 edges • the total cost associated with tree edges is the minimum among all possible spanning trees • not necessarily unique
  • 383. How Can We Generate a MST? a c e d b 2 45 9 6 4 5 5 a c e d b 2 45 9 6 4 5 5 ©RohitBirlaDataStructure RevisionTutorial 385 15-Oct-2011
  • 384. Prim’s Algorithm • Initialization: a. Pick a vertex r to be the root b. Set D(r) = 0, parent(r) = null c. For all vertices v ∈ V, v ≠ r, set D(v) = ∞ d. Insert all vertices into priority queue P, using distances as the keys • [Figure: example graph on vertices a, b, c, d, e with edge weights 2, 4, 4, 5, 5, 5, 6, 9] • Initial priority queue with root e: e = 0, a = ∞, b = ∞, c = ∞, d = ∞ • Parent table: e has no parent
  • 385. Prim’s Algorithm • While P is not empty: 1. Select the next vertex u to add to the tree: u = P.deleteMin() 2. Update the weight of each vertex w adjacent to u which is not in the tree (i.e., w ∈ P): if weight(u,w) < D(w), a. parent(w) = u b. D(w) = weight(u,w) c. Update the priority queue to reflect new distance for w
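The initialization and update loop can be sketched with Python's `heapq`. Two points are assumptions rather than part of the slides: `heapq` has no decrease-key operation, so this sketch pushes a fresh entry whenever D(w) improves and skips stale entries when they are popped; and the edge weights of the example graph are inferred by pairing the sorted edge list shown for Kruskal's algorithm with the weights visible in the figure.

```python
import heapq


def prim(graph, root):
    """graph: dict vertex -> dict neighbour -> weight (undirected). Returns parent map."""
    dist = {v: float("inf") for v in graph}  # D(v)
    parent = {v: None for v in graph}
    dist[root] = 0
    in_tree = set()
    heap = [(0, root)]
    while heap:
        _, u = heapq.heappop(heap)           # deleteMin
        if u in in_tree:
            continue                         # stale entry left over from an update
        in_tree.add(u)
        for w, weight in graph[u].items():
            if w not in in_tree and weight < dist[w]:
                dist[w] = weight
                parent[w] = u
                heapq.heappush(heap, (weight, w))
    return parent


# Example graph with inferred weights (an assumption, see the lead-in).
weighted_edges = [("a", "d", 2), ("c", "d", 4), ("d", "e", 4), ("a", "c", 5),
                  ("b", "e", 5), ("c", "e", 5), ("b", "d", 6), ("a", "b", 9)]
example_graph = {v: {} for v in "abcde"}
for u, v, w in weighted_edges:
    example_graph[u][v] = w
    example_graph[v][u] = w

mst_parent = prim(example_graph, "e")
```

Starting from e, the resulting parent map gives b and d the parent e, and a and c the parent d, which agrees with the final parent table shown on the slides.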
  • 386. Prim’s algorithm • [Figure: example graph] • Before: P = e:0, d:∞, b:∞, c:∞, a:∞; no parents set • The MST initially consists of the vertex e, and we update the distances and parent for its adjacent vertices • After removing e: P = d:4, b:5, c:5, a:∞; parent(b) = e, parent(c) = e, parent(d) = e
  • 387. Prim’s algorithm • [Figure: example graph] • Before: P = d:4, b:5, c:5, a:∞; parent(b) = e, parent(c) = e, parent(d) = e • After removing d: P = a:2, c:4, b:5; parent(b) = e, parent(c) = d, parent(d) = e, parent(a) = d
  • 388. Prim’s algorithm • [Figure: example graph] • Before: P = a:2, c:4, b:5 • After removing a: P = c:4, b:5; parent(b) = e, parent(c) = d, parent(d) = e, parent(a) = d
  • 389. Prim’s algorithm • [Figure: example graph] • Before: P = c:4, b:5 • After removing c: P = b:5; parent(b) = e, parent(c) = d, parent(d) = e, parent(a) = d
  • 390. Prim’s algorithm • [Figure: example graph] • Before: P = b:5 • After removing b, P is empty • The final minimum spanning tree: parent(b) = e, parent(c) = d, parent(d) = e, parent(a) = d
  • 391. Running time of Prim’s algorithm (without heaps) • Initialization of priority queue (array): O(|V|) • Update loop: |V| calls • Choosing vertex with minimum cost edge: O(|V|) • Updating distance values of unconnected vertices: each edge is considered only once during entire execution, for a total of O(|E|) updates • Overall cost without heaps: O(|E| + |V|^2) • When heaps are used, apply same analysis as for Dijkstra’s algorithm (p.469) (good exercise)
  • 392. Prim’s Algorithm Invariant • At each step, we add the edge (u,v) s.t. the weight of (u,v) is minimum among all edges where u is in the tree and v is not in the tree • Each step maintains a minimum spanning tree of the vertices that have been included thus far • When all vertices have been included, we have a MST for the graph! ©RohitBirlaDataStructure RevisionTutorial 394 15-Oct-2011
  • 393. Correctness of Prim’s • This algorithm adds n-1 edges without creating a cycle, so clearly it creates a spanning tree of any connected graph (you should be able to prove this). But is this a minimum spanning tree? Suppose it wasn't. • There must be a point at which it fails, and in particular there must be a single edge whose insertion first prevented the spanning tree from being a minimum spanning tree.
  • 394. Correctness of Prim’s • Let G be a connected, undirected graph • Let S be the set of edges chosen by Prim’s algorithm before choosing an errorful edge (x,y) • Let V' be the vertices incident with edges in S • Let T be a MST of G containing all edges in S, but not (x,y).
  • 395. Correctness of Prim’s • Edge (x,y) is not in T, so there must be a path in T from x to y since T is connected. • Inserting edge (x,y) into T will create a cycle • There is exactly one edge on this cycle with exactly one vertex in V', call this edge (v,w)
  • 396. Correctness of Prim’s • Since Prim’s chose (x,y) over (v,w), w(v,w) >= w(x,y). • We could form a new spanning tree T’ by swapping (x,y) for (v,w) in T (prove this is a spanning tree). • w(T’) is clearly no greater than w(T) • But that means T’ is a MST • And yet it contains all the edges in S, and also (x,y) ...Contradiction ©RohitBirlaDataStructure RevisionTutorial 398 15-Oct-2011
  • 397. Another Approach a c e d b 2 45 9 6 4 5 5 • Create a forest of trees from the vertices • Repeatedly merge trees by adding “safe edges” until only one tree remains • A “safe edge” is an edge of minimum weight which does not create a cycle forest: {a}, {b}, {c}, {d}, {e} ©RohitBirlaDataStructure RevisionTutorial 399 15-Oct-2011
  • 398. Kruskal’s algorithm • Initialization: a. Create a set for each vertex v ∈ V b. Initialize the set of “safe edges” A comprising the MST to the empty set c. Sort edges by increasing weight • [Figure: example graph on vertices a, b, c, d, e with edge weights 2, 4, 4, 5, 5, 5, 6, 9] • F = {a}, {b}, {c}, {d}, {e} • A = ∅ • E = {(a,d), (c,d), (d,e), (a,c), (b,e), (c,e), (b,d), (a,b)}
  • 399. Kruskal’s algorithm • For each edge (u,v) ∈ E in increasing order while more than one set remains: • If u and v belong to different sets U and V: a. add edge (u,v) to the safe edge set: A = A ∪ {(u,v)} b. merge the sets U and V: F = F - U - V + (U ∪ V) • Return A • Running time bounded by sorting (or findMin) • O(|E|log|E|), or equivalently, O(|E|log|V|) (why???)
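A compact sketch of the loop above follows. The slides only speak of "sets"; using a union-find structure with path halving to represent them is an assumption of this example, as are the function name `kruskal` and the edge weights, which are inferred from the sorted edge list and the weights visible in the figure.

```python
def kruskal(vertices, edges):
    """edges: list of (u, v, weight). Returns the list of safe edges A."""
    parent = {v: v for v in vertices}  # each vertex starts in its own set

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    safe_edges = []
    for u, v, _ in sorted(edges, key=lambda e: e[2]):  # increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # u and v are in different sets
            parent[ru] = rv            # merge the two sets
            safe_edges.append((u, v))
            if len(safe_edges) == len(vertices) - 1:
                break                  # only one set remains
    return safe_edges


# Inferred weights for the example graph (an assumption, see the lead-in).
edges = [("a", "d", 2), ("c", "d", 4), ("d", "e", 4), ("a", "c", 5),
         ("b", "e", 5), ("c", "e", 5), ("b", "d", 6), ("a", "b", 9)]
A = kruskal("abcde", edges)
```

Because Python's `sorted` is stable, ties are processed in the order the edges are listed, so the sketch accepts (a,d), (c,d), (d,e), rejects (a,c), and finishes with (b,e). As for the "(why???)" on the slide: a simple graph has |E| < |V|^2, so log|E| < 2 log|V|, which makes the two bounds equivalent.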
  • 400. Kruskal’s algorithm • E = {(a,d), (c,d), (d,e), (a,c), (b,e), (c,e), (b,d), (a,b)} • Forest and safe edge set A after each step: • {a}, {b}, {c}, {d}, {e}; A = ∅ • {a,d}, {b}, {c}, {e}; A = {(a,d)} • {a,d,c}, {b}, {e}; A = {(a,d), (c,d)} • {a,d,c,e}, {b}; A = {(a,d), (c,d), (d,e)} • {a,d,c,e,b}; A = {(a,d), (c,d), (d,e), (b,e)} • [Figure: example graph]
  • 401. Kruskal’s Algorithm Invariant • After each iteration, every tree in the forest is a MST of the vertices it connects • Algorithm terminates when all vertices are connected into one tree
  • 402. Correctness of Kruskal’s • This algorithm adds n-1 edges without creating a cycle, so clearly it creates a spanning tree of any connected graph (you should be able to prove this). But is this a minimum spanning tree? Suppose it wasn't. • There must be a point at which it fails, and in particular there must be a single edge whose insertion first prevented the spanning tree from being a minimum spanning tree.
  • 403. Correctness of Kruskal’s • Let e be this first errorful edge. • Let K be the Kruskal spanning tree • Let S be the set of edges chosen by Kruskal’s algorithm before choosing e • Let T be a MST containing all edges in S, but not e. K T S e ©RohitBirlaDataStructure RevisionTutorial 405 15-Oct-2011
  • 404. Correctness of Kruskal’s • Lemma: w(e’) >= w(e) for all edges e’ in T - S • Proof (by contradiction): • Assume there exists some edge e’ in T - S, w(e’) < w(e) • Kruskal’s must have considered e’ before e • However, since e’ is not in K (why??), it must have been discarded because it caused a cycle with some of the other edges in S. • But e’ + S is a subgraph of T, which means it cannot form a cycle ...Contradiction
  • 405. Correctness of Kruskal’s • Inserting edge e into T will create a cycle • There must be an edge on this cycle which is not in K (why??). Call this edge e’ • e’ must be in T - S, so (by our lemma) w(e’) >= w(e) • We could form a new spanning tree T’ by swapping e for e’ in T (prove this is a spanning tree). • w(T’) is clearly no greater than w(T) • But that means T’ is a MST • And yet it contains all the edges in S, and also e ...Contradiction ©RohitBirlaDataStructure RevisionTutorial 407 15-Oct-2011
  • 406. Greedy Approach • Like Dijkstra’s algorithm, both Prim’s and Kruskal’s algorithms are greedy algorithms • The greedy approach works for the MST problem; however, it does not work for many other problems! ©RohitBirlaDataStructure RevisionTutorial 408 15-Oct-2011
  • 407. That’s All! ©RohitBirlaDataStructure RevisionTutorial 409 15-Oct-2011
  • 408. • The very last tree for this class ©Rohit Birla Data Structure Revision Tutorial 41015-Oct-2011
  • 409. Dictionaries • A dictionary is a collection of elements each of which has a unique search key • Uniqueness criteria may be relaxed (multiset) • (I.e. do not force uniqueness) • Keep track of current members, with periodic insertions and deletions into the set • Examples • Membership in a club, course records • Symbol table (contains duplicates) • Language dictionary (WordSmith, Webster, WordNet) • Similar to database ©RohitBirlaDataStructure RevisionTutorial 411 15-Oct-2011
  • 410. Course Records Dictionary • Each member record has the columns: key, student name, hw1, ... • 123: Stan Smith, 49, ... • 124: Sue Margolin, 56, ... • 125: Billie King, 34, ... • 167: Roy Miller, 39, ...
  • 411. Dictionary ADT • simple container methods:size() isEmpty() elements() • query methods: findElement(k) findAllElements(k) • update methods: insertItem(k, e) removeElement(k) removeAllElements(k) • special element NO_SUCH_KEY, returned by an unsuccessful search ©RohitBirlaDataStructure RevisionTutorial 413 15-Oct-2011
  • 412. How to Implement a Dictionary? • Sequences / Arrays • ordered • unordered • Binary Search Trees • Skip lists • Hashtables ©RohitBirlaDataStructure RevisionTutorial 414 15-Oct-2011
  • 413. Recall Arrays … • Unordered array • searching and removing takes O(?) time • inserting takes O(?) time • applications to log files (frequent insertions, rare searches and removals) ©RohitBirlaDataStructure RevisionTutorial 415 15-Oct-2011
  • 414. More Arrays • Ordered array • searching takes O(log n) time (binary search) • inserting and removing takes O(n) time • application to look-up tables (frequent searches, rare insertions and removals) • Apply binary search
  • 415. Binary Searches • narrow down the search range in stages • “high-low” game • findElement(22)
  • 416. Recall Binary Search Trees… • Implement a dictionary with a BST • A binary search tree is a binary tree T such that • each internal node v stores an item (k, e) of a dictionary. • keys stored at nodes in the left subtree of v are less than or equal to k. • keys stored at nodes in the right subtree of v are greater than or equal to k.
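The "high-low" narrowing can be written out as a short sketch. The function name `find_element` mirrors the ADT method on the slides, but the sorted key list below is made up, because the array in the original figure is not recoverable from the transcript.

```python
def find_element(sorted_keys, k):
    """Return the index of k in sorted_keys, or None (standing in for NO_SUCH_KEY)."""
    low, high = 0, len(sorted_keys) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_keys[mid] == k:
            return mid
        if sorted_keys[mid] < k:
            low = mid + 1   # k can only be in the upper half
        else:
            high = mid - 1  # k can only be in the lower half
    return None


# Hypothetical ordered array for the findElement(22) example.
keys = [2, 4, 5, 7, 8, 9, 12, 14, 17, 19, 22, 25, 27, 28, 33, 37]
index_of_22 = find_element(keys, 22)
```

Each iteration halves the range between `low` and `high`, which is why searching an ordered array takes O(log n) time.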
  • 417. An Alternative to Arrays • Unordered Array: • insertion: O(1) • search: O(n) • Ordered Array • insertion: O(n) • search: O(log n) • Skip Lists: • insertion: O(log n) • search: O(log n) • And avoid the fixed-size drawback of arrays! ©RohitBirlaDataStructure RevisionTutorial 419 15-Oct-2011
  • 418. Skip Lists • good implementation for a dictionary • a series of lists S0, S1, …, Sk • each list Si stores a sorted subset of the dictionary D • Example: S0 = -∞, 12, 18, 25, 28, 72, 74, +∞; S1 = -∞, 18, 25, 74, +∞; S2 = -∞, 18, +∞; S3 = -∞, +∞
  • 419. Skip Lists • list S(i+1) contains items picked at random from S(i) • each item has probability 50% of being in the upper level list • like flipping a coin • S0 has n elements • S1 has about n/2 elements • S2 has about n/4 elements • …. • S(i) has about ? elements ©RohitBirlaDataStructure RevisionTutorial 421 15-Oct-2011
  • 420. Traversing Positions in a Skip List • Assume a node P in the skip list • after(p) • before(p) • below(p) • above(p) • Running time of each operation? ©RohitBirlaDataStructure RevisionTutorial 422 15-Oct-2011
  • 421. Operations in a Skip List • Use skip lists to implement dictionaries → need to deal with • Search • Insert • Remove
  • 422. Searching • Search for key K • Start with p = the top-most, left position node in the skip list • two steps: 1. if below(p) is null then stop • we are at the bottom 2. while key(p) < K move to the right go back to 1 ©RohitBirlaDataStructure RevisionTutorial 424 15-Oct-2011
  • 423. Searching • Search for 27 • [Figure: skip list with S0 = -∞, 12, 18, 25, 28, 72, 74, +∞; S1 = -∞, 18, 25, 74, +∞; S2 = -∞, 18, +∞; S3 = -∞, +∞]
  • 424. More Searching • Search for 74 • [Figure: skip list with S0 = -∞, 12, 18, 25, 28, 72, 74, +∞; S1 = -∞, 18, 25, 74, +∞; S2 = -∞, 18, +∞; S3 = -∞, +∞]
  • 425. Pseudocode for Searching • Algorithm SkipSearch(k) • Input: Search key k • Output: Position p in S such that p has the largest key less than or equal to k
      p ← top-most, left node in S
      while below(p) ≠ null do
          p ← below(p)
          while key(after(p)) ≤ k do
              p ← after(p)
      return p
  • 426. Running Time Analysis • about log n levels → O(log n) for going down in the skip list • at each level, expected O(1) steps for moving forward • why? works like a binary search • in skip lists, the elements in list S(i+1) play the role of search dividers for elements in S(i) • (in binary search: mid-list elements to divide the search) • total expected running time: O(log n)
  • 427. Insertion in Skip Lists • First: identify the place to insert new key k • → node p in S0 with largest key less than or equal to k • Insert new item (k,e) after p • with probability 50%, the new item is inserted in list S1 • with probability 25%, the new item is inserted in list S2 • with probability 12.5%, the new item is inserted in list S3 • with probability 6.25%, the new item is inserted in list S4 • ….
  • 428. Insertion in Skip Lists • Insert 29 • [Figure: skip list after the insertion, with 29 added to S0 and S1: S0 = -∞, 12, 18, 25, 28, 29, 72, 74, +∞; S1 = -∞, 18, 25, 29, 74, +∞; S2 = -∞, 18, +∞; S3 = -∞, +∞]
  • 429. Pseudocode for Insertion • Algorithm SkipInsert(k,e) • Input: Item (k,e) • Output: -
      p ← SkipSearch(k)
      q ← insertAfterAbove(p, null, Item(k,e))
      while random() ≤ 50% do
          while above(p) == null do
              p ← before(p)
          p ← above(p)
          q ← insertAfterAbove(p, q, Item(k,e))
  • 430. Running Time for Insertion? • Search position for new item(k,e) • O(log n) • Insertion • O(1) • Total running time • O(log n) ©RohitBirlaDataStructure RevisionTutorial 432 15-Oct-2011
  • 431. Removal from Skip Lists • Easier than insertion • Locate item with key k to be removed • if no such element, return NO SUCH KEY • otherwise, remove Item(k,e) • remove all items found with above(Item(k,e)) ©RohitBirlaDataStructure RevisionTutorial 433 15-Oct-2011
  • 432. Removal from Skip Lists • Remove 18 • Running time? • Skip list before the removal (figure): • S3: −∞, +∞ • S2: −∞, 18, +∞ • S1: −∞, 18, 25, 74, +∞ • S0: −∞, 12, 18, 25, 28, 72, 74, +∞
  • 433. Efficient Implementation of Skip Lists • use DoublyLinkedList implementation • + two additional pointers • above • below • For a LinkedList → provide pointer to head • For a DoublyLinkedList → provide pointers to head and tail • For a SkipList → ??
  • 434. How to Implement a Dictionary? • Sequences • ordered • unordered • Binary Search Trees • Skip lists • Hashtables ©RohitBirlaDataStructure RevisionTutorial 436 15-Oct-2011
  • 435. Hashing • Another important and widely useful technique for implementing dictionaries • Constant time per operation (on the average) • Worst case time proportional to the size of the set for each operation (just like array and chain implementation) ©RohitBirlaDataStructure RevisionTutorial 437 15-Oct-2011
  • 436. Basic Idea • Use hash function to map keys into positions in a hash table Ideally • If element e has key k and h is hash function, then e is stored in position h(k) of table • To search for e, compute h(k) to locate position. If no element, dictionary does not contain e. ©RohitBirlaDataStructure RevisionTutorial 438 15-Oct-2011
  • 437. Example • Dictionary Student Records • Keys are ID numbers (951000 - 952000), no more than 100 students • Hash function: h(k) = k-951000 maps ID into distinct table positions 0-1000 • array table[1001] ... 0 1 2 3 1000 hash table buckets ©RohitBirlaDataStructure RevisionTutorial 439 15-Oct-2011
  • 438. Analysis (Ideal Case) • O(b) time to initialize hash table (b number of positions or buckets in hash table) • O(1) time to perform insert, remove, search ©RohitBirlaDataStructure RevisionTutorial 440 15-Oct-2011
  • 439. Ideal Case is Unrealistic • Works for implementing dictionaries, but many applications have key ranges that are too large to have 1-1 mapping between buckets and keys! Example: • Suppose key can take on values from 0 .. 65,535 (2 byte unsigned int) • Expect ≈ 1,000 records at any given time • Impractical to use hash table with 65,536 slots!
  • 440. Hash Functions • If key range too large, use hash table with fewer buckets and a hash function which maps multiple keys to same bucket: h(k1) = β = h(k2): k1 and k2 have collision at slot β • Popular hash functions: hashing by division h(k) = k%D, where D is the number of buckets in hash table • Example: hash table with 11 buckets, h(k) = k%11 • 80 → 3 (80%11 = 3), 40 → 7, 65 → 10 • 58 → 3: collision!
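The division example can be checked with a few lines of Java. This is only an illustrative sketch: the class `DivisionHashDemo` and the helper `group` are made-up names, and keys are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DivisionHashDemo {
    /** Hashing by division: bucket of a non-negative key k in a table with D buckets. */
    static int h(int k, int D) {
        return k % D;
    }

    /** Groups keys by bucket; a bucket holding more than one key is a collision. */
    static Map<Integer, List<Integer>> group(int[] keys, int D) {
        Map<Integer, List<Integer>> buckets = new TreeMap<>();
        for (int k : keys) {
            buckets.computeIfAbsent(h(k, D), b -> new ArrayList<>()).add(k);
        }
        return buckets;
    }
}
```

Calling `DivisionHashDemo.group(new int[]{80, 40, 65, 58}, 11)` yields `{3=[80, 58], 7=[40], 10=[65]}`, so 80 and 58 share bucket 3.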
  • 441. Collision Resolution Policies • Two classes: • (1) Open hashing, a.k.a. separate chaining • (2) Closed hashing, a.k.a. open addressing • Difference has to do with whether collisions are stored outside the table (open hashing) or whether collisions result in storing one of the records at another slot in the table (closed hashing) ©RohitBirlaDataStructure RevisionTutorial 443 15-Oct-2011
  • 442. Closed Hashing • Associated with closed hashing is a rehash strategy: “If we try to place x in bucket h(x) and find it occupied, find alternative location h1(x), h2(x), etc. Try each in order; if none is empty, the table is full.” • h(x) is called the home bucket • Simplest rehash strategy is called linear hashing: h_i(x) = (h(x) + i) % D • In general, our collision resolution strategy is to generate a sequence of hash table slots (probe sequence) that can hold the record; test each slot until we find an empty one (probing)
  • 443. Example Linear (Closed) Hashing • D=8, keys a, b, c, d have hash values h(a)=3, h(b)=0, h(c)=4, h(d)=3 • Table after inserting a, b, c: bucket 0 = b, bucket 3 = a, bucket 4 = c; buckets 1, 2, 5, 6, 7 are empty • Where do we insert d? Bucket 3 is already filled • Probe sequence using linear hashing: h1(d) = (h(d)+1)%8 = 4%8 = 4 (occupied), h2(d) = (h(d)+2)%8 = 5%8 = 5 (empty, so d goes here), h3(d) = (h(d)+3)%8 = 6%8 = 6, etc.: 7, 0, 1, 2 • The probe sequence wraps around to the beginning of the table!
  • 444. Operations Using Linear Hashing • Test for membership: findItem • Examine h(k), h1(k), h2(k), …, until we find k or an empty bucket or home bucket • If no deletions possible, strategy works! • What if deletions? • If we reach empty bucket, cannot be sure that k is not somewhere else and empty bucket was occupied when k was inserted • Need special placeholder deleted, to distinguish bucket that was never used from one that once held a value • May need to reorganize table after many deletions ©RohitBirlaDataStructure RevisionTutorial 446 15-Oct-2011
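A minimal sketch of these operations, including the special "deleted" placeholder, might look like the following. The class `LinearProbingTable`, the marker values `EMPTY` and `DELETED`, and the choice of h(k) = k % D are assumptions made for illustration; keys are assumed non-negative and `insert` does not check for duplicates.

```java
import java.util.Arrays;

class LinearProbingTable {
    private static final int EMPTY = -1;     // bucket that was never used
    private static final int DELETED = -2;   // placeholder: bucket once held a value
    private final int[] slots;

    LinearProbingTable(int D) {
        slots = new int[D];
        Arrays.fill(slots, EMPTY);
    }

    private int probe(int k, int i) {        // h_i(k) = (h(k) + i) % D with h(k) = k % D
        return (k % slots.length + i) % slots.length;
    }

    /** findItem: returns the bucket holding k, or -1 if k is not in the table. */
    int find(int k) {
        for (int i = 0; i < slots.length; i++) {
            int s = probe(k, i);
            if (slots[s] == EMPTY) return -1;    // never-used bucket ends the search
            if (slots[s] == k) return s;         // DELETED buckets are skipped over
        }
        return -1;                               // probed every bucket, back at home
    }

    /** Stores k in the first EMPTY or DELETED bucket of its probe sequence; -1 if full. */
    int insert(int k) {
        for (int i = 0; i < slots.length; i++) {
            int s = probe(k, i);
            if (slots[s] == EMPTY || slots[s] == DELETED) {
                slots[s] = k;
                return s;
            }
        }
        return -1;
    }

    /** Replaces k by the DELETED placeholder so later searches keep probing past it. */
    boolean remove(int k) {
        int s = find(k);
        if (s < 0) return false;
        slots[s] = DELETED;
        return true;
    }
}
```

With D = 8 and keys 3, 8, 4, 11 (home buckets 3, 0, 4, 3), key 11 lands in bucket 5; after removing 4, a search for 11 still succeeds because bucket 4 is marked deleted rather than empty.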
  • 445. Performance Analysis - Worst Case • Initialization: O(b), b = number of buckets • Insert and search: O(n), n = number of elements in table; all n key values have same home bucket • No better than linear list for maintaining dictionary!
  • 446. Performance Analysis - Avg Case • Distinguish between successful and unsuccessful searches • Delete = successful search for record to be deleted • Insert = unsuccessful search along its probe sequence • Expected cost of hashing is a function of how full the table is: load factor $\alpha = n/b$ • It has been shown that average costs under linear hashing (probing) are: • Insertion: $\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$ • Deletion: $\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$
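To get a feel for these average costs, the two formulas can be evaluated for a few load factors. The sketch below simply transcribes them, writing the load factor n/b as `alpha`; the class and method names are hypothetical.

```java
class LinearProbingCost {
    /** Expected probes for an insertion (unsuccessful search): 1/2 (1 + 1/(1 - alpha)^2). */
    static double insertProbes(double alpha) {
        double free = 1.0 - alpha;
        return 0.5 * (1.0 + 1.0 / (free * free));
    }

    /** Expected probes for a deletion (successful search): 1/2 (1 + 1/(1 - alpha)). */
    static double deleteProbes(double alpha) {
        return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
    }
}
```

At a load factor of 0.5 this gives 2.5 probes per insertion and 1.5 per deletion; at 0.9 the values grow to roughly 50.5 and 5.5, which is why a closed hash table should not be allowed to fill up.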
  • 447. Improved Collision Resolution • Linear probing: h_i(x) = (h(x) + i) % D • all buckets in table will be candidates for inserting a new record before the probe sequence returns to home position • clustering of records leads to long probing sequences • Linear probing with skipping: h_i(x) = (h(x) + i·c) % D • c is a constant other than 1 • records with adjacent home buckets will not follow same probe sequence • (Pseudo)Random probing: h_i(x) = (h(x) + r_i) % D • r_i is the i-th value in a random permutation of numbers from 1 to D-1 • insertions and searches use the same sequence of “random” numbers
  • 448. Example • h(k) = k%11 • Table I: bucket 0 = 1001, bucket 1 = 9537, bucket 2 = 3016, bucket 7 = 9874, bucket 8 = 2009, bucket 9 = 9875; buckets 3, 4, 5, 6, 10 are empty • 1. What if next element has home bucket 0? → go to bucket 3. Same for elements with home bucket 1 or 2! Only a record with home position 3 will stay. → p = 4/11 that next record will go to bucket 3 • 2. Similarly, records hashing to 7, 8, 9 will end up in 10 • 3. Only records hashing to 4 will end up in 4 (p = 1/11); same for 5 and 6 • Table II: insert 1052 (home bucket 7) → 1052 is stored in bucket 10 • Now the next element ends up in bucket 3 with p = 8/11
  • 449. Hash Functions - Numerical Values • Consider: h(x) = x%16 • poor distribution, not very random • depends solely on least significant four bits of key • Better: mid-square method • if keys are integers in range 0,1,…,K, pick integer C such that $D C^2$ is about equal to $K^2$, then $h(x) = \lfloor x^2 / C \rfloor \bmod D$ extracts the middle r bits of $x^2$, where $2^r = D$ (a base-D digit) • better, because most or all of the bits of the key contribute to the result
  • 450. Hash Function – Strings of Characters • Folding Method:
      int h(String x, int D) {
          int i, sum;
          for (sum = 0, i = 0; i < x.length(); i++)
              sum += (int) x.charAt(i);
          return (sum % D);
      }
    • sums the ASCII values of the letters in the string • ASCII value for “A” = 65; sum will be in range 650-900 for 10 upper-case letters; good when D around 100, for example • order of chars in string has no effect
  • 451. Hash Function – Strings of Characters • Much better: Cyclic Shift
      static int hashCode(String key, int D) {
          int h = 0;
          for (int i = 0; i < key.length(); i++) {
              h = (h << 5) | (h >>> 27);   // rotate left by 5 bits (5 + 27 = 32, unsigned shift)
              h += (int) key.charAt(i);    // add in the next character
          }
          return Math.floorMod(h, D);      // h may be negative; keep the result in 0..D-1
      }
  • 452. Open Hashing • Each bucket in the hash table is the head of a linked list • All elements that hash to a particular bucket are placed on that bucket’s linked list • Records within a bucket can be ordered in several ways • by order of insertion, by key value order, or by frequency of access order ©RohitBirlaDataStructure RevisionTutorial 454 15-Oct-2011
  • 453. Open Hashing Data Organization 0 1 2 3 4 D-1 ... ... ... ©RohitBirlaDataStructure RevisionTutorial 455 15-Oct-2011
  • 454. Analysis • Open hashing is most appropriate when the hash table is kept in main memory, implemented with a standard in-memory linked list • We hope that the buckets are roughly equal in size, so that the lists will be short • If there are n elements in the set, then each bucket will have roughly n/D elements • If we can estimate n and choose D to be roughly as large, then the average bucket will have only one or two members
  • 455. Analysis Cont’d Average time per dictionary operation: • D buckets, n elements in dictionary → average n/D elements per bucket • insert, search, remove operations take O(1+n/D) time each • If we can choose D to be about n, constant time • Assuming each element is equally likely to be hashed to any bucket, running time is constant, independent of n
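Open hashing as analysed above can be sketched with one linked list per bucket. The names below (`ChainedHashSet`, `averageChainLength`) are illustrative assumptions, the hash function is plain division, and records are kept in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();
    private int size;

    ChainedHashSet(int D) {
        for (int i = 0; i < D; i++) buckets.add(new LinkedList<>());
    }

    private LinkedList<Integer> bucketFor(int k) {
        return buckets.get(Math.floorMod(k, buckets.size()));   // h(k) = k % D
    }

    boolean contains(int k) {
        return bucketFor(k).contains(k);         // walks one chain: O(1 + n/D) on average
    }

    boolean insert(int k) {
        if (contains(k)) return false;           // keys are kept unique
        bucketFor(k).addLast(k);
        size++;
        return true;
    }

    boolean remove(int k) {
        boolean removed = bucketFor(k).remove(Integer.valueOf(k));  // remove by value, not index
        if (removed) size--;
        return removed;
    }

    /** n/D: the average number of elements per bucket. */
    double averageChainLength() {
        return (double) size / buckets.size();
    }
}
```

With D chosen close to n the average chain length stays near 1, which is where the constant-time claim comes from.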
  • 456. Comparison with Closed Hashing • Worst case performance is O(n) for both • Number of operations for hashing • 23 6 8 10 23 5 12 4 9 19 • D=9 • h(x) = x % D ©RohitBirlaDataStructure RevisionTutorial 458 15-Oct-2011
  • 457. Hashing Problem • Draw the 11 entry hashtable for hashing the keys 12, 44, 13, 88, 23, 94, 11, 39, 20 using the function (2i+5) mod 11, closed hashing, linear probing • Pseudo-code for listing all identifiers in a hashtable in lexicographic order, using open hashing, the hash function h(x) = first character of x. What is the running time? ©RohitBirlaDataStructure RevisionTutorial 459 15-Oct-2011
  • 458. Sorting • Given a set (container) of n elements • E.g. array, set of words, etc. • Suppose there is an order relation that can be set across the elements • Goal: arrange the elements in ascending order • Start → 1 23 2 56 9 8 10 100 • End → 1 2 8 9 10 23 56 100
  • 459. Bubble Sort • Simplest sorting algorithm • Idea: • 1. Set flag = false • 2. Traverse the array and compare pairs of adjacent elements • 2.1 If E1 ≤ E2: OK • 2.2 If E1 > E2 then Switch(E1, E2) and set flag = true • 3. If flag = true goto 1. • What happens?
  • 460. Bubble Sort
      1:  1 23  2 56  9  8 10 100
      2:  1  2 23 56  9  8 10 100
      3:  1  2 23  9 56  8 10 100
      4:  1  2 23  9  8 56 10 100
      5:  1  2 23  9  8 10 56 100
      ---- finish the first traversal ----
      ---- start again ----
      1:  1  2 23  9  8 10 56 100
      2:  1  2  9 23  8 10 56 100
      3:  1  2  9  8 23 10 56 100
      4:  1  2  9  8 10 23 56 100
      ---- finish the second traversal ----
      ---- start again ----
      ………………….
    • Why Bubble Sort?
  • 461. Implement Bubble Sort with an Array
      void bubbleSort(int[] S, int n) {
          boolean isSorted = false;
          while (!isSorted) {
              isSorted = true;                      // assume sorted until a swap proves otherwise
              for (int i = 0; i < n - 1; i++) {     // stop at n - 2 so that S[i+1] stays in bounds
                  if (S[i] > S[i+1]) {
                      int aux = S[i];
                      S[i] = S[i+1];
                      S[i+1] = aux;
                      isSorted = false;
                  }
              }
          }
      }
  • 462. Running Time for Bubble Sort • One traversal = move the maximum element to the end • Traversal #i: n – i comparisons • Running time: (n – 1) + (n – 2) + … + 1 = (n – 1) n / 2 = $O(n^2)$ • When does the worst case occur? • Best case?
  • 463. Sorting Algorithms Using Priority Queues • Remember Priority Queues = queue where the dequeue operation always removes the element with the smallest key → removeMin • Selection Sort • insert elements in a priority queue implemented with an unsorted sequence • remove them one by one to create the sorted sequence • Insertion Sort • insert elements in a priority queue implemented with a sorted sequence • remove them one by one to create the sorted sequence
  • 464. Selection Sort • insertion: O(1 + 1 + … + 1) = O(n) • selection: O(n + (n-1) + (n-2) + … + 1) = $O(n^2)$
  • 465. Insertion Sort • insertion: O(1 + 2 + … + n) = $O(n^2)$ • selection: O(1 + 1 + … + 1) = O(n)
  • 466. Sorting with Binary Trees • Using heaps (see lecture on heaps) • How to sort using a minHeap ? • Using binary search trees (see lecture on BST) • How to sort using BST? ©RohitBirlaDataStructure RevisionTutorial 468 15-Oct-2011
  • 467. Heap Sorting • Step 1: Build a heap • Step 2: removeMin( ) ©RohitBirlaDataStructure RevisionTutorial 469 15-Oct-2011
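The two steps can be sketched with `java.util.PriorityQueue`, which is a binary min-heap whose `poll()` plays the role of removeMin. The class name `HeapSortSketch` is an assumption; note also that this sketch builds the heap by n successive insertions (O(n log n)) rather than by the bottom-up construction recalled in the slides.

```java
import java.util.PriorityQueue;

class HeapSortSketch {
    static int[] heapSort(int[] input) {
        // Step 1: build a min-heap (here by inserting the elements one at a time).
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int x : input) heap.add(x);

        // Step 2: removeMin() n times; the elements come out in ascending order.
        int[] sorted = new int[input.length];
        for (int i = 0; i < sorted.length; i++) sorted[i] = heap.poll();
        return sorted;
    }
}
```

Each removal costs O(log n), so the whole sort runs in O(n log n).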
  • 468. Recall: Building a Heap • build (n + 1)/2 trivial one-element heaps • build three-element heaps on top of them ©RohitBirlaDataStructure RevisionTutorial 470 15-Oct-2011
  • 469. Recall: Heap Removal • Remove element from priority queues? removeMin( ) ©RohitBirlaDataStructure RevisionTutorial 471 15-Oct-2011
  • 470. Recall: Heap Removal • Begin downheap ©RohitBirlaDataStructure RevisionTutorial 472 15-Oct-2011
  • 471. Sorting with BST • Use binary search trees for sorting • Start with unsorted sequence • Insert all elements in a BST • Traverse the tree…. how ? • Running time? ©RohitBirlaDataStructure RevisionTutorial 473 15-Oct-2011
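One possible answer to the questions above is an in-order traversal, sketched below with a plain (unbalanced) binary search tree. The class `BstSortSketch` and its helpers are hypothetical names; duplicates are sent to the right subtree.

```java
import java.util.ArrayList;
import java.util.List;

class BstSortSketch {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private static Node insert(Node root, int k) {
        if (root == null) return new Node(k);
        if (k < root.key) root.left = insert(root.left, k);
        else root.right = insert(root.right, k);    // equal keys go to the right
        return root;
    }

    /** In-order traversal: left subtree, node, right subtree gives ascending keys. */
    private static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.key);
        inorder(n.right, out);
    }

    static List<Integer> sort(int[] input) {
        Node root = null;
        for (int x : input) root = insert(root, x);
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }
}
```

The traversal itself is O(n); the n insertions cost O(n log n) when the tree stays reasonably balanced and O(n^2) in the worst case, for example when the input is already sorted.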
  • 472. Next • Sorting algorithms that rely on the “DIVIDE AND CONQUER” paradigm • One of the most widely used paradigms • Divide a problem into smaller sub problems, solve the sub problems, and combine the solutions • Learned from real life ways of solving problems ©RohitBirlaDataStructure RevisionTutorial 474 15-Oct-2011
  • 473. Divide-and-Conquer • Divide and Conquer is a method of algorithm design that has created such efficient algorithms as Merge Sort. • In terms of algorithms, this method has three distinct steps: • Divide: If the input size is too large to deal with in a straightforward manner, divide the data into two or more disjoint subsets. • Recur: Use divide and conquer to solve the subproblems associated with the data subsets. • Conquer: Take the solutions to the subproblems and “merge” these solutions into a solution for the original problem.
  • 474. Merge-Sort • Algorithm: • Divide: If S has at least two elements (nothing needs to be done if S has zero or one elements), remove all the elements from S and put them into two sequences, S1 and S2, each containing about half of the elements of S (i.e. S1 contains the first n/2 elements and S2 contains the remaining n/2 elements). • Recur: Recursively sort sequences S1 and S2. • Conquer: Put back the elements into S by merging the sorted sequences S1 and S2 into a unique sorted sequence. • Merge Sort Tree: • Take a binary tree T • Each node of T represents a recursive call of the merge sort algorithm. • We associate with each node v of T the set of input passed to the invocation v represents. • The external nodes are associated with individual elements of S, upon which no recursion is called.
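The divide, recur, and conquer steps map onto code almost one to one. The following is a sketch on int arrays (the slides speak of sequences); `MergeSortSketch`, `mergeSort`, and `merge` are illustrative names, and the method returns a sorted array rather than refilling S.

```java
import java.util.Arrays;

class MergeSortSketch {
    static int[] mergeSort(int[] s) {
        if (s.length < 2) return s;                 // zero or one element: nothing to do
        int mid = s.length / 2;
        int[] s1 = mergeSort(Arrays.copyOfRange(s, 0, mid));          // divide + recur
        int[] s2 = mergeSort(Arrays.copyOfRange(s, mid, s.length));
        return merge(s1, s2);                       // conquer
    }

    /** Merges two sorted arrays into one sorted array. */
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];   // <= keeps equal keys in order
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}
```

Each call corresponds to one node of the merge sort tree, and the base case corresponds to its external nodes.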
  • 475. Merge-Sort ©Rohit Birla Data Structure Revision Tutorial 47715-Oct-2011
  • 476. Merge-Sort(cont.) ©Rohit Birla Data Structure Revision Tutorial 47815-Oct-2011
  • 477. Merge-Sort (cont’d) ©Rohit Birla Data Structure Revision Tutorial 47915-Oct-2011
  • 478. Merging Two Sequences ©Rohit Birla Data Structure Revision Tutorial 48015-Oct-2011
  • 479. Quick-Sort • Another divide-and-conquer sorting algorithm • To understand quick-sort, let’s look at a high-level description of the algorithm 1) Divide: If the sequence S has 2 or more elements, select an element x from S to be your pivot. Any arbitrary element, like the last, will do. Remove all the elements of S and divide them into 3 sequences: L, holds S’s elements less than x; E, holds S’s elements equal to x; G, holds S’s elements greater than x 2) Recurse: Recursively sort L and G 3) Conquer: Finally, to put elements back into S in order, first insert the elements of L, then those of E, and then those of G. Here are some diagrams....
  • 480. Idea of Quick Sort 1) Select: pick an element 2) Divide: rearrange elements so that x goes to its final position E 3) Recurse and Conquer: recursively sort ©Rohit Birla Data Structure Revision Tutorial 48215-Oct-2011
  • 481. Quick-Sort Tree ©Rohit Birla Data Structure Revision Tutorial 48315-Oct-2011
  • 482. In-Place Quick-Sort Divide step: l scans the sequence from the left, and r from the right. A swap is performed when l is at an element larger than the pivot and r is at one smaller than the pivot. ©Rohit Birla Data Structure Revision Tutorial 48415-Oct-2011
  • 483. In Place Quick Sort (cont’d) A final swap with the pivot completes the divide step ©Rohit Birla Data Structure Revision Tutorial 48515-Oct-2011
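The in-place divide step can be sketched as below, taking the last element as the pivot as suggested earlier in the slides. The class name `QuickSortSketch` and the exact loop structure are assumptions; `l` and `r` are the two scanning indices described above.

```java
class QuickSortSketch {
    static void quickSort(int[] s) {
        quickSort(s, 0, s.length - 1);
    }

    private static void quickSort(int[] s, int lo, int hi) {
        if (lo >= hi) return;                       // fewer than two elements
        int pivot = s[hi];                          // last element is the pivot
        int l = lo, r = hi - 1;
        while (l <= r) {
            while (l <= r && s[l] < pivot) l++;     // l stops at an element >= pivot
            while (l <= r && s[r] > pivot) r--;     // r stops at an element <= pivot
            if (l <= r) {                           // both stopped: swap and move on
                swap(s, l, r);
                l++;
                r--;
            }
        }
        swap(s, l, hi);                             // final swap puts the pivot in place
        quickSort(s, lo, l - 1);                    // recursively sort the L part
        quickSort(s, l + 1, hi);                    // recursively sort the G part
    }

    private static void swap(int[] s, int i, int j) {
        int t = s[i];
        s[i] = s[j];
        s[j] = t;
    }
}
```

Only O(1) extra variables are used besides the recursion stack, which is what makes this variant in-place.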
  • 484. Running time analysis • Average case analysis • Worst case analysis • What is the worst case for quick-sort? • Running time? ©RohitBirlaDataStructure RevisionTutorial 486 15-Oct-2011
  • 485. Today… • Quick Review • Divide and Conquer paradigm • Merge Sort • Quick Sort • Two more sorting algorithms • Bucket Sort • Radix Sort ©RohitBirlaDataStructure RevisionTutorial 487 15-Oct-2011
  • 486. Divide-and-Conquer • Divide and Conquer is a method of algorithm design. • This method has three distinct steps: • Divide: If the input size is too large to deal with in a straightforward manner, divide the data into two or more disjoint subsets. • Recur: Use divide and conquer to solve the subproblems associated with the data subsets. • Conquer: Take the solutions to the subproblems and “merge” these solutions into a solution for the original problem. ©Rohit Birla Data Structure Revision Tutorial 48815-Oct-2011
  • 487. Merge-Sort • Algorithm: • Divide: If S has at least two elements (nothing needs to be done if S has zero or one elements), remove all the elements from S and put them into two sequences, S1 and S2, each containing about half of the elements of S (i.e. S1 contains the first n/2 elements and S2 contains the remaining n/2 elements). • Recur: Recursively sort sequences S1 and S2. • Conquer: Put back the elements into S by merging the sorted sequences S1 and S2 into a unique sorted sequence.
  • 488. Merge-Sort Example ©Rohit Birla Data Structure Revision Tutorial 49015-Oct-2011
  • 489. Running Time of Merge-Sort • At each level in the binary tree created for Merge Sort, there are n elements, with O(1) time spent at each element • → O(n) running time for processing one level • The height of the tree is O(log n) • Therefore, the time complexity is O(n log n)
  • 490. Quick-Sort 1) Divide: If the sequence S has 2 or more elements, select an element x from S to be your pivot. Any arbitrary element, like the last, will do. Remove all the elements of S and divide them into 3 sequences: L, holds S’s elements less than x; E, holds S’s elements equal to x; G, holds S’s elements greater than x 2) Recurse: Recursively sort L and G 3) Conquer: Finally, to put elements back into S in order, first insert the elements of L, then those of E, and then those of G.
  • 491. Idea of Quick Sort 1) Select: pick an element 2) Divide: rearrange elements so that x goes to its final position E 3) Recurse and Conquer: recursively sort ©Rohit Birla Data Structure Revision Tutorial 49315-Oct-2011
  • 492. Quick-Sort Tree ©Rohit Birla Data Structure Revision Tutorial 49415-Oct-2011
  • 493. In-Place Quick-Sort Divide step: l scans the sequence from the left, and r from the right. A swap is performed when l is at an element larger than the pivot and r is at one smaller than the pivot. ©Rohit Birla Data Structure Revision Tutorial 49515-Oct-2011
  • 494. In Place Quick Sort (cont’d) A final swap with the pivot completes the divide step ©Rohit Birla Data Structure Revision Tutorial 49615-Oct-2011
  • 495. Quick Sort Running Time • Worst case: when the pivot does not divide the sequence in two • At each step, the length of the sequence is only reduced by 1 • Total running time: $\sum_{i=1}^{n} \mathrm{length}(S_i) = O(n^2)$ • General case: • Time spent at level i in the tree is O(n) • Running time: O(n) * O(height) • Average case: • O(n log n)
  • 496. More Sorting Algorithms • Bucket Sort • Radix Sort • Stable sort • A sorting algorithm where the order of elements having the same key is not changed in the final sequence • Is bubble sort stable? • Is merge sort stable? ©RohitBirlaDataStructure RevisionTutorial 498 15-Oct-2011
  • 497. Bucket Sort • Bucket sort • Assumption: the keys are in the range [0, N) • Basic idea: 1. Create N linked lists (buckets) to divide interval [0,N) into subintervals of size 1 2. Add each input element to appropriate bucket 3. Concatenate the buckets • Expected total time is O(n + N), with n = size of original sequence • if N is O(n) → sorting algorithm in O(n)!
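The three steps of the basic idea translate directly into code. This is a sketch for integer keys in [0, N); the class `BucketSortSketch` and its method signature are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class BucketSortSketch {
    /** Sorts keys that are all in the range [0, N). */
    static int[] bucketSort(int[] keys, int N) {
        // 1. Create N linked lists (buckets), one per possible key value.
        List<LinkedList<Integer>> buckets = new ArrayList<>();
        for (int b = 0; b < N; b++) buckets.add(new LinkedList<>());

        // 2. Add each input element to its bucket (appending keeps the sort stable).
        for (int k : keys) buckets.get(k).addLast(k);

        // 3. Concatenate the buckets in order.
        int[] sorted = new int[keys.length];
        int i = 0;
        for (LinkedList<Integer> bucket : buckets) {
            for (int k : bucket) sorted[i++] = k;
        }
        return sorted;
    }
}
```

Steps 1 and 3 touch all N buckets and step 2 touches all n elements, which gives the O(n + N) bound.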
  • 498. Bucket Sort Each element of the array is put in one of the N “buckets” ©RohitBirlaDataStructure RevisionTutorial 500 15-Oct-2011
  • 499. Bucket Sort Now, pull the elements from the buckets into the array At last, the sorted array (sorted in a stable way): ©RohitBirlaDataStructure RevisionTutorial 501 15-Oct-2011
  • 500. Does it Work for Real Numbers? • What if keys are not integers? • Assumption: input is n reals from [0, 1) • Basic idea: • Create N linked lists (buckets) to divide interval [0,1) into subintervals of size 1/N • Add each input element to appropriate bucket and sort buckets with insertion sort • Uniform input distribution → expected O(1) bucket size (when N is about n) • Therefore the expected total time is O(n) • Distribution of keys in buckets similar with …. ?
  • 501. Radix Sort • How did IBM get rich originally? • Answer: punched card readers for census tabulation in early 1900’s. • In particular, a card sorter that could sort cards into different bins • Each column can be punched in 12 places • (Decimal digits use only 10 places!) • Problem: only one column can be sorted on at a time ©RohitBirlaDataStructure RevisionTutorial 503 15-Oct-2011
  • 502. Radix Sort • Intuitively, you might sort on the most significant digit, then the second most significant, etc. • Problem: lots of intermediate piles of cards to keep track of • Key idea: sort the least significant digit first
      RadixSort(A, d)
          for i = 1 to d
              StableSort(A) on digit i
  • 503. Radix Sort • Can we prove it will work? • Inductive argument: • Assume lower-order digits {j: j<i}are sorted • Show that sorting next digit i leaves array correctly sorted • If two digits at position i are different, ordering numbers by that digit is correct (lower-order digits irrelevant) • If they are the same, numbers are already sorted on the lower-order digits. Since we use a stable sort, the numbers stay in the right order ©RohitBirlaDataStructure RevisionTutorial 505 15-Oct-2011
  • 504. Radix Sort • What sort will we use to sort on digits? • Bucket sort is a good choice: • Sort n numbers on digits that range from 0..k • Time: O(n + k) • Each pass over n numbers on one digit takes time O(n+k), so total time for d digits is O(dn+dk) • When d is constant and k=O(n), takes O(n) time
  • 505. Radix Sort Example • Problem: sort 1 million 64-bit numbers • Treat as four-digit radix-$2^{16}$ numbers • Can sort in just four passes with radix sort! • Running time: 4(1 million + $2^{16}$) ≈ 4 million operations • Compare with typical O(n lg n) comparison sort • Requires approx lg n = 20 operations per number being sorted • Total running time ≈ 20 million operations
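Radix sort with a stable bucket pass per digit can be sketched as follows. The names (`RadixSortSketch`, `radixSort`, the `radix` parameter) are assumptions for illustration, and the inputs are assumed to be non-negative ints with at most d digits in the chosen radix.

```java
import java.util.ArrayList;
import java.util.List;

class RadixSortSketch {
    /** Sorts a in place, one stable bucket pass per digit, least significant digit first. */
    static void radixSort(int[] a, int d, int radix) {
        long divisor = 1;                                   // radix^pass, selects the digit
        for (int pass = 0; pass < d; pass++) {
            List<List<Integer>> buckets = new ArrayList<>();
            for (int b = 0; b < radix; b++) buckets.add(new ArrayList<>());

            // Distribute by the current digit; appending keeps equal digits in order.
            for (int x : a) buckets.get((int) ((x / divisor) % radix)).add(x);

            // Collect the buckets back into the array.
            int i = 0;
            for (List<Integer> bucket : buckets) {
                for (int x : bucket) a[i++] = x;
            }
            divisor *= radix;
        }
    }
}
```

Each pass costs O(n + radix), so d passes cost O(d·n + d·radix); the auxiliary buckets are also why this version does not sort in place in the strict sense.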
  • 506. Radix Sort • In general, radix sort based on bucket sort is • Asymptotically fast (i.e., O(n)) • Simple to code • A good choice • Can radix sort be used on floating-point numbers? ©RohitBirlaDataStructure RevisionTutorial 508 15-Oct-2011
  • 507. Summary: Radix Sort • Radix sort: • Assumption: input has d digits ranging from 0 to k • Basic idea: • Sort elements by digit starting with least significant • Use a stable sort (like bucket sort) for each stage • Each pass over n numbers with 1 digit takes time O(n+k), so total time O(dn+dk) • When d is constant and k=O(n), takes O(n) time • Fast, Stable, Simple • Doesn’t sort in place ©RohitBirlaDataStructure RevisionTutorial 509 15-Oct-2011
  • 508. Sorting Algorithms: Running Time • Assuming an input sequence of length n • Bubble sort • Insertion sort • Selection sort • Heap sort • Merge sort • Quick sort • Bucket sort • Radix sort ©RohitBirlaDataStructure RevisionTutorial 510 15-Oct-2011
  • 509. Sorting Algorithms: In-Place Sorting • A sorting algorithm is said to be in-place if • it uses no auxiliary data structures (however, O(1) auxiliary variables are allowed) • it updates the input sequence only by means of operations replaceElement and swapElements • Which sorting algorithms seen so far can be made to work in place? ©RohitBirlaDataStructure RevisionTutorial 511 15-Oct-2011
  • 510. Golden Rule of File Processing Minimize the number of disk accesses! 1. Arrange information so that you get what you want with few disk accesses. 2. Arrange information to minimize future disk accesses. An organization for data on disk is often called a file structure. Disk-based space/time tradeoff: Compress information to save processing time by reducing disk accesses. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 512
  • 511. Disk Drives 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 513
  • 512. Sectors A sector is the basic unit of I/O. Interleaving factor: Physical distance between logically adjacent sectors on a track. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 514
  • 513. Terms Locality of Reference: When record is read from disk, next request is likely to come from near the same place in the file. Cluster: Smallest unit of file allocation, usually several sectors. Extent: A group of physically contiguous clusters. Internal fragmentation: Wasted space within sector if record size does not match sector size; wasted space within cluster if file size is not a multiple of cluster size. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 515
  • 514. Seek Time Seek time: Time for I/O head to reach desired track. Largely determined by distance between I/O head and desired track. Track-to-track time: Minimum time to move from one track to an adjacent track. Average Seek time: Average time to reach a track for random access. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 516
  • 515. Other Factors Rotational Delay or Latency: Time for data to rotate under I/O head. • One half of a rotation on average. • At 7200 rpm, this is 8.3/2 = 4.2ms. Transfer time: Time for data to move under the I/O head. • At 7200 rpm: Number of sectors read/Number of sectors per track * 8.3ms. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 517
  • 516. Disk Spec Example • 16.8 GB disk on 10 platters = 1.68 GB/platter • 13,085 tracks/platter • 256 sectors/track • 512 bytes/sector • Track-to-track seek time: 2.2 ms • Average seek time: 9.5 ms • 4 KB clusters, 32 clusters/track • Interleaving factor of 3 • 5400 RPM
  • 517. Disk Access Cost Example (1) Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each. Assume all records are on 8 contiguous tracks. First track: 9.5 + 11.1/2 + 3 x 11.1 = 48.4 ms Remaining 7 tracks: 2.2 + 11.1/2 + 3 x 11.1 = 41.1 ms. Total: 48.4 + 7 * 41.1 = 335.7ms 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 519
  • 518. Disk Access Cost Example (2) Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each. Assume all file clusters are randomly spread across the disk. 256 clusters. Cluster read time is (3 x 8)/256 of a rotation, or about 1 ms. 256(9.5 + 11.1/2 + ((3 x 8)/256) x 11.1) is about 4119 ms, or just over 4 seconds.
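Both cost examples can be reproduced with a small calculation. The sketch below hard-codes the figures from the disk spec slide (9.5 ms average seek, 2.2 ms track-to-track seek, 11.1 ms per rotation at 5400 RPM, interleaving factor 3, 256 sectors per track, 8 sectors per cluster); the class and method names are hypothetical. In the scattered case the cluster read is counted in milliseconds, that is, (3 x 8)/256 of a rotation multiplied by 11.1 ms.

```java
class DiskCostSketch {
    static final double AVG_SEEK_MS = 9.5;
    static final double TRACK_SEEK_MS = 2.2;
    static final double ROTATION_MS = 11.1;      // 60000 / 5400, rounded as in the slides
    static final int INTERLEAVE = 3;
    static final int SECTORS_PER_TRACK = 256;
    static final int SECTORS_PER_CLUSTER = 8;

    /** File stored on contiguous tracks: one average seek, then track-to-track seeks. */
    static double contiguousMs(int tracks) {
        double latencyPlusRead = ROTATION_MS / 2 + INTERLEAVE * ROTATION_MS;
        return (AVG_SEEK_MS + latencyPlusRead)
                + (tracks - 1) * (TRACK_SEEK_MS + latencyPlusRead);
    }

    /** File clusters scattered randomly: a full average seek and latency per cluster. */
    static double scatteredMs(int clusters) {
        double clusterReadMs =
                (double) (INTERLEAVE * SECTORS_PER_CLUSTER) / SECTORS_PER_TRACK * ROTATION_MS;
        return clusters * (AVG_SEEK_MS + ROTATION_MS / 2 + clusterReadMs);
    }
}
```

`contiguousMs(8)` comes to about 335.7 ms and `scatteredMs(256)` to about 4119 ms, so scattering the clusters makes the same read roughly twelve times slower.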
  • 519. How Much to Read? Read time for one track: 9.5 + 11.1/2 + 3 x 11.1 = 48.4ms. Read time for one sector: 9.5 + 11.1/2 + (1/256)11.1 = 15.1ms. Read time for one byte: 9.5 + 11.1/2 = 15.05 ms. Nearly all disk drives read/write one sector at every I/O access. • Also referred to as a page. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 521
  • 520. Buffers The information in a sector is stored in a buffer or cache. If the next I/O access is to the same buffer, then no need to go to disk. There are usually one or more input buffers and one or more output buffers. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 522
  • 521. Buffer Pools A series of buffers used by an application to cache disk data is called a buffer pool. Virtual memory uses a buffer pool to imitate greater RAM memory by actually storing information on disk and “swapping” between disk and RAM. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 523
  • 522. Buffer Pools 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 524
  • 523. Organizing Buffer Pools Which buffer should be replaced when new data must be read? First-in, First-out: Use the first one on the queue. Least Frequently Used (LFU): Count buffer accesses, reuse the least used. Least Recently used (LRU): Keep buffers on a linked list. When buffer is accessed, bring it to front. Reuse the one at end. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 525
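The LRU policy can be sketched with `java.util.LinkedHashMap` in access order, which keeps entries on a linked list and moves an entry to the most-recent end whenever it is accessed. The class `LruBufferPool`, the use of block numbers as keys, and `byte[]` as buffer contents are assumptions for illustration; writing modified buffers back to disk is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Maps a block number to its cached contents and evicts the least recently used block. */
class LruBufferPool extends LinkedHashMap<Integer, byte[]> {
    private final int capacity;

    LruBufferPool(int capacity) {
        super(16, 0.75f, true);      // accessOrder = true: get() and put() refresh an entry
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
        return size() > capacity;    // reuse the buffer at the least-recent end
    }
}
```

With a capacity of 2, loading blocks 1 and 2, touching block 1, and then loading block 3 evicts block 2, since it is the one used least recently.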
  • 524. Design Issues Disadvantage of message passing: • Messages are copied and passed back and forth. Disadvantages of buffer passing: • The user is given access to system memory (the buffer itself) • The user must explicitly tell the buffer pool when buffer contents have been modified, so that modified data can be rewritten to disk when the buffer is flushed. • The pointer might become stale when the bufferpool replaces the contents of a buffer. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 526
  • 525. Programmer’s View of Files Logical view of files: • An array of bytes. • A file pointer marks the current position. Three fundamental operations: • Read bytes from current position (move file pointer) • Write bytes to current position (move file pointer) • Set file pointer to specified byte position.
  • 526. External Sorting Problem: Sorting data sets too large to fit into main memory. • Assume data are stored on disk drive. To sort, portions of the data must be brought into main memory, processed, and returned to disk. An external sort should minimize disk accesses. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 528
  • 527. Model of External Computation Secondary memory is divided into equal-sized blocks (512, 1024, etc…) A basic I/O operation transfers the contents of one disk block to/from main memory. Under certain circumstances, reading blocks of a file in sequential order is more efficient. (When?) Primary goal is to minimize I/O operations. Assume only one disk drive is available. 15-Oct-2011 ©RohitBirlaDataStructure RevisionTutorial 529
  • 528. Key Sorting Often, records are large, keys are small. • Ex: Payroll entries keyed on ID number Approach 1: Read in entire records, sort them, then write them out again. Approach 2: Read only the key values, and store with each key the location on disk of its associated record. After the keys are sorted, the records can be read and rewritten in sorted order.
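Approach 2 might look like the sketch below. The record layout (a 4-byte key followed by a 12-byte payload) and the names `RECORD_SIZE`, `KEY`, and `key_sort` are assumptions made for illustration; `io.BytesIO` objects stand in for disk files.

```python
import io
import struct

RECORD_SIZE = 16             # assumed layout: 4-byte unsigned key + 12-byte payload
KEY = struct.Struct(">I")    # big-endian unsigned integer key


def key_sort(infile, outfile):
    # Pass 1: read only the key of each record and remember its disk location.
    infile.seek(0, io.SEEK_END)
    n_records = infile.tell() // RECORD_SIZE
    index = []
    for i in range(n_records):
        offset = i * RECORD_SIZE
        infile.seek(offset)
        (key,) = KEY.unpack(infile.read(KEY.size))
        index.append((key, offset))
    index.sort()             # sort the small (key, offset) pairs in memory
    # Pass 2: fetch each full record by its offset and rewrite in key order.
    for _, offset in index:
        infile.seek(offset)
        outfile.write(infile.read(RECORD_SIZE))


def demo_key_sort():
    records = [(30, b"carol"), (10, b"alice"), (20, b"bob")]
    infile = io.BytesIO(b"".join(KEY.pack(k) + name.ljust(12) for k, name in records))
    outfile = io.BytesIO()
    key_sort(infile, outfile)
    data = outfile.getvalue()
    return [KEY.unpack_from(data, i)[0] for i in range(0, len(data), RECORD_SIZE)]
```

`demo_key_sort()` returns `[10, 20, 30]`. Note that the second pass seeks once per record, so on a real disk the rewrite step is dominated by random access.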
  • 529. Simple External Mergesort (1) Quicksort requires random access to the entire set of records. Better: Modified Mergesort algorithm. • Process n elements in Θ(log n) passes. A group of sorted records is called a run.
  • 530. Simple External Mergesort (2) 1. Split the file into two files. 2. Read in a block from each file. 3. Take first record from each block, output them in sorted order. 4. Take next record from each block, output them to a second file in sorted order. 5. Repeat until finished, alternating between output files. Read new input blocks as needed. 6. Repeat steps 2-5, except this time input files have runs of two sorted records that are merged together. 7. Each pass through the files provides larger runs.
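The way run length doubles on every pass can be simulated in memory. The sketch below is a simplification: Python lists stand in for the two input and two output files, and the function name `simple_merge_passes` is hypothetical. It only shows how pairs of runs are merged and how many passes result.

```python
from heapq import merge


def simple_merge_passes(records):
    runs = [[r] for r in records]    # every record starts as a run of length 1
    passes = 0
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]     # two adjacent runs (or one leftover run)
            merged.append(list(merge(*pair)))
        runs = merged                # runs are now about twice as long
        passes += 1
    return (runs[0] if runs else []), passes
```

For eight records, `simple_merge_passes([5, 2, 7, 1, 8, 3, 6, 4])` returns the sorted list together with a pass count of 3, matching the logarithmic number of passes.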
  • 531. Simple External Mergesort (3)
  • 532. Problems with Simple Mergesort Is each pass through input and output files sequential? What happens if all work is done on a single disk drive? How can we reduce the number of Mergesort passes? In general, external sorting consists of two phases: • Break the files into initial runs • Merge the runs together into a single run.
  • 533. Breaking a File into Runs General approach: • Read as much of the file into memory as possible. • Perform an in-memory sort. • Output this group of records as a single run.
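The general approach can be sketched in a few lines. Here `memory_capacity` (the number of records that fit in memory) and the function name `make_runs` are assumptions for illustration; any iterable of records stands in for the input file.

```python
from itertools import islice


def make_runs(records, memory_capacity):
    """Yield sorted runs, each holding at most memory_capacity records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, memory_capacity))   # read as much as fits
        if not chunk:
            return
        chunk.sort()                                # in-memory sort
        yield chunk                                 # output as a single run
```

For example, `list(make_runs([5, 2, 7, 1, 8, 3, 6], 3))` gives `[[2, 5, 7], [1, 3, 8], [6]]`: every run except possibly the last is exactly as long as working memory.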
  • 534. Replacement Selection (1) • Break available memory into an array for the heap, an input buffer, and an output buffer. • Fill the array from disk. • Make a min-heap. • Send the smallest value (root) to the output buffer.
  • 535. Replacement Selection (2) • If the next key in the file is greater than the last value output, then replace the root with this key. • Else replace the root with the last key in the array, and add the next record in the file to a new heap (actually, stick it at the end of the array).
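A rough sketch of replacement selection follows. It departs from the slide in two labeled ways: a separate `held` list plays the role of "the end of the array" instead of sharing one array with the heap, and keys equal to the last output are also admitted to the current run, which is safe. The names `replacement_selection` and `memory_capacity` are hypothetical.

```python
import heapq
from itertools import islice

_END = object()   # sentinel marking the end of the input


def replacement_selection(records, memory_capacity):
    it = iter(records)
    heap = list(islice(it, memory_capacity))   # fill the array from "disk"
    heapq.heapify(heap)                        # make a min-heap
    held = []               # records that must wait for the next run
    runs, current = [], []
    while heap:
        smallest = heapq.heappop(heap)         # send the root to the output
        current.append(smallest)
        incoming = next(it, _END)
        if incoming is not _END:
            if incoming >= smallest:
                heapq.heappush(heap, incoming) # still fits in the current run
            else:
                held.append(incoming)          # too small: belongs to the next run
        if not heap:
            runs.append(current)               # current run is finished
            current = []
            heap, held = held, []              # held records form the next heap
            heapq.heapify(heap)
    return runs
```

With room for three records, `replacement_selection([5, 1, 9, 3, 7, 2, 8, 4, 6], 3)` returns `[[1, 3, 5, 7, 8, 9], [2, 4, 6]]`. The first run is twice the memory size, where a plain load-sort-write scheme would produce runs of three.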
  • 536. RS Example
  • 537. Snowplow Analogy (1) Imagine a snowplow moving around a circular track on which snow falls at a steady rate. At any instant, there is a certain amount of snow S on the track. Some falling snow comes in front of the plow, some behind. During the next revolution of the plow, all of this is removed, plus 1/2 of what falls during that revolution. Thus, the plow removes 2S amount of snow.
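The 2S figure can be checked with a short steady-state calculation. The symbols F (total snow falling on the whole track during one revolution) and x (position ahead of the plow, as a fraction of the track) are introduced here for illustration and do not appear on the slide.

```latex
% A point at fraction x ahead of the plow was last cleared (1 - x) revolutions
% ago, so it carries snow density F(1 - x). The snow on the track is therefore
\[
  S \;=\; \int_0^1 F\,(1 - x)\,dx \;=\; \frac{F}{2}.
\]
% In steady state the plow removes exactly what falls in one revolution:
\[
  \text{snow removed per revolution} \;=\; F \;=\; 2S .
\]
```

Reading S as the working memory and the removed snow as one run, this is the argument behind the later statement that replacement selection produces runs about twice the size of memory on average.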
  • 538. Snowplow Analogy (2)
  • 539. Problems with Simple Merge Simple mergesort: Place runs into two files. • Merge the first two runs to output file, then next two runs, etc. Repeat process until only one run remains. • How many passes for r initial runs? Is there benefit from sequential reading? Is working memory well used? Need a way to reduce the number of passes.
  • 540. Multiway Merge (1) With replacement selection, each initial run is several blocks long. Assume each run is placed in a separate file. Read the first block from each file into memory and perform an r-way merge. When a buffer becomes empty, read a block from the appropriate run file. Each record is read only once from disk during the merge process.
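An r-way merge with one block buffer per run might be sketched as below. Lists stand in for the run files, `block_size` is counted in records rather than bytes, and the function name `multiway_merge` is hypothetical. A heap picks the smallest head record among the runs.

```python
import heapq


def multiway_merge(runs, block_size):
    """Merge sorted runs, reading each run one block at a time."""
    positions = [0] * len(runs)     # next unread index in each run "file"
    buffers = [[] for _ in runs]    # one in-memory block buffer per run
    block_reads = 0

    def refill(i):
        nonlocal block_reads
        start = positions[i]
        block = runs[i][start:start + block_size]
        positions[i] = start + len(block)
        if block:
            block_reads += 1
        buffers[i] = list(reversed(block))   # so pop() yields ascending order
        return bool(block)

    heap = []
    for i in range(len(runs)):
        if refill(i):                        # read the first block of each run
            heap.append((buffers[i].pop(), i))
    heapq.heapify(heap)

    output = []
    while heap:
        value, i = heapq.heappop(heap)       # smallest record across all runs
        output.append(value)
        if buffers[i] or refill(i):          # buffer empty: read the next block
            heapq.heappush(heap, (buffers[i].pop(), i))
    return output, block_reads
```

Merging `[[1, 4, 7, 10], [2, 5, 8], [3, 6, 9, 11, 12]]` with two-record blocks returns the numbers 1 through 12 in order and a count of 7 block reads, so every block is read exactly once.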
  • 541. Multiway Merge (2) In practice, use only one file and seek to appropriate block.
  • 542. Limits to Multiway Merge (1) Assume working memory is b blocks in size. How many runs can be processed at one time? The runs are 2b blocks long (on average). How big a file can be merged in one pass?
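One way to work through the questions on this slide is the small calculation below. It assumes that each run being merged gets exactly one block of working memory and ignores the output buffer, so b runs can be merged at once. The function name `one_pass_capacity` and the example sizes are illustrative.

```python
def one_pass_capacity(b, block_bytes):
    """Largest file (in blocks and bytes) mergeable in a single pass."""
    run_blocks = 2 * b        # average run length from replacement selection
    runs_per_merge = b        # assumption: one buffer block per run
    file_blocks = run_blocks * runs_per_merge   # 2 * b**2 blocks in total
    return file_blocks, file_blocks * block_bytes
```

For instance, with b = 128 blocks of 4096 bytes (512 KiB of working memory), `one_pass_capacity(128, 4096)` gives `(32768, 134217728)`, that is, a 128 MiB file merged in one pass under these assumptions.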
  • 543. Limits to Multiway Merge (2) Larger files will need more passes -- but the run size grows quickly! This approach trades Θ(log b) (possibly) sequential passes for a single or very few random (block) access passes.
  • 544. General Principles A good external sorting algorithm will seek to do the following: • Make the initial runs as long as possible. • At all stages, overlap input, processing and output as much as possible. • Use as much working memory as possible. Applying more memory usually speeds processing. • If possible, use additional disk drives for more overlapping of processing with I/O, and allow for more sequential file processing.
  • 545. The End All the information and contents are taken from the internet; hence the author will not be responsible. This Tutorial is being provided “Free of cost” and “As-is”.