Algorithm Design and Complexity - Course 1&2



Courses 1 & 2 for the Algorithm Design and Complexity course at the Faculty of Engineering in Foreign Languages - Politehnica University of Bucharest, Romania

Published in: Education, Technology

  1. 1. Algorithm Design and Complexity Course 1 & 2
  2. 2. Introduction  Algorithms and problems  We need to be able to provide solutions to problems    Any domain has problems that require an algorithmic solution Find the best solution from a wide range of choices Learn methods to develop solutions  Problem => Idea => Solution => Algorithm => Pseudocode => Code => Compiled Program  We need to design algorithms
  3. 3. Introduction (2)  Any non-trivial problem accepts a wide range of solutions  Need to compare these solutions in order to find the best one => Complexity  Need to show that the devised solution solves the problem => Correctness  Not all problems have an algorithmic solution!
  4. 4. Introduction (3)  Some problems are similar (or slight variations of one another)  They accept similar solutions  Need to learn to recognize when two problems are similar  Some methods used for designing algorithms provide solutions for different problems  Need to understand these methods in order to apply them to as many problems as possible  The problems that can be solved using one method have some common properties!
  5. 5. Course Info  Lectures: Traian Rebedea           Ph.D. @ University Politehnica of Bucharest, CS Dept. traian.rebedea@cs.pub.ro / trebedea@gmail.com Lecturer @ Computer Science Department Interests: NLP, IR/IE, ML, TEL/CSCL, AI in general Published over 25 papers at important conferences Published 4 book chapters (2 in important international books) Worked at 3 companies, founded 1 company http://www.informatik.uni-trier.de/~ley/db/indices/atree/r/Rebedea:Traian.html http://ro.linkedin.com/in/trebedea Course Website: http://adcfils.wordpress.com/
  6. 6. Course Overview  Introduction: Problems and Decidability  Complexity of Algorithms  Correctness of Algorithms (short)  Algorithm Design Techniques  Graph Algorithms & Applications: searches, topological sort, articulation points, bridges, strongly connected components, minimum spanning trees, shortest paths, network flow  Classes of Problems
  7. 7. Grading  Exam: 40p  Lab & Assignments: 60p  Assignments: 45p (3 x 15p)  Lab activity: 15p (the lab assistant decides how to grade the lab activity)  Rules:  Minimum 30p for lab & assignments  Minimum 15p for the exam  Minimum 50p for total  You are not allowed to copy solutions from colleagues or WWW (Measure of Software Similarity - MOSS)
  8. 8. Textbooks & More Info  Cormen T.H., Leiserson C.E., Rivest R.L. and Stein C. Introduction to Algorithms, Second Edition. MIT Press, 2001  Baase S. and Van Gelder A. Computer Algorithms: Introduction to Design & Analysis. Addison-Wesley, 2000  References for each chapter  Introduction to Algorithms @ MIT: also has video lectures  Coursera: Algorithms: Design and Analysis (part 1 and part 2): https://www.coursera.org/course/algo, https://www.coursera.org/course/algo2  Websites for programming exercises: TopCoder, Infoarena, Talent Buddy, HackerRank, Project Euler
  9. 9. Problems and Algorithms  Problems 1 – n Algorithms  1+ algorithms to solve each problem  An algorithm usually solves only 1 problem  Problem: Sorting  Given an array with n numbers A[1..n], arrange the elements in the array such that any two consecutive elements are sorted (A[i] <= A[i + 1] for i = 1..n-1)  Algorithms: Quick Sort, Merge Sort, Heap Sort, Bubble Sort, Insertion Sort, Selection Sort, Radix Sort, Bucket Sort, …
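The sorting condition stated on this slide can be checked mechanically. The following is a minimal Python sketch, assuming 0-based lists instead of the slide's 1-based A[1..n]; the helper name `is_sorted` is hypothetical and not part of the course material.

```python
def is_sorted(a):
    """Return True if every pair of consecutive elements satisfies a[i] <= a[i + 1]."""
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))


print(is_sorted([1, 2, 2, 5]))  # True
print(is_sorted([3, 1, 2]))     # False
```

Any of the listed sorting algorithms is correct exactly when its output passes this check and is a rearrangement of its input.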
  10. 10. Problems & Computability  There are a lot of problems  We would like to find solutions for all of them  Not all the problems can be solved!  The problems that can be solved are called computable or decidable problems  The problems that cannot be solved are:   Very difficult Not clear enough (need for subjective reasoning)
  11. 11. Problems & Computability  We would all like to know:  Which stock bonds shall rise tomorrow? Which football team would win a game? Who shall I marry? Is there a God? Are there any aliens in the universe?  Who is the most beautiful girl in the world?      Need for subjective thinking: what does “the most beautiful” mean?
  12. 12. Example  This example should be understood as a metaphor: the physics and astronomy parts of it may be wrong  Problem: Is there any alien life in the universe?  Assumption: the universe is infinite and there are an infinite number of celestial bodies  Solution:  Explore all the planets, suns and other celestial bodies  Use any exploring method, it can be as good as you want  Explore each celestial body with a perfect scanner  If you find life on it => ANSWER: YES! Alien life exists  Else: continue to the next celestial body
  13. 13. Example (2)  The previous solution has a flaw  It never answers NO!  If the answer to our problem is NO, then we must wait an infinite amount of time  We cannot stop the solution at any moment in time and conclude for certain that the answer is NO, because there may still be a celestial body with alien life on it that has not been explored yet!  This kind of problem is called undecidable!
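The exploration procedure and its flaw can be sketched in Python. This is only an illustration of the metaphor: the function name `search_for_life`, the numbering of celestial bodies, and the toy scanner are all assumptions made for the example.

```python
from itertools import count


def search_for_life(bodies, has_life):
    """Scan celestial bodies one by one; return True as soon as life is found.

    If `bodies` is infinite and no body has life, this loop never ends,
    so the procedure can answer YES but can never answer NO.
    """
    for body in bodies:
        if has_life(body):
            return True
    return False  # reachable only when `bodies` is finite


# Bodies are numbered 0, 1, 2, ...; the toy scanner finds life on body 42.
print(search_for_life(count(), lambda body: body == 42))  # True
```

Calling the same function with `count()` and a scanner that always reports no life would run forever, which is exactly the behavior the slide describes.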
  14. 14. Undecidable Problems  Problems that cannot be solved with an algorithmic solution  We can devise an algorithm, but that algorithm shall never finish in some situations… therefore we cannot know the answer to our problem!  Quick info:  A decision problem = a problem for which the answer is {yes, no}  Is n a prime number?  An optimum problem = a problem for which we need to find the optimum solution out of a set of possible solutions  Which is the shortest path between two vertices in a graph?  Optimum = minimum or maximum
  15. 15. Undecidable Problems (2)  Any decision problem can be:  Decidable – can always solve the problem with an algorithm that always finishes!  Semi-decidable – can devise a pseudo-algorithm for solving the problem that finishes only if the answer is YES, and never finishes otherwise!  Therefore, while the pseudo-algorithm is still running we cannot know whether the answer is YES or NO, but if it stops then the answer is for sure YES  Undecidable – cannot know if the answer is YES…  The previous example is a semi-decidable problem  However, most of the time all the problems that are not decidable (semi- or un-decidable) are called undecidable
  16. 16. Undecidable Problems (3)  Why is the example a metaphor?  Because any problem that is not decidable must have an infinite problem space!  All the problems that have a finite number of states that form the space of exploration for that problem are decidable  E.g. there are a finite number of ways to arrange the numbers in an array  E.g. there are a finite number of ways to arrange a set of queens on a chess board
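A finite exploration space can always be searched exhaustively, which is why such problems are decidable. As a rough sketch of the queens example, the Python code below enumerates every candidate placement; the function name `count_queen_placements` and the permutation encoding are choices made for this illustration.

```python
from itertools import permutations


def count_queen_placements(n):
    """Count placements of n non-attacking queens on an n x n board by exhaustive search."""
    solutions = 0
    # A permutation puts the queen of row r in column cols[r], so rows and columns never clash.
    for cols in permutations(range(n)):
        # Two queens share a diagonal when their row distance equals their column distance.
        if all(abs(cols[r1] - cols[r2]) != r2 - r1
               for r1 in range(n) for r2 in range(r1 + 1, n)):
            solutions += 1
    return solutions


print(count_queen_placements(6))  # 4
```

The loop always terminates because there are only n! permutations to try, although that number grows very quickly with n.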
  17. 17. Undecidable Problems (4)  Very difficult problems have an infinite space that should be explored in order to find the solution!  Quick info:  There are an uncountably infinite number of problems in this world  There are only a countably infinite number of programs in this world  The Church-Turing thesis (after Alonzo Church) states that all the problems that can be computed (are decidable) are the ones that can have a program associated with them, that is used for solving them  Only a countably infinite set of problems are decidable  The rest are not decidable
  18. 18. Classic Problems that are NOT Decidable  Halting Problem:  Given a program P and an input x, does P(x) halt (finish)?  YES if P(x) finishes in a finite amount of time  NO if P(x) never finishes (maybe because it loops forever)  Barber Problem/Paradox  Post’s Correspondence Problem
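The usual argument for why the Halting Problem is not decidable can be hinted at in Python. This is a simplified sketch, not a proof: it assumes programs take no input (the slide uses a program P and an input x), and `make_paradox` and `pessimist` are hypothetical names. Given any candidate decider `halts`, it builds a program that does the opposite of what the decider predicts about it.

```python
def make_paradox(halts):
    """Build a program that does the opposite of what `halts` predicts about it."""
    def paradox():
        if halts(paradox):
            while True:   # predicted to halt, so loop forever
                pass
        return "halted"   # predicted to loop, so halt immediately
    return paradox


# A candidate decider that always answers "does not halt".
pessimist = lambda program: False
paradox = make_paradox(pessimist)
print(pessimist(paradox), paradox())  # False halted
```

The candidate claims `paradox` never halts, yet it halts at once. A candidate that answered True would be wrong in the other direction, because `paradox` would then loop forever.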
  19. 19. Barber Paradox    http://en.wikipedia.org/wiki/Barber_paradox Suppose there is a town with just one male barber; and that every man in the town keeps himself clean-shaven: some by shaving themselves, some by attending the barber. It seems reasonable to imagine that the barber obeys the following rule: He shaves all and only those men in town who do not shave themselves. Does the barber shave himself?
  20. 20. Barber Paradox (2)  The situation presented is in fact impossible:  1. If the barber does not shave himself, he must abide by the rule and shave himself.  2. If he does shave himself, according to the rule he will not shave himself.
  21. 21. Post’s correspondence problem  http://en.wikipedia.org/wiki/Post_correspondence_problem  The input of the problem consists of two finite lists $\alpha_1, \ldots, \alpha_N$ and $\beta_1, \ldots, \beta_N$ of words over some alphabet A having at least two symbols.  A solution to this problem is a sequence of indices $(i_k)_{1 \le k \le K}$ with $K \ge 1$ and $1 \le i_k \le N$ for all k, such that $\alpha_{i_1} \ldots \alpha_{i_K} = \beta_{i_1} \ldots \beta_{i_K}$.  The decision problem then is to decide whether such a solution exists or not.
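Since the problem is not decidable, a program can only search for a solution up to some bound. The Python sketch below tries every index sequence up to a chosen length; the function name `pcp_bounded`, the `max_len` parameter, and the sample word lists are assumptions for this illustration.

```python
from itertools import product


def pcp_bounded(alphas, betas, max_len):
    """Try every index sequence of length 1..max_len; return the first match (1-based) or None."""
    n = len(alphas)
    for k in range(1, max_len + 1):
        for seq in product(range(n), repeat=k):
            top = "".join(alphas[i] for i in seq)
            bottom = "".join(betas[i] for i in seq)
            if top == bottom:
                return tuple(i + 1 for i in seq)
    return None  # no solution up to max_len; says nothing about longer sequences


print(pcp_bounded(["a", "ab", "bba"], ["baa", "aa", "bb"], 4))  # (3, 2, 3, 1)
```

For the returned sequence both concatenations spell "bbaabbbaa". A result of `None` only means that no solution exists within the bound, which is why this search does not decide the problem.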
  22. 22. Complexity of Algorithms  Need to find the best algorithm for solving a problem Is algorithm A better than algorithm B ?  A measure for the performance of an algorithm  Simple practical solution:     Implement the algorithms Measure their running times on a given machine But we want to measure the performance of an algorithm:   Independent of the machine and language it is implemented in Without wasting time for implementing it
  23. 23. Complexity of Algorithms (2)  We need a theoretical framework for measuring the performance of an algorithm  Performance  Time = How quickly does the algorithm compute the results?  Space = How much memory does it need?  Focus on time performance  Moore’s Law: processing power evolves less quickly than storage capacity  Space constraints are rarely an issue: related to RAM size
  24. 24. Running Time  Measure of the time complexity of an algorithm  It is a theoretical measure that depends on the input data and the processing performed by an algorithm  We define the running time as a function that only depends on the size of the input data  The size of the input data is measured by positive integers  For arrays: A[n]  For graphs: G(V, E), |V| = n, |E| = m  For multiplying two matrices: lines1, columns1, columns2
  25. 25. Running Time (2)  In this chapter we shall only discuss running times that depend on a single parameter: T(n)  The discussion can be easily extended to more parameters  T(n) is the running time for an algorithm that has an input data of size n  T(n) is a function  T(n): N → R+
  26. 26. Example – Insertion Sort     http://en.wikipedia.org/wiki/Insertion_sort Problem: Sorting an array A[n] Solution: Insertion sort Every repetition of the main loop of insertion sort removes an element from the input data, inserting it into the correct position in the already-sorted list, until no input elements remain.   The already-sorted list is the sub-array on the left side Usually, the removed element is the next one
  27. 27. Insertion Sort - Pseudocode
      InsertionSort( A[1..n] )
      1. FOR (j = 2 .. n)
      2.   x = A[j]                        // element to be inserted
      3.   i = j - 1                       // position on the right side of the sorted sub-array
      4.   WHILE (i > 0 AND x < A[i])      // while not in position
      5.     A[i + 1] = A[i]               // move to right
      6.     i--                           // continue
      7.   A[i + 1] = x
      8. RETURN A
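The pseudocode translates almost line for line into Python. The sketch below assumes 0-based indexing, so the outer loop starts at index 1 and the WHILE test becomes `i >= 0`; the function name `insertion_sort` is chosen for this example.

```python
def insertion_sort(a):
    """Sort the list `a` in place and return it (0-based version of the pseudocode)."""
    for j in range(1, len(a)):
        x = a[j]             # element to be inserted
        i = j - 1            # rightmost position of the sorted sub-array
        while i >= 0 and x < a[i]:
            a[i + 1] = a[i]  # shift the larger element one step right
            i -= 1
        a[i + 1] = x
    return a


print(insertion_sort([8, 2, 4, 9, 3, 6]))  # [2, 3, 4, 6, 8, 9]
```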
  28. 28. Example  From Erik D. Demaine and Charles E. Leiserson – Introduction to Algorithms@ MIT
  29. 29. Analysis of Complexity  What is the complexity of Insertion Sort ?  General solution for the running time:       Each simple instruction takes a constant amount of time This is clearly a simplification as the execution of an instruction depends on the operands Simple instructions: assignments, logical, mathematical between numbers, print/scan of a number, return, … Complex instructions: calls to other functions T(n) = sum over the running time of each instruction The running time of an instruction = Running time to execute it once * Number of times it is executed
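  30. 30. Analysis of Complexity – General
      Instruction nbr | Running time (execute once) | Number of times it is executed
      1               | C1 (a constant)             | n
      2               | C2                          | n – 1
      3               | C3                          | n – 1
      4               | C4                          | T1
      5               | C5                          | T2
      6               | C6                          | T2
      7               | C7                          | n – 1
      8               | C8                          | 1
      $T(n) = C_1 n + (C_2 + C_3 + C_7)(n - 1) + C_4 T_1 + (C_5 + C_6) T_2 + C_8$
      $T_1 = \sum_{j=2}^{n} t_j$, $T_2 = \sum_{j=2}^{n} (t_j - 1)$, where $t_j$ is the number of times the WHILE test (instruction 4) is executed for a given j
      Difficult to say more about the general form of the running time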
  31. 31. Analysis of Complexity – General (2)  General form of the running time cannot be expanded because it depends on the structure of the input data   T1, T2 = ?? However, there are interesting special cases that can be easily computed:    Worst case Best case Average case ?
  32. 32. Worst Case Complexity  Happens when the array is sorted descending  In this case, all the elements x = A[j] are lower than all the previous elements  Therefore, they must be moved to the beginning of the array  Thus: $t_j = j$ (i goes from j – 1 to 0)
      $T_1 = \sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1$
      $T_2 = \sum_{j=2}^{n} (j - 1) = \frac{n(n-1)}{2}$
      $T_{worst}(n) = a n^2 + b n + c$  (quadratic time)
  33. 33. Best Case Complexity  Happens when the array is sorted ascending  In this case, all the elements x = A[j] are higher than all the previous elements  Therefore, they are not moved  Thus: $t_j = 1$
      $T_1 = \sum_{j=2}^{n} 1 = n - 1$
      $T_2 = \sum_{j=2}^{n} (1 - 1) = 0$
      $T_{best}(n) = b_1 n + c_1$  (linear time)
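The worst-case and best-case values of T1 can be checked empirically by counting how often the WHILE test runs. The Python sketch below is an instrumented, 0-based version of insertion sort; the function name `while_test_count` is hypothetical.

```python
def while_test_count(a):
    """Run insertion sort on a copy of `a` and return T1, the total number of WHILE tests."""
    a = list(a)
    t1 = 0
    for j in range(1, len(a)):
        x = a[j]
        i = j - 1
        while True:
            t1 += 1                        # one evaluation of the WHILE condition
            if not (i >= 0 and x < a[i]):
                break
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = x
    return t1


n = 10
print(while_test_count(range(n, 0, -1)), n * (n + 1) // 2 - 1)  # 54 54
print(while_test_count(range(1, n + 1)), n - 1)                 # 9 9
```

The first line uses a descending array (worst case) and the second an ascending array (best case); in both, the measured count matches the closed form.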
  34. 34. Average Case Complexity  It is interesting to compute it precisely  It is very difficult to compute it precisely!  Should take into consideration the distribution of the input data and sum up over all possible instances of the input data, weighted by their probability  See example of formula on the blackboard  Not feasible  Simpler solution: on average, an element x = A[j] is inserted in the middle of the already-sorted list  Recompute T1 and T2 for this case => still a quadratic solution
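The recomputation can be sketched as follows, under the simplifying assumption that inserting into the middle means the WHILE test runs about half as often as in the worst case, that is $t_j \approx j/2$. This is an approximation for illustration, not the exact average-case analysis.

```latex
\begin{align*}
T_1 &\approx \sum_{j=2}^{n} \frac{j}{2}
     = \frac{1}{2}\left(\frac{n(n+1)}{2} - 1\right)
     = \frac{n^2 + n - 2}{4} \\
T_2 &\approx \sum_{j=2}^{n} \left(\frac{j}{2} - 1\right)
     = \frac{n^2 + n - 2}{4} - (n - 1)
     = \frac{n^2 - 3n + 2}{4}
\end{align*}
```

Both sums still contain an $n^2$ term, so the running time has the form $a' n^2 + b' n + c'$ with smaller constants than in the worst case but the same quadratic growth.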
  35. 35. Conclusions  General formula for the complexity of an algorithm is usually incomplete due to the influence of the structure of the input data Average case is interesting, but difficult to compute  Solution: only compute worst case!  Makes sense for a lot of applications: ABS braking algorithm should be good on worst case, the same for computing reports for your boss, … On most occasions, average case has the same order of growth as the worst case complexity  
  36. 36. hope you’re not sleeping yet 
  37. 37. Asymptotic Notations  Current simplifications for computing complexity of algorithms:  Constant amount of time for simple instructions  Interested in worst case most of the time  Only interested in the asymptotic behavior of T(n)  Asymptotic: T(n) as n → ∞  These notations are not used only for running times, but for any function of the form f(n) : N → R+  |sin(n)|, 1/n, 2sin(n)+1 are functions that cannot be running times
  38. 38. Big-O Notation   http://en.wikipedia.org/wiki/Big_O_notation Upper asymptotic bound
  39. 39. Big-Ω (Omega) Notation  Lower asymptotic bound
  40. 40. Θ (Theta) Notation  Order of growth
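The formulas from these three slides did not survive in this transcript. For reference, the standard textbook definitions (as given in CLRS, which the course cites) are reproduced below; they are assumed to match what the slides showed.

```latex
\begin{align*}
O(g(n))      &= \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that } 0 \le f(n) \le c\, g(n) \text{ for all } n \ge n_0 \,\} \\
\Omega(g(n)) &= \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that } 0 \le c\, g(n) \le f(n) \text{ for all } n \ge n_0 \,\} \\
\Theta(g(n)) &= \{\, f(n) : \exists\, c_1, c_2 > 0,\ n_0 > 0 \text{ such that } 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \text{ for all } n \ge n_0 \,\}
\end{align*}
```

For example, $2n^2 + n \in O(n^2)$ with $c = 3$ and $n_0 = 1$, because $n \le n^2$ for every $n \ge 1$.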
  41. 41. Remarks  It is important to compute the order of growth for algorithms  The asymptotic notations define sets of functions  See picture on blackboard  Sometimes, the Big-O notation is used as a substitute for the Θ notation  Θ notation – equivalence relation for functions of the form f(n) : N → R+  Big-O, Big-Ω notations – partial order relations
  42. 42. Equivalence Relation  Three important properties:  1. Reflexivity  2. Symmetry  3. Transitivity  It partitions the functions into equivalence classes:  Each class has a representative function  Obtained by removing all lower degree terms and removing any constants from the highest degree term  Θ(1), Θ(log log n), Θ(log n), Θ(n log n), Θ(n^2), Θ(n^3), … , Θ(2^n), Θ(n!), Θ(n^n)
  43. 43. Partial Order Relations  Three important properties:  1. Reflexivity  2. Anti-symmetry  3. Transitivity  f(n) ∈ O(g(n)) … f(n) “<=” g(n)  f(n) ∈ Ω(g(n)) … f(n) “>=” g(n)  f(n) ∈ Θ(g(n)) … f(n) “~=” g(n)  Partial order because some functions cannot be compared: e.g. n and n^(sin(n)+1)
  44. 44. Small-o and Small-ω Notations  f(n) ∈ O(g(n)) can be:  Tight: 2*n^2 + n ∈ O(n^2)  Loose: 5*n ∈ O(n^2)  Therefore, small-o is always loose:  f(n) ∈ o(g(n)) … f(n) “<” g(n)  Similarly:  f(n) ∈ ω(g(n)) … f(n) “>” g(n)
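One informal way to see the difference between a tight and a loose bound is to watch the ratio f(n)/g(n) as n grows: for a bound that is also small-o the ratio tends to 0, while for a tight bound it settles at a positive constant. The Python sketch below uses the two examples from the slide; the helper names are chosen for this illustration.

```python
def ratio(f, g, n):
    """Return f(n) / g(n) for a single value of n."""
    return f(n) / g(n)


tight = lambda n: 2 * n**2 + n   # 2*n^2 + n, a tight member of O(n^2)
loose = lambda n: 5 * n          # 5*n, a loose member of O(n^2), also in o(n^2)
square = lambda n: n**2

for n in (10, 1_000, 100_000):
    print(n, ratio(tight, square, n), ratio(loose, square, n))
```

The first ratio approaches 2 and the second approaches 0. A numeric table like this only suggests the limit; it does not prove it.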
  45. 45. Asymptotic Notations Used in Equations  For any function on the left side that is part of the set defined by that asymptotic notation  there is a function on the right side that is part of the set defined by that asymptotic notation
  46. 46. Exercises – Set 1  What is the complexity of the following algorithms?
      Matrix_add_1 (A[n][n], B[n][n]) {
        for (i = 1,n) {
          for (j = 1,n) {
            C[i][j] = 0
          }
        }
        for (i = 1,n) {
          for (j = 1,n) {
            C[i][j] = A[i][j] + B[i][j]
          }
        }
        return C
      }
  47. 47.
      Matrix_add_2 (A[n][n], B[n][n]) {
        for (i = 1,n) {
          for (j = 1,n) {
            B[i][j] = A[i][j] + B[i][j]
          }
        }
        return B
      }
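One way to check an answer to this exercise is to count the assignments each version performs. The Python sketch below only counts operations and does not build the matrices; the function names `matrix_add_1_ops` and `matrix_add_2_ops` are hypothetical.

```python
def matrix_add_1_ops(n):
    """Count the assignments to C made by Matrix_add_1 on n x n inputs."""
    ops = 0
    for i in range(n):
        for j in range(n):
            ops += 1   # C[i][j] = 0
    for i in range(n):
        for j in range(n):
            ops += 1   # C[i][j] = A[i][j] + B[i][j]
    return ops


def matrix_add_2_ops(n):
    """Count the assignments to B made by Matrix_add_2 on n x n inputs."""
    ops = 0
    for i in range(n):
        for j in range(n):
            ops += 1   # B[i][j] = A[i][j] + B[i][j]
    return ops


for n in (10, 100):
    print(n, matrix_add_1_ops(n), matrix_add_2_ops(n))
```

The counts are 2n^2 and n^2, so the two versions differ only by a constant factor and have the same order of growth.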
  48. 48. Maximum Subsequence Sum Problem  Given (possibly negative) integers a1, a2, ..., an, find the maximum value of $\sum_{k=i}^{j} a_k$. The maximum subsequence sum is defined to be 0 if all the integers are negative
  49. 49.
      Let A[1] … A[N] be an array of integers that contains a sequence of length N.
      Let sum and maxSum be integers initialized to 0.
      For integer i = 1 to N do
        Let sum = 0
        For integer j = i to N do
          Let sum = sum + A[ j ]
          If( sum > maxSum ) then
            Let maxSum = sum
      Return maxSum
  50. 50.  There also exist solutions in:  Θ(n)  Θ(n log n)  Θ(n^3)  http://www.wou.edu/~broegb/Cs345/MaxSubsequenceSum.pdf
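As an illustration of two of these running times, the Python sketch below places the double-loop algorithm from the previous slide next to a single-pass version. The single-pass version is the well-known technique often called Kadane's algorithm; it is offered here as one possible linear solution, not necessarily the one the course or the linked document presents. Both function names are chosen for this example.

```python
def max_subsequence_sum_quadratic(a):
    """Double loop from the previous slide: try every start i and extend to every end j."""
    max_sum = 0
    for i in range(len(a)):
        total = 0
        for j in range(i, len(a)):
            total += a[j]
            if total > max_sum:
                max_sum = total
    return max_sum


def max_subsequence_sum_linear(a):
    """Single pass: a running sum that drops below 0 is reset, since it cannot help later."""
    max_sum = 0
    total = 0
    for value in a:
        total = max(0, total + value)
        max_sum = max(max_sum, total)
    return max_sum


data = [-2, 11, -4, 13, -5, -2]
print(max_subsequence_sum_quadratic(data), max_subsequence_sum_linear(data))  # 20 20
```

Both functions return 0 when every element is negative, matching the convention in the problem statement.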
  51. 51. Exercises – Set 2
  52. 52. Keep in Mind  Are there things more important than the performance of an algorithm?  Maybe:  Correctness  Modularity  Maintainability  Robustness  User-friendliness  Programmer time  Extensibility  Reliability
  53. 53. References  CLRS – Chapters 1-3  MIT OCW – Introduction to Algorithms – video lectures 1-2
