Algorithms Complexity and
Data Structures Efficiency
Computational Complexity, Choosing Data Structures
Svetlin Nakov
Telerik Corporation
www.telerik.com
Table of Contents
1. Algorithms Complexity and Asymptotic Notation
  • Time and Memory Complexity
  • Best, Average and Worst Case
2. Fundamental Data Structures – Comparison
  • Arrays vs. Lists vs. Trees vs. Hash-Tables
3. Choosing Proper Data Structure
Why Are Data Structures Important?
• Data structures and algorithms are the foundation of computer programming
• Algorithmic thinking, problem solving and data structures are vital for software engineers
  • All .NET developers should know when to use T[], LinkedList<T>, List<T>, Stack<T>, Queue<T>, Dictionary<K,T>, HashSet<T>, SortedDictionary<K,T> and SortedSet<T>
• Computational complexity is important for algorithm design and efficient programming
Algorithms Complexity
Asymptotic Notation
Algorithm Analysis
• Why should we analyze algorithms?
  • To predict the resources that the algorithm requires:
    • Computational time (CPU consumption)
    • Memory space (RAM consumption)
    • Communication bandwidth consumption
• The running time of an algorithm is:
  • The total number of primitive operations executed (machine-independent steps)
  • Also known as algorithm complexity
Algorithmic Complexity
• What to measure?
  • Memory
  • Time
  • Number of steps
  • Number of particular operations
    • Number of disk operations
    • Number of network packets
  • Asymptotic complexity
Time Complexity
• Worst-case
  • An upper bound on the running time for any input of a given size
• Average-case
  • The expected running time, assuming all inputs of a given size are equally likely
• Best-case
  • A lower bound on the running time for any input of a given size
Time Complexity – Example
• Sequential search in a list of size n
  • Worst-case: n comparisons
  • Best-case: 1 comparison
  • Average-case: about n/2 comparisons (when the element is present)
• The algorithm runs in linear time
  • Linear number of operations
[Figure: a list of n elements]
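The comparison counts above can be checked with a small sketch. The class name SequentialSearchDemo and the out parameter comparisons are illustrative and not part of the original slides.

```csharp
using System;

public static class SequentialSearchDemo
{
    // Returns the index of target in items, or -1 if it is absent.
    // comparisons reports how many elements were inspected.
    public static int IndexOf(int[] items, int target, out int comparisons)
    {
        comparisons = 0;
        for (int i = 0; i < items.Length; i++)
        {
            comparisons++;
            if (items[i] == target)
            {
                return i; // best case: found at index 0 after 1 comparison
            }
        }
        return -1; // worst case: all n elements were compared
    }
}
```

Searching for the first element costs 1 comparison; searching for the last element or for a missing value costs n comparisons.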
Algorithms Complexity
• Algorithm complexity is a rough estimate of the number of steps performed by a given computation, depending on the size of the input data
  • Measured through asymptotic notation
    • O(g), where g is a function of the input data size
• Examples:
  • Linear complexity O(n): all elements are processed once (or a constant number of times)
  • Quadratic complexity O(n^2): each of the elements is processed n times
Asymptotic Notation: Definition
• Asymptotic upper bound
  • O-notation (Big O notation)
• For a given function g(n), we denote by O(g(n)) the set of functions that grow no faster than g(n), up to a constant factor:
  O(g(n)) = {f(n): there exist positive constants c and n0 such that f(n) <= c*g(n) for all n >= n0}
• Examples:
  • 3*n^2 + n/2 + 12 ∈ O(n^2)
  • 4*n*log2(3*n+1) + 2*n - 1 ∈ O(n * log n)
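As a worked instance of the definition, the first example can be bounded explicitly. The constants c = 16 and n0 = 1 are one possible choice, not the only one.

```latex
\begin{align*}
3n^2 + \tfrac{n}{2} + 12
  &\le 3n^2 + \tfrac{1}{2}n^2 + 12n^2 && \text{for all } n \ge 1 \\
  &= 15.5\,n^2 \\
  &\le 16\,n^2 .
\end{align*}
% Hence f(n) <= c * g(n) with c = 16, n_0 = 1 and g(n) = n^2.
```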
Typical Complexities
Complexity    Notation   Description
constant      O(1)       Constant number of operations, not depending on the input data size, e.g. n = 1 000 000 → 1-2 operations
logarithmic   O(log n)   Number of operations proportional to log2(n), where n is the size of the input data, e.g. n = 1 000 000 000 → 30 operations
linear        O(n)       Number of operations proportional to the input data size, e.g. n = 10 000 → 5 000 operations
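The logarithmic figure can be sanity-checked by counting how many times n can be halved, which is what a binary search does to its search range. This is a minimal sketch; HalvingDemo and CountHalvings are hypothetical names.

```csharp
public static class HalvingDemo
{
    // Counts how many times n can be halved (integer division)
    // before it drops to 1, i.e. floor(log2(n)) for n >= 1.
    public static int CountHalvings(long n)
    {
        int steps = 0;
        while (n > 1)
        {
            n /= 2;
            steps++;
        }
        return steps;
    }
}
```

CountHalvings(1000000000L) returns 29, which matches the "about 30 operations" estimate for n = 1 000 000 000.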
Typical Complexities (2)
Complexity    Notation               Description
quadratic     O(n^2)                 Number of operations proportional to the square of the size of the input data, e.g. n = 500 → 250 000 operations
cubic         O(n^3)                 Number of operations proportional to the cube of the size of the input data, e.g. n = 200 → 8 000 000 operations
exponential   O(2^n), O(k^n), O(n!)  Exponential number of operations, fast growing, e.g. n = 20 → 1 048 576 operations
Time Complexity and Speed
Complexity    n = 10    n = 20   n = 50     n = 100   n = 1 000   n = 10 000   n = 100 000
O(1)          < 1 s     < 1 s    < 1 s      < 1 s     < 1 s       < 1 s        < 1 s
O(log(n))     < 1 s     < 1 s    < 1 s      < 1 s     < 1 s       < 1 s        < 1 s
O(n)          < 1 s     < 1 s    < 1 s      < 1 s     < 1 s       < 1 s        < 1 s
O(n*log(n))   < 1 s     < 1 s    < 1 s      < 1 s     < 1 s       < 1 s        < 1 s
O(n^2)        < 1 s     < 1 s    < 1 s      < 1 s     < 1 s       2 s          3-4 min
O(n^3)        < 1 s     < 1 s    < 1 s      < 1 s     20 s        5 hours      231 days
O(2^n)        < 1 s     < 1 s    260 days   hangs     hangs       hangs        hangs
O(n!)         < 1 s     hangs    hangs      hangs     hangs       hangs        hangs
O(n^n)        3-4 min   hangs    hangs      hangs     hangs       hangs        hangs
Time and Memory Complexity
• Complexity can be expressed as a formula of multiple variables, e.g.
  • An algorithm filling a matrix of size n * m with the natural numbers 1, 2, … will run in O(n*m)
  • DFS traversal of a graph with n vertices and m edges will run in O(n + m)
• Memory consumption should also be considered, for example:
  • Running time O(n), memory requirement O(n^2)
  • n = 50 000 → OutOfMemoryException
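A rough calculation shows why n = 50 000 is a problem with O(n^2) memory. It assumes an n × n table of 4-byte int values; the element size is an assumption and is not stated in the slides.

```latex
\begin{align*}
n^2 &= 50\,000^2 = 2.5 \times 10^{9} \text{ elements} \\
2.5 \times 10^{9} \times 4 \text{ bytes} &= 10^{10} \text{ bytes} \approx 9.3 \text{ GiB}
\end{align*}
```

That is more than many machines can allocate for one process, so the allocation is likely to fail.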
Polynomial Algorithms
• A polynomial-time algorithm is one whose worst-case time complexity W(n) is bounded above by a polynomial function p(n) of its input size n:
  W(n) ∈ O(p(n))
• Examples of worst-case time complexity:
  • Polynomial-time: log n, 2n, 3n^3 + 4n, 2 * n log n
  • Non-polynomial-time: 2^n, 3^n, n^k (where k is not a constant), n!
• Non-polynomial algorithms are impractical for large input data sets
Analyzing Complexity
of Algorithms
Examples
Complexity Examples
• Runs in O(n) where n is the size of the array
• The number of elementary steps is ~ n
int FindMaxElement(int[] array)
{
    // Assumes the array has at least one element
    int max = array[0];
    for (int i = 0; i < array.Length; i++)
    {
        if (array[i] > max)
        {
            max = array[i];
        }
    }
    return max;
}
Complexity Examples (2)
• Runs in O(n^2) where n is the size of the array
• The number of elementary steps is ~ n*(n-1) / 2
long FindInversions(int[] array)
{
    long inversions = 0;
    for (int i = 0; i < array.Length; i++)
        // Compare array[i] with every element after it
        for (int j = i + 1; j < array.Length; j++)
            if (array[i] > array[j])
                inversions++;
    return inversions;
}
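The step count for the inversion counter comes from summing the inner loop's iterations. For each i from 0 to n-1, the inner index j runs over the n-1-i positions after i:

```latex
\sum_{i=0}^{n-1} (n - 1 - i) \;=\; (n-1) + (n-2) + \dots + 1 + 0 \;=\; \frac{n(n-1)}{2} \;\in\; O(n^2)
```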
Complexity Examples (3)
• Runs in cubic time O(n^3)
• The number of elementary steps is ~ n^3
decimal Sum3(int n)
{
    decimal sum = 0;
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            for (int c = 0; c < n; c++)
                sum += (decimal)a * b * c; // cast avoids int overflow for large n
    return sum;
}
Complexity Examples (4)
• Runs in quadratic time O(n*m)
• The number of elementary steps is ~ n*m
long SumMN(int n, int m)
{
    long sum = 0;
    for (int x = 0; x < n; x++)
        for (int y = 0; y < m; y++)
            sum += (long)x * y; // cast avoids int overflow for large n, m
    return sum;
}
Complexity Examples (5)
• Runs in quadratic time O(n*m)
• The number of elementary steps is ~ n*m + min(m,n)*n
long SumMN(int n, int m)
{
    long sum = 0;
    for (int x = 0; x < n; x++)
        for (int y = 0; y < m; y++)
            if (x == y) // true for only min(m,n) of the (x, y) pairs
                for (int i = 0; i < n; i++)
                    sum += (long)i * x * y;
    return sum;
}
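The step formula n*m + min(m,n)*n can be checked empirically by replacing the arithmetic with a counter. This is a sketch for illustration; NestedLoopSteps and CountSteps are hypothetical names, and a "step" here means one pass through either loop body.

```csharp
public static class NestedLoopSteps
{
    // Counts loop-body executions of the same loop structure as SumMN.
    public static long CountSteps(int n, int m)
    {
        long steps = 0;
        for (int x = 0; x < n; x++)
        {
            for (int y = 0; y < m; y++)
            {
                steps++; // one step per (x, y) pair: n*m in total
                if (x == y)
                {
                    for (int i = 0; i < n; i++)
                    {
                        steps++; // n extra steps for each of the min(m,n) diagonal pairs
                    }
                }
            }
        }
        return steps;
    }
}
```

For n = 3 and m = 5 this gives 15 + 3*3 = 24 steps, in line with the formula.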
Complexity Examples (6)
• Runs in exponential time O(2^n)
• The number of elementary steps is ~ 2^n
decimal Calculation(int n)
{
    decimal result = 0;
    // 1 << n equals 2^n; valid only for n < 31 with a 32-bit int
    for (int i = 0; i < (1 << n); i++)
        result += i;
    return result;
}
Complexity Examples (7)
• Runs in linear time O(n)
• The number of elementary steps is ~ n
decimal Factorial(int n)
{
    if (n == 0)
        return 1;
    else
        return n * Factorial(n - 1); // one recursive call per value of n
}
Complexity Examples (8)
• Runs in exponential time O(2^n)
• The number of elementary steps is ~ Fib(n+1), where Fib(k) is the k-th Fibonacci number
decimal Fibonacci(int n)
{
    if (n == 0)
        return 1;
    else if (n == 1)
        return 1;
    else
        // Two recursive calls per invocation make the call tree grow exponentially
        return Fibonacci(n - 1) + Fibonacci(n - 2);
}
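To see the exponential growth, the recursive function can be instrumented with a call counter. This is an illustrative sketch; FibonacciCallCounter, Calls and CountCalls are hypothetical names added for the demonstration.

```csharp
public static class FibonacciCallCounter
{
    // Number of invocations made since the last CountCalls reset.
    public static long Calls;

    // Same recursion as the slide's Fibonacci, plus a counter.
    public static decimal Fibonacci(int n)
    {
        Calls++;
        if (n == 0 || n == 1)
        {
            return 1;
        }
        return Fibonacci(n - 1) + Fibonacci(n - 2);
    }

    // Returns how many calls a single Fibonacci(n) evaluation triggers.
    public static long CountCalls(int n)
    {
        Calls = 0;
        Fibonacci(n);
        return Calls;
    }
}
```

With these base cases, the number of calls is 2 * Fibonacci(n) - 1: CountCalls(10) gives 177 and CountCalls(20) gives 21 891. The count is proportional to the Fibonacci number itself.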
Comparing Data Structures
Examples
Data Structures Efficiency
Data Structure                   Add    Find   Delete   Get-by-index
Array (T[])                      O(n)   O(n)   O(n)     O(1)
Linked list (LinkedList<T>)      O(1)   O(n)   O(n)     O(n)
Resizable array list (List<T>)   O(1)   O(n)   O(n)     O(1)
Stack (Stack<T>)                 O(1)   -      O(1)     -
Queue (Queue<T>)                 O(1)   -      O(1)     -
Data Structures Efficiency (2)
Data Structure                                  Add        Find       Delete     Get-by-index
Hash table (Dictionary<K,T>)                    O(1)       O(1)       O(1)       -
Tree-based dictionary (SortedDictionary<K,T>)   O(log n)   O(log n)   O(log n)   -
Hash table based set (HashSet<T>)               O(1)       O(1)       O(1)       -
Tree based set (SortedSet<T>)                   O(log n)   O(log n)   O(log n)   -
Choosing Data Structure
• Arrays (T[])
  • Use when a fixed number of elements should be processed by index
• Resizable array lists (List<T>)
  • Use when elements should be added and processed by index
• Linked lists (LinkedList<T>)
  • Use when elements should be added at both ends of the list
  • Otherwise use a resizable array list (List<T>)
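A minimal sketch of the linked-list case: LinkedList<T> supports adding at both ends through AddFirst and AddLast. The class and method names LinkedListDemo and BuildFromBothEnds are illustrative.

```csharp
using System.Collections.Generic;

public static class LinkedListDemo
{
    // Builds a list by alternately adding at the tail and at the head,
    // then copies it to an array to show the resulting order.
    public static int[] BuildFromBothEnds()
    {
        var list = new LinkedList<int>();
        list.AddLast(2);  // 2
        list.AddFirst(1); // 1, 2
        list.AddLast(3);  // 1, 2, 3
        list.AddFirst(0); // 0, 1, 2, 3

        var result = new int[list.Count];
        list.CopyTo(result, 0);
        return result;
    }
}
```

List<T> can also insert at the front with Insert(0, item), but that shifts all existing elements, so it costs O(n) per insertion.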
Choosing Data Structure (2)
• Stacks (Stack<T>)
  • Use to implement LIFO (last-in-first-out) behavior
  • List<T> could also work well
• Queues (Queue<T>)
  • Use to implement FIFO (first-in-first-out) behavior
  • LinkedList<T> could also work well
• Hash table based dictionary (Dictionary<K,T>)
  • Use when key-value pairs should be added quickly and looked up quickly by key
  • Elements in a hash table have no particular order
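The difference between LIFO and FIFO order can be shown with a short sketch. StackQueueDemo, PopAll and DequeueAll are hypothetical helper names.

```csharp
using System.Collections.Generic;

public static class StackQueueDemo
{
    // Pushes all items onto a stack, then pops them: last in, first out.
    public static int[] PopAll(int[] items)
    {
        var stack = new Stack<int>();
        foreach (int item in items)
        {
            stack.Push(item);
        }
        var result = new List<int>();
        while (stack.Count > 0)
        {
            result.Add(stack.Pop());
        }
        return result.ToArray();
    }

    // Enqueues all items, then dequeues them: first in, first out.
    public static int[] DequeueAll(int[] items)
    {
        var queue = new Queue<int>();
        foreach (int item in items)
        {
            queue.Enqueue(item);
        }
        var result = new List<int>();
        while (queue.Count > 0)
        {
            result.Add(queue.Dequeue());
        }
        return result.ToArray();
    }
}
```

For the input 1, 2, 3, PopAll yields 3, 2, 1 and DequeueAll yields 1, 2, 3.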
Choosing Data Structure (3)
• Balanced search tree based dictionary (SortedDictionary<K,T>)
  • Use when key-value pairs should be added quickly, looked up quickly by key and enumerated sorted by key
• Hash table based set (HashSet<T>)
  • Use to keep a group of unique values and to add values and check membership quickly
  • Elements are in no particular order
• Search tree based set (SortedSet<T>)
  • Use to keep a group of ordered unique values
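A short sketch of the two behaviors described above: a SortedDictionary enumerates its keys in sorted order, and a HashSet keeps only unique values. The names SortedCollectionsDemo, KeysInOrder and CountDistinct are illustrative. StringComparer.Ordinal is chosen here only to make the ordering culture-independent.

```csharp
using System;
using System.Collections.Generic;

public static class SortedCollectionsDemo
{
    // Counts occurrences per key and returns the keys in sorted order.
    public static List<string> KeysInOrder(string[] values)
    {
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (string value in values)
        {
            int current;
            counts.TryGetValue(value, out current); // current stays 0 if the key is absent
            counts[value] = current + 1;
        }
        return new List<string>(counts.Keys);
    }

    // Returns the number of distinct values; duplicates are dropped by the set.
    public static int CountDistinct(string[] values)
    {
        return new HashSet<string>(values).Count;
    }
}
```

For the input "SQL", "C#", "Java", "C#", KeysInOrder returns C#, Java, SQL and CountDistinct returns 3.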
Summary
• Algorithm complexity is a rough estimate of the number of steps performed by a given computation
  • Complexity can be logarithmic, linear, n log n, quadratic, cubic, exponential, etc.
  • It allows estimating the speed of given code before its execution
• Different data structures have different efficiency on different operations
  • The fastest add / find / delete structure is the hash table – O(1) for all these operations
Algorithms Complexity and
Data Structures Efficiency
Questions?
http://academy.telerik.com
Exercises
1. A text file students.txt holds information about students and their courses in the following format:
Kiril | Ivanov | C#
Stefka | Nikolova | SQL
Stela | Mineva | Java
Milena | Petrova | C#
Ivan | Grigorov | C#
Ivan | Kolev | SQL
Using SortedDictionary<K,T>, print the courses in alphabetical order and, for each of them, print the students ordered by family name and then by first name:
C#: Ivan Grigorov, Kiril Ivanov, Milena Petrova
Java: Stela Mineva
SQL: Ivan Kolev, Stefka Nikolova
Exercises (2)
2. A large trade company has millions of articles, each described by barcode, vendor, title and price. Implement a data structure to store them that allows fast retrieval of all articles in a given price range [x…y]. Hint: use OrderedMultiDictionary<K,T> from Wintellect's Power Collections for .NET.
3. Implement a data structure PriorityQueue<T> that provides a fast way to execute the following operations: add element; extract the smallest element.
4. Implement a class BiDictionary<K1,K2,T> that allows adding triples {key1, key2, value} and fast search by key1, key2 or by both key1 and key2. Note: multiple values can be stored for a given key.
Exercises (3)
5. A text file phones.txt holds information about people, their town and phone number:
Mimi Shmatkata | Plovdiv | 0888 12 34 56
Kireto | Varna | 052 23 45 67
Daniela Ivanova Petrova | Karnobat | 0899 999 888
Bat Gancho | Sofia | 02 946 946 946
Duplicates can occur in people's names, towns and phone numbers. Write a program to execute a sequence of commands from a file commands.txt:
• find(name) – display all records matching the given name (first, middle, last or nickname)
• find(name, town) – display all records matching the given name and town


Editor's Notes

  • #3 (c) 2007 National Academy for Software Development - http://academy.devbg.org. All rights reserved. Unauthorized copying or re-distribution is strictly prohibited.*