# Application of Data Structures

## by Hitesh Wagle on Dec 19, 2011

Uploaded via SlideShare as Microsoft PowerPoint

## Application of Data Structures: Presentation Transcript

• Application of Data Structures
• Overview
• Priority Queue structures
• Heaps
• Application: Dijkstra’s algorithm
• Cumulative Sum Data Structures on Intervals
• Augmenting data structures with extra info to solve questions
• Priority Queue (PQ) Structures
• Stores elements in a list by comparing a key field
• Often has other satellite data
• For example, when sorting pixels by their R value, we consider the R as the key field and GB as satellite data
• Priority queues allow us to sort elements by their key field.
• Common PQ operations
• Create()
• Creates an empty priority queue
• Find_Min()
• Returns the smallest element (by key field)
• Insert(x)
• Insert element x (with predefined key field)
• Delete(x)
• Delete the element at position x from the queue
• Change(x, k)
• Change key field of position x to k
• Optional PQ operations
• Union (a,b)
• Combines two PQs a and b
• Search (k)
• Returns the position of the element in the heap with key value k
• Considerations when implementing a PQ in competition
• How complicated is it?
• Is the code likely to be buggy?
• How fast does it need to be?
• Does a constant factor also come into the equation?
• Do I need to store extra data to do a Search?
• For the rest of this presentation, we assume that extra data exists which lets us do a Search in O(1) time. Maintaining that data is assumed and not covered.
• Linear Array
• Unsorted Array
• Create, Insert, Change in O(1) time
• Find_min, Delete in O(n) time
• Sorted Array
• Create, Find_min in O(1) time
• Insert, Delete, Change in O(n + log n) = O(n) time
• Binary Heaps
• The structure most commonly implemented in a competition setting
• Efficient for most applications
• Easy to implement
• A heap (here, a min-heap) is a structure where the value of a node is less than or equal to the value of each of its children
• A binary heap is a heap where the maximum number of children for each node is 2.
• Array implementation
• Consider a heap of size nheap in an array BHeap[1 ..nheap] (Define BHeap[nheap+1 .. (nheap*2)+1] to be INFINITY for practical reasons)
• The children of BHeap[x] are BHeap[x*2] and BHeap[x*2+1]
• The parent of BHeap[x] is BHeap[x/2] (integer division)
• This gives a nearly complete binary tree, so the number of levels in the heap is O(log n)
• Some properties wrt key values: BHeap[x] >= BHeap[x/2], BHeap[x] <= BHeap[x*2], BHeap[x] <= BHeap[x*2+1]; there is no guaranteed order between the siblings BHeap[x*2] and BHeap[x*2+1]
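The index arithmetic above can be sketched in a few lines of Python. The helper names `parent`, `children` and `is_min_heap` are illustrative and do not come from the slides; position 0 of the list is left unused so that indices match the 1-based BHeap array.

```python
def parent(x):
    """Index of the parent of position x (1-based, integer division)."""
    return x // 2

def children(x):
    """Indices of the left and right child of position x (1-based)."""
    return 2 * x, 2 * x + 1

def is_min_heap(bheap, nheap):
    """Check BHeap[x] >= BHeap[x/2] for every non-root position."""
    return all(bheap[x] >= bheap[parent(x)] for x in range(2, nheap + 1))

heap = [None, 1, 3, 2, 7, 4, 5]      # position 0 is a placeholder
print(parent(5), children(2))        # 2 (4, 5)
print(is_min_heap(heap, 6))          # True
```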
• PQ Operations on a BHeap
• We define BTree(x) to be the Binary Tree rooted at BHeap[x]
• We define Heapify(x) to be an operation that does the following:
• Assume: BTree(x*2) and BTree(x*2+1) are binary heaps but BTree(x) is not necessarily a binary heap
• Produce: BTree(x) binary heap
• Details of Heapify in later slides – but for now, we assume Heapify is O(log n)
• For the rest of the presentation, we assume the variable n refers to nheap
• Operations on a BHeap
• Create is trivial – O(1) time
• Find_min:
• Return BHeap[1]
• O(1) time
• Insert (element with key value x)
• nheap++
• BHeap[nheap] = x
• T = nheap
• While (T != 1 && BHeap[T] < BHeap[T/2])
• Swap (BHeap[T], BHeap[T/2])
• T = T / 2
• O(log n) time as the number of levels is O(log n)
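A minimal Python sketch of the Insert steps, assuming the heap is a list whose position 0 is unused so that the list length minus one plays the role of nheap. The function name `heap_insert` is an illustrative choice.

```python
def heap_insert(bheap, key):
    """Append key, then swap it with its parent while it is smaller."""
    bheap.append(key)
    t = len(bheap) - 1            # nheap after the insertion
    while t != 1 and bheap[t] < bheap[t // 2]:
        bheap[t], bheap[t // 2] = bheap[t // 2], bheap[t]
        t //= 2

heap = [None]                     # index 0 unused; the heap is empty
for key in [5, 3, 8, 1]:
    heap_insert(heap, key)
print(heap[1])                    # smallest key: 1
```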
• Operations on a BHeap
• ChangeDown (position x, new key value k)
• Assume: k < existing BHeap[x]
• BHeap[x] = k
• T = x
• While (T != 1 && BHeap[T] < BHeap[T/2])
• Swap (BHeap[T], BHeap[T/2])
• T = T/2
• Complexity: O(log n)
• This procedure is known as “bubbling up” the heap
• Operations on a BHeap
• ChangeUp (position x, new key value k)
• Assume: k > existing BHeap[x]
• BHeap[x] = k
• Heapify(x)
• O(log n) as complexity of Heapify is O(log n)
• Operations on a BHeap
• Delete (position x on the heap)
• BHeap[x] = BHeap[nheap]
• nheap--
• Heapify(x)
• T = x
• While (T != 1 && BHeap[T] < BHeap[T/2])
• Swap (BHeap[T], BHeap[T/2])
• T = T / 2
• Complexity is O(log n)
• Why must I do both Heapify and “bubble up”?
• Operations on a BHeap
• Heapify (position x on the heap)
• T = min(BHeap[x], BHeap[x*2], BHeap[x*2+1])
• If (T == BHeap[x]) return;
• K = position where BHeap[K] = T
• Swap(BHeap[x], BHeap[K])
• Heapify(K)
• O(log n) as the maximum number of levels in the heap is O(log n) and Heapify only goes through each level at most once
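Heapify and Delete might look as follows in Python. This is a sketch under two assumptions: the list's position 0 is unused, and explicit bounds checks replace the INFINITY sentinels beyond nheap. Heapify is written as a loop rather than recursively; the names `heapify` and `heap_delete` are illustrative.

```python
def heapify(bheap, nheap, x):
    """Sift position x down until neither child is smaller."""
    while True:
        k = x
        for child in (2 * x, 2 * x + 1):
            if child <= nheap and bheap[child] < bheap[k]:
                k = child
        if k == x:
            return
        bheap[x], bheap[k] = bheap[k], bheap[x]
        x = k

def heap_delete(bheap, x):
    """Delete position x: move the last element there, then repair."""
    last = bheap.pop()
    nheap = len(bheap) - 1
    if x > nheap:                 # the deleted element was the last one
        return
    bheap[x] = last
    heapify(bheap, nheap, x)      # the moved key may be too large ...
    while x != 1 and bheap[x] < bheap[x // 2]:   # ... or too small
        bheap[x], bheap[x // 2] = bheap[x // 2], bheap[x]
        x //= 2

heap = [None, 1, 3, 8, 5]
heap_delete(heap, 1)
print(heap)                       # [None, 3, 5, 8]
```

Both repairs are needed because the element moved in from the end of the array comes from a different subtree: it can be larger than the children of position x or smaller than the parent of position x. Deleting position 4 from `[None, 1, 10, 2, 11, 12, 3]` is a case where only the bubble-up step does any work.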
• BHeap Operations: Summary
• Create, Find_min in O(1) time
• Change (includes both ChangeUp and ChangeDown), Insert, and Delete are O(log n) time
• Union operations are how long?
• Insertion: O(n log n) union
• Heapify: O(n) union
• Corollary: Heapsort
• We can convert an unsorted array to a heap using Heapify (why does this work?):
• For (i = n/2; i >= 1; i--)
• Heapify(i)
• We can then return a sorted list (list initially empty):
• For (i = 1; i <= n; i++)
• Append the value of find_min to the list
• Delete(1)
• Complexity is O(n log n)
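The two loops above can be sketched in Python as follows. The sketch assumes a 1-based copy of the input and uses bounds checks instead of INFINITY sentinels; `heapsort` returns a new list rather than sorting in place.

```python
def heapify(a, n, x):
    """Sift a[x] down inside a[1..n] (1-based, min-heap)."""
    while True:
        k = x
        for child in (2 * x, 2 * x + 1):
            if child <= n and a[child] < a[k]:
                k = child
        if k == x:
            return
        a[x], a[k] = a[k], a[x]
        x = k

def heapsort(values):
    """Build the heap bottom-up, then repeatedly remove the minimum."""
    a = [None] + list(values)         # shift to 1-based positions
    n = len(values)
    for i in range(n // 2, 0, -1):    # positions above n/2 are leaves
        heapify(a, n, i)
    out = []
    while n >= 1:
        out.append(a[1])              # Find_min
        a[1] = a[n]                   # Delete(1): last element to the root
        n -= 1
        heapify(a, n, 1)
    return out

print(heapsort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]
```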
• Binomial Trees
• Define Binomial Tree B(k) as follows:
• B(0) is a single node
• B(n), n != 0, is formed by merging two B(n-1) trees in the following way:
• The root of the B(n) tree is the root of one of the B(n-1) trees, and the (new) leftmost child of this root is the root of the other B(n-1) tree.
• Within the tree, the heap property holds, i.e. the key field of any node is less than or equal to the key field of each of its children.
• Properties of Binomial Trees
• The number of nodes in B(k) is exactly 2^k.
• The height of B(k) is exactly (k + 1) when counted in levels (k when counted in edges)
• For any tree B(k)
• The root of B(k) has exactly k children
• If we take the children of B(k) from left to right, they form the roots of a B(k-1), B(k-2), …, B(0) tree in that order
• Binomial Heaps
• Binomial Heaps are a forest of binomial trees with the following properties:
• All the binomial trees are of different sizes
• The binomial trees are ordered (from left to right) by increasing size
• If we consider the fact that the size of B(k) is 2^k, the binomial tree B(k) exists in a binomial heap of n nodes iff the bit representing 2^k is “1” in the binary representation of n
• For example: 13 (decimal) = 1101 (binary), so the binomial heap with 13 nodes consists of the binomial trees B(0), B(2), and B(3).
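The correspondence between the bits of n and the trees in the heap can be checked with a short Python sketch; the function name `binomial_tree_orders` is illustrative.

```python
def binomial_tree_orders(n):
    """Orders k of the trees B(k) in a binomial heap holding n nodes."""
    return [k for k in range(n.bit_length()) if (n >> k) & 1]

print(binomial_tree_orders(13))   # [0, 2, 3]
```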
• Binomial Heap Implementation
• Each node will store the following data:
• Key field
• Pointers (if non-existent, points to NIL) to
• Parent
• Next Sibling (ordered left to right; a sibling must have the same parent); For roots of binomial trees, next sibling points to the root of the next binomial tree
• Leftmost child
• Number of children in field degree
• Any other data that might be useful for the program
• The binomial heap is represented by a head pointer that points to the root of the smallest binomial tree (which is the leftmost binomial tree)
• Operations on Binomial Trees
• Link (h1, h2)
• Links two binomial trees with roots h1 and h2 of the same order k to form a new binomial tree of order (k+1)
• We assume h1->key < h2->key, which implies that h1 is the root of the new tree
• T = h1->leftchild
• h1->leftchild = h2
• h2->parent = h1
• h2->next_sibling = T
• h1->degree = h1->degree + 1
• O(1) time
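A minimal Python sketch of the node layout and the linking step. The class name `Node` and the helper `size` are illustrative. Where the slides assume h1 has the smaller key, this sketch swaps the arguments if that is not the case, and it also keeps the degree field up to date.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.parent = None
        self.leftchild = None
        self.next_sibling = None
        self.degree = 0

def link(h1, h2):
    """Link two trees of equal order; the smaller root becomes the new root."""
    if h2.key < h1.key:
        h1, h2 = h2, h1
    h2.next_sibling = h1.leftchild   # old leftmost child becomes a sibling
    h2.parent = h1
    h1.leftchild = h2
    h1.degree += 1
    return h1

def size(root):
    """Count the nodes of the tree rooted at root."""
    total, child = 1, root.leftchild
    while child is not None:
        total += size(child)
        child = child.next_sibling
    return total

b1 = link(Node(4), Node(7))                 # a B(1)
b2 = link(b1, link(Node(2), Node(9)))       # a B(2)
print(b2.key, b2.degree, size(b2))          # 2 2 4
```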
• Operations on binomial heaps
• Create – Create a new binomial heap with one node ( key field set)
• Set Parent, Leftchild, Next sibling to NIL
• O(1) time
• Find_min
• X = head, min = INFINITY
• While (X != nil)
• If (X->key < min) min = X->key
• X = X->next_sibling
• Return min
• O(log n) time as there are at most log n binomial trees (log n bits)
• More Operations
• Merge (h1, h2, L)
• Given binomial heaps with head pointers h1 and h2, create a list L of all the binomial trees of h1 U h2 arranged in ascending order of size
• For any order k, there may be zero, one, or two binomial trees of order k in this list.
• More Operations
• Merge (h1, h2, L)
• Assume that NIL is a node of infinitely large order (so that a tree from the non-empty heap is always chosen)
• L = empty
• While (h1 != NIL || h2 != NIL)
• If (h1->degree < h2->degree)
• Append the (binomial)tree with root h1 to L
• h1 = h1->next_sibling
• Else
• Apply above steps to h2 instead
• More Operations
• Union (h1, h2)
• The fundamental operation involving binomial heaps
• Takes two binomial heaps with head pointers h1 and h2 and creates a new binomial heap of the union of h1 and h2
• More Operations
• Union (h1, h2)
• Merge (h1, h2, L)
• Go by increasing k in the list L until L is empty
• If there is exactly one or exactly three (how can this happen?) binomial trees of order k in L, append one binomial tree of order k to the binomial heap and remove that tree from L
• If there are two trees of order k, remove both trees, use Link to form a tree of order (k+1) and pre-pend this tree to L
• Union is O(log n)
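Merge and Union can be sketched compactly in Python under a simplifying assumption: a heap is a Python list of tree roots in increasing order, and each root keeps its children in a list, instead of the parent, sibling and child pointers described earlier. The names `Tree`, `link`, `merge`, `union` and `insert` are illustrative.

```python
class Tree:
    """Root of a binomial tree; children[i] is the root of a B(i)."""
    def __init__(self, key):
        self.key = key
        self.children = []

    @property
    def degree(self):
        return len(self.children)

def link(a, b):
    """Combine two trees of the same order into one of the next order."""
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)
    return a

def merge(h1, h2):
    """All trees of both heaps in ascending order of degree."""
    out, i, j = [], 0, 0
    while i < len(h1) or j < len(h2):
        if j == len(h2) or (i < len(h1) and h1[i].degree <= h2[j].degree):
            out.append(h1[i])
            i += 1
        else:
            out.append(h2[j])
            j += 1
    return out

def union(h1, h2):
    pending, result = merge(h1, h2), []
    while pending:
        k = pending[0].degree
        same = sum(1 for t in pending[:3] if t.degree == k)
        if same == 2:                     # link both, put B(k+1) back in front
            pending[0:2] = [link(pending[0], pending[1])]
        else:                             # one or three trees of order k
            result.append(pending.pop(0))
    return result

def insert(heap, key):
    return union(heap, [Tree(key)])

a, b = [], []
for key in [5, 1, 7]:
    a = insert(a, key)
for key in [3, 9]:
    b = insert(b, key)
h = union(a, b)
print([t.degree for t in h])              # [0, 2]
print(min(t.key for t in h))              # 1
```

The root lists have O(log n) entries, so the list slicing and `pop(0)` used here do not change the overall O(log n) bound.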
• More Operations
• Inserting a new node with key field set
• Create a new binomial heap with that one node
• Union (existing heap with head h, new heap)
• O (log n) time
• ChangeDown (node at position x, new value)
• Decreasing the key value of a node
• Same idea as binary heap: “Bubble” up the binomial tree containing this node (exchange only key fields and satellite data! What’s the complexity if you physically change the node?)
• O (log n) time
• More Operations
• Delete (node at position x)
• Deleting position x from the heap
• ChangeDown(x, -INFINITY)
• Now x is at the root of its binomial tree
• Supposing that the binomial tree is of order k
• Recall that the children of the root of the binomial tree, from right to left, are binomial trees of order 0, 1, 2, 3, 4, …, k-1
• Form a new binomial heap whose roots are the children of the root of this binomial tree
• Remove the original binomial tree from the original binomial heap
• Union (original heap, new heap)
• O(log n) complexity
• More Operations
• ChangeUp (node at position X, new value)
• Delete (X)
• Insert (new value)
• O (log n) time
• Summary – Binomial Heaps
• Create in O(1) time
• Union, Find_min, Delete, Insert, and Change operations take O(log n) time
• In general, because they are more complicated, in competition it is far more prudent (saves time coding and debugging) to use a binary heap instead
• Unless there are MANY Union operations
• Application of heaps: Dijkstra
• The following describes how Dijkstra’s algorithm can be coded with a binary heap
• Initializing phase:
• Let n be the number of nodes
• Create a heap of size n, all key fields initialized to INFINITY
• ChangeDown (position of s in heap, 0) where s is the source node
• Running of Dijkstra’s algorithm
• While (heap is not empty)
• X = node corresponding to find_min value
• Delete (position of X in heap = 1)
• For all nodes k that are adjacent to X
• If (cost[X] + distance[X][k] < cost[k])
• ChangeDown (position of k in heap, cost[X] + distance[X][k])
• Analysis of running time
• At most n nodes are deleted
• O(n log n)
• Let m be the number of edges. Each edge is relaxed at most once.
• O(m log n)
• Total running time O([m+n] log n)
• This is faster than the O(n^2) version that uses a basic array unless the graph is very dense: when m is about n^2, the heap version runs in O(n^2 log n)
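A runnable sketch of the loop above, using Python's standard `heapq` module. `heapq` has no ChangeDown operation, so this version swaps in a common substitute: it pushes a fresh (cost, node) entry whenever a cost improves and skips stale entries when they are popped. The adjacency-list format and the example graph are assumptions made for illustration.

```python
import heapq

def dijkstra(adj, s):
    """Shortest-path costs from s; adj[u] is a list of (v, weight) pairs."""
    cost = {u: float("inf") for u in adj}
    cost[s] = 0
    heap = [(0, s)]
    done = set()
    while heap:
        c, x = heapq.heappop(heap)        # Find_min followed by Delete
        if x in done:
            continue                      # stale entry, already finalised
        done.add(x)
        for k, w in adj[x]:
            if c + w < cost[k]:
                cost[k] = c + w
                heapq.heappush(heap, (cost[k], k))   # stands in for ChangeDown
    return cost

graph = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 2), ("d", 5)],
    "d": [],
}
print(dijkstra(graph, "a"))   # {'a': 0, 'b': 3, 'c': 1, 'd': 4}
```

With this substitution the heap can hold up to one entry per edge, so the bound becomes O(m log m), which is still O(m log n) because m is at most n^2.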
• Cumulative Sum on Intervals
• Problem: We have a line that runs from x coordinate 1 to x coordinate N. At each x coordinate X [X an integer between 1 and N], there is g(X) gold. Given an interval [a,b], how much gold is there between a and b (inclusive)?
• How efficiently can this be done if we dynamically change the amount of gold and the interval [a,b] keeps changing?
• Cumulative Sum Array
• Let us define C(0) = 0, and C(x) = C(x-1) + g(x) where g(x) is the amount of gold at position x
• C(x) then defines the total amount of gold from position 1 to position x
• The amount of gold in interval [a,b] is simply C(b) – C(a-1)
• For any change in a or b, we can perform the update in O(1) time
• However, if we change g(x), we will have to change C(x), C(x+1), C(x+2), …, C(N)
• Any change in gold results in an update in O(N) time
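The cumulative sum array can be sketched in Python as follows, assuming g is stored in a list whose position 0 is unused. The function names and the sample values are illustrative.

```python
def build_cumulative(g):
    """g[1..N] holds the gold (g[0] is unused); returns C with C[0] = 0."""
    c = [0] * len(g)
    for x in range(1, len(g)):
        c[x] = c[x - 1] + g[x]
    return c

def interval_sum(c, a, b):
    """Gold in [a, b] as C(b) - C(a-1)."""
    return c[b] - c[a - 1]

gold = [0, 3, 1, 4, 1, 5]
C = build_cumulative(gold)
print(interval_sum(C, 2, 4))   # 1 + 4 + 1 = 6
```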
• Cumulative Sum Tree
• We can use the binary representation of any number to come up with a cumulative sum tree
• For example, let us take 13 (decimal) = 1101 (binary)
• The cumulative sum of g(1) + g(2) + … g(13) can be represented as the sum of:
• g(1) + g(2) + … + g(8) [ 8 elements ]
• g(9) + g(10) + … + g(12) [ 4 elements ]
• g(13) [ 1 element ]
• Notice that the number of elements in each case represents a bit that is “1” in the binary representation of the number
• Cumulative Sum Tree
• Another example: C(19)
• 19 (decimal) is 10011 (binary)
• C(19) is the sum of the following:
• g(1) + g(2) + … + g(16) [ 16 elements ]
• g(17) + g(18) [ 2 elements ]
• g(19) [ 1 element ]
• Cumulative Sum Tree
• Let us define C2(x) to be the sum g(p + 1) + g(p + 2) + … + g(x), where p is x with its least significant “1” bit (the rightmost bit of x that is “1”) set to “0”
• Examples of x and the corresponding p:
• x = 6 [110], p = 4 [100]
• x = 13 [1101], p = 12 [1100]
• x = 16 [10000], p = 0 [00000]
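Both the value of the least significant “1” bit and the number p can be computed with standard bit tricks. A short Python sketch, with illustrative function names:

```python
def lowest_set_bit(x):
    """Value of the rightmost "1" bit of x."""
    return x & -x

def clear_lowest_set_bit(x):
    """The number p: x with its rightmost "1" bit set to 0."""
    return x & (x - 1)

for x in (6, 13, 16):
    print(x, clear_lowest_set_bit(x), lowest_set_bit(x))
# 6 4 2
# 13 12 1
# 16 0 16
```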
• Cumulative Sum Tree
• If we want to find the cumulative sum C(x) = g(1) + g(2) + … + g(x), we can trace through the values of C2 using the binary representation of x
• Examples:
• C(13) = C2(8) + C2(8+4) + C2(8+4+1)
• C(16) = C2(16)
• C(21) = C2(16) + C2(16+4) + C2(16+4+1)
• C(99) = C2(64) + C2(64+32) + C2(64+32+2) + C2(64+32+2+1)
• This allows us to find C(x) in log x time
• Hence the amount of gold in interval [a,b] = C(b) – C(a-1) can be found in log N time, which implies updates of a and b can be done in O(log N)
• Cumulative Sum Tree
• What happens when we change g(x)?
• If g(x) is changed, we only need to update C2(y) where C2(y) covers g(x)
• We can go through all necessary C2(y) in the following way:
• While (x <= N)
• Update C2(x)
• Add the value of the least significant bit of x to x
• This runs in O(log N) time
• Hence updates to g can also be done in O(log N) time, which is a great improvement over the O(N) needed for an array.
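A compact Python sketch of the cumulative sum tree, storing C2 in a plain list. It assumes that a change to g(x) is expressed as a delta added to the current value; the class and method names are illustrative.

```python
class CumulativeSumTree:
    def __init__(self, n):
        self.n = n
        self.c2 = [0] * (n + 1)       # c2[0] is unused

    def add(self, x, delta):
        """Add delta to g(x), updating every C2(y) that covers x."""
        while x <= self.n:
            self.c2[x] += delta
            x += x & -x               # add the least significant bit

    def prefix(self, x):
        """C(x) = g(1) + ... + g(x)."""
        total = 0
        while x > 0:
            total += self.c2[x]
            x -= x & -x               # clear the least significant bit
        return total

    def interval(self, a, b):
        return self.prefix(b) - self.prefix(a - 1)

tree = CumulativeSumTree(16)
for pos, gold in enumerate([3, 1, 4, 1, 5, 9, 2, 6], start=1):
    tree.add(pos, gold)
print(tree.interval(3, 6))   # 4 + 1 + 5 + 9 = 19
tree.add(4, 10)
print(tree.interval(3, 6))   # 29
```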
• Cumulative Sum Tree
• Examples [binary representation in brackets]
• Change to g(5) [ 101 ] : Update C2(5), C2(6), C2(8), C2(16) and all C2(power of 2 > 16)
• Change to g(13) [ 1101 ]: Update C2(13), C2(14), C2(16), and all C2(power of 2 > 16)
• Change to g(35) [ 100011 ]: Update C2(35), C2(36), C2(40), C2(48), C2(64), and all C2(power of 2 > 64)
• A cumulative sum tree is simple to implement: a linear array storing the values of C2 is enough.
• Can we extend a cumulative sum tree to 2 or more dimensions?
• See IOI 2001 Day 1 Question 1
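The update examples on this slide can be reproduced with a few lines of Python. The function name `update_positions` and the chosen values of N are illustrative.

```python
def update_positions(x, n):
    """Indices y whose C2(y) must change when g(x) changes, for size n."""
    out = []
    while x <= n:
        out.append(x)
        x += x & -x                   # add the least significant bit
    return out

print(update_positions(5, 64))    # [5, 6, 8, 16, 32, 64]
print(update_positions(13, 64))   # [13, 14, 16, 32, 64]
print(update_positions(35, 128))  # [35, 36, 40, 48, 64, 128]
```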
• Sum of Intervals Tree
• Another way to solve the question is to use a “Sum of Intervals” Binary Tree
• Each node in the tree is represented by (L, R) and the value of (L,R) is g(L) + g(L+1) + … + g(R)
• The root of the tree has L = 1 and R = N
• Every leaf has L = R
• Every non-leaf has children (L, [L+R]/2) [left child] and ([L+R]/2+1, R) [right child]
• The number of nodes in the tree is O(N) (at most 2*N - 1) [ why? ]
• In an implementation, every node should have pointers to its children and its parent
• Sum of Intervals Tree
• How to find C(x) = g(1) + g(2) + … + g(x)?
• We trace from the root downwards
• L = 1, R = N, C = 0
• While (L != R)
• M = (L + R) / 2
• If (M < x)
• C += value of the left child (L, M)
• Set L and R to the right child of the current node
• Else
• Set L and R to the left child of the current node
• C += value at (L,R) [ or (L,L) or (R,R) as L = R ]
• Time complexity: O(log n)
• Sum of Intervals Tree
• What happens when g(x) is changed?
• Trace from (x,x) upwards to the root
• Let L = R = x
• While (L,R) is not the root
• Update the value of (L,R)
• Set (L,R) to the parent of (L,R)
• Update the root
• Complexity of O(log N)
• Hence all updates of interval [a,b] and g(x) can be done in O(log N) time
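A Python sketch of the “Sum of Intervals” tree. Instead of explicit child and parent pointers it assumes heap-style numbering: the root is node 1, and node i has children 2i and 2i+1 and parent i // 2. The class and method names are illustrative. When the query moves into a right child, the whole left child lies inside [1, x], so its value is added to the running total.

```python
class SumOfIntervalsTree:
    def __init__(self, g):
        """g[1..N] holds the initial values; g[0] is unused; N >= 1."""
        self.n = len(g) - 1
        self.value = [0] * (4 * self.n)
        self.leaf = [0] * (self.n + 1)    # node index of the leaf (x, x)
        self._build(1, 1, self.n, g)

    def _build(self, node, lo, hi, g):
        if lo == hi:
            self.value[node] = g[lo]
            self.leaf[lo] = node
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, g)
        self._build(2 * node + 1, mid + 1, hi, g)
        self.value[node] = self.value[2 * node] + self.value[2 * node + 1]

    def prefix(self, x):
        """C(x) = g(1) + ... + g(x), tracing from the root downwards."""
        if x < 1:
            return 0
        node, lo, hi, total = 1, 1, self.n, 0
        while lo != hi:
            mid = (lo + hi) // 2
            if mid < x:                   # left child lies fully inside [1, x]
                total += self.value[2 * node]
                node, lo = 2 * node + 1, mid + 1
            else:
                node, hi = 2 * node, mid
        return total + self.value[node]

    def set(self, x, new_value):
        """Change g(x), then recompute every ancestor up to the root."""
        node = self.leaf[x]
        self.value[node] = new_value
        node //= 2
        while node >= 1:
            self.value[node] = self.value[2 * node] + self.value[2 * node + 1]
            node //= 2

tree = SumOfIntervalsTree([0, 3, 1, 4, 1, 5])
print(tree.prefix(4) - tree.prefix(1))    # 1 + 4 + 1 = 6
tree.set(3, 10)
print(tree.prefix(4) - tree.prefix(1))    # 1 + 10 + 1 = 12
```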
• Augmenting Data Structures
• It is often useful to change the data structure in some way, by adding additional data in each node or changing what each node represents.
• This allows us to use the same data structure to solve other problems
• For example, we can use so-called “interval trees” to solve more than just cumulative sum problems
• Each node can store other properties of the elements in its interval (L,R), not only their sum
• Other data structures
• Balanced (and unbalanced) binary trees
• Red-Black trees
• 2-3-4 trees
• Splay trees
• Suffix Trees
• Fibonacci Heaps