"Once you succeed in writing the programs for complicated algorithms, they usually run extremely fast. The computer doesn't need to understand the algorithm, it’s task is only to run the programs.“
There are a number of facets to good programs.
A data structure has been defined in many ways:
- A scheme for organizing related pieces of information
- A way in which sets of data are organized in a particular system
- An organised aggregate of data items
- A computer-interpretable format used for storing, accessing, transferring and archiving data
- The way data is organised to ensure efficient processing: this may be in lists, arrays, stacks, queues or trees
In short, a data structure is a specialized format for organizing and storing data so that it can be accessed and worked with in ways that make a program efficient.
Data Structure = Organised Data + Allowed Operations
There are two design aspects to every data structure:
the interface part:
The publicly accessible functions of the type: creation and destruction of the object, inserting and removing elements (if it is a container), assigning values, etc.
the implementation part:
The internal implementation should be independent of the interface, so the details of the implementation should be hidden from users.
These collections may be organised in many ways and use many different program structures to represent them, yet, from an abstract point of view, there will be a few common operations on any collection.
create: create a new collection
add: add an item to a collection
delete: delete an item from a collection
find: find an item matching some criterion in the collection
destroy: destroy the collection
An Array is the simplest form of implementing a collection
Each object in an array is called an array element
Each element has the same data type (although they may have different values)
Individual elements are accessed by index using a consecutive range of integers
One Dimensional Array or vector
int A[10];
int i;
for ( i = 0; i < 10; i++ )
    A[i] = i + 1;
The array now holds A[0] = 1, A[1] = 2, A[2] = 3, …, A[n-2] = n-1, A[n-1] = n.
Arrays (Cont.)
Multi-dimensional Array
A multi-dimensional array of dimension n (i.e., an n-dimensional array or simply n-D array) is a collection of items which is accessed via n subscript expressions. For example, in a language that supports it, the (i,j)th element of the two-dimensional array x is accessed by writing x[i,j].
[Figure: a two-dimensional array laid out by row index i and column index j]
Arrays are simple and fast, but their size must be specified during construction.
If you want to insert or remove an element at a fixed position in the list, then you must move the elements already in the list to make room for it (or to close the gap it leaves).
Thus, on average, you copy half the elements.
In the worst case, inserting at position 1 requires moving all the elements.
Copying elements can result in longer running times if insert/remove operations are frequent, especially when the cost of copying each element is high (as when copying strings).
An array cannot be extended dynamically; one has to allocate a new array of the appropriate size and copy the old array into the new one.
void *FindinCollection( Collection c, void *key ) {
    Node n = c->head;
    while ( n != NULL ) {
        if ( KeyCmp( ItemKey( n->item ), key ) == 0 )
            return n->item;
        n = n->next;
    }
    return NULL;
}

Add time: constant, independent of n. Search time: worst case n.
struct t_node {
    void *item;
    struct t_node *next;
} node;

typedef struct t_node *Node;

struct collection {
    Node head, tail;
};

By ensuring that the tail of the list always points back to the head (head is tail->next), we can build a circularly linked list, giving LIFO or FIFO behaviour using ONE pointer.
A tree is a finite nonempty set of elements in which one element is called the ROOT and the remaining elements are partitioned into m >= 0 disjoint subsets, each of which is itself a tree.
Different types of trees – binary tree, n-ary tree, red-black tree, AVL tree
In a heap, the highest-priority item is at the root and is trivially extracted. But if the root is deleted, we are left with two subtrees and we must efficiently re-create a single tree with the heap property.
The value of the heap structure is that we can both extract the highest-priority item and insert a new one in O(log n) time.
To work out how we're going to maintain the heap property, use the fact that a complete tree is filled from the left, so the position which must become empty is the one occupied by the M. Put the M in the vacant root position.
This has violated the condition that the root must be greater than each of its children. So interchange the M with the larger of its children.
The left subtree has now lost the heap property. So again interchange the M with the larger of its children.
We need to make at most h interchanges of a root of a subtree with one of its children to fully restore the heap property.
To add an item to a heap, we follow the reverse procedure: place it in the next leaf position and move it up. Again, we require O(h), i.e. O(log n), exchanges.
Comparisons

Arrays: simple and fast, but inflexible.
  Add: O(1), or O(n) to keep sorted
  Delete: O(n)
  Find: O(n), or O(log n) with binary search if sorted

Linked lists: simple and flexible.
  Add: O(1); keeping sorted gives no advantage
  Delete: O(1) for any item, O(n) for a specific item
  Find: O(n) (no binary search possible)

Trees: still simple, and flexible.
  Add: O(log n)
  Delete: O(log n)
  Find: O(log n)
function f( int x, int y ) {
    int a;
    if ( term_cond )
        return …;
    a = ….;
    return g( a );
}

function g( int z ) {
    int p, q;
    p = …. ;
    q = …. ;
    return f( p, q );
}

Context for execution of f
Computer systems are often used to store large amounts of data from which individual records must be retrieved according to some search criterion. Thus the efficient storage of data to facilitate fast searching is an important issue
Logs: base 2 is by far the most common in this course. Assume base 2 unless otherwise noted! For small problems we're not interested; for large problems, we're interested in the gap between n and log2 n.
/* Bubble sort for integers */
#define SWAP(a,b)   { int t; t=a; a=b; b=t; }

void bubble( int a[], int n ) {
    int i, j;
    for ( i = 0; i < n; i++ ) {          /* n passes thru the array */
        /* From start to the end of unsorted part */
        for ( j = 1; j < (n-i); j++ ) {
            /* If adjacent items out of order, swap */
            if ( a[j-1] > a[j] )
                SWAP( a[j-1], a[j] );
        }
    }
}

The swap is an O(1) statement; the inner loop makes n-1, n-2, n-3, …, 1 iterations and the outer loop n iterations, so overall the sort is O(n^2).
A Hash Table is a data structure that associates each element (e) to be stored in our table with a particular value known as a key (k)
We store items (k, e) in our table.
Simplest form of a Hash Table is an Array
A bucket array for a hash table is an array A of size N, where each cell of A is thought of as a bucket and the integer N defines the capacity of the array.
If we have a collection of n elements whose keys are unique integers in (1, m), where m >= n, then we can store the items in a direct address table, T[m], where T[i] is either empty or contains one of the elements of our collection.
Searching a direct address table is clearly an O(1) operation: for a key k, we access T[k].
Drawback 1: The Hash Table uses O(N) space which is not necessarily related to the number of elements n actually present in our set.
If N is large relative to n, then this approach is wasteful of space.
Drawback 2: The bucket array implementation of Hash Tables requires key values (k) associated with elements (e) to be unique and in the range [0, N-1], which is often not the case.
Associated with each Hash Table is a function h, known as a Hash Function.
This Hash Function maps each key in our set to an integer in the range [0, N-1], where N is the capacity of the bucket array.
The idea is to use the hash function value h(k) as an index into our bucket array.
So we store the item (k, e) in our bucket at A[h(k)]; that is, A[h(k)] = (Item)(k, e).
Unfortunately, finding a perfect hashing function is not always possible. Let's say that we can find a hash function h(k) which maps most of the keys onto unique integers, but maps a small number of keys onto the same integer. If the number of collisions (cases where multiple keys map onto the same integer) is sufficiently small, then hash tables work quite well and give O(1) search times.
In the small number of cases, where multiple keys map to the same integer, then elements with different keys may be stored in the same "slot" of the hash table. It is clear that when the hash function is used to locate a potential match, it will be necessary to compare the key of that element with the search key. But there may be more than one element which should be stored in a single slot of the table. Various techniques are used to manage this problem:
One simple scheme is to chain all collisions in lists attached to the appropriate slot. This allows an unlimited number of collisions to be handled and doesn't require a priori knowledge of how many elements are contained in the collection. The tradeoff is the same as with linked lists versus array implementations of collections: linked list overhead in space and, to a lesser extent, in time .
Re-hashing schemes use a second hashing operation when there is a collision. If there is a further collision, we re-hash until an empty "slot" in the table is found. The re-hashing function can either be a new function or a re-application of the original one. As long as the functions are applied to a key in the same order, then a sought key can always be located.
Here h(j) = h(k), so the next hash function h1 is used. A second collision occurs, so h2 is used.
Divide the pre-allocated table into two sections: the primary area to which keys are mapped and an area for collisions, normally termed the overflow area .
When a collision occurs, a slot in the overflow area is used for the new element and a link from the primary slot established as in a chained system. This is essentially the same as chaining, except that the overflow area is pre-allocated and thus possibly faster to access. As with re-hashing, the maximum number of elements must be known in advance, but in this case, two parameters must be estimated: the optimum size of the primary and overflow areas.
A graph consists of a set of nodes (or vertices ) and a set of arcs (or edges )
Graph G = Nodes {A,B, C} Arcs {(A,C), (B,C)}
Terminology :
V = Set of vertices (or nodes)
|V| = # of vertices or cardinality of V (in usual terminology |V| = n)
E = Set of edges, where an edge is defined by two vertices
|E| = # of edges or cardinality of E
A Graph, G is a pair G = (V, E)
Labeled Graphs: We may give edges and vertices labels. Graphing applications often require the labeling of vertices. Edges might also be numerically labeled: for instance, if the vertices represent cities, the edges might be labeled to represent distances.
A directed graph is one in which every edge (u, v) has a direction, so that (u, v) is different from (v, u). In an undirected graph, there is no distinction between (u, v) and (v, u). There are two possible situations that can arise in a directed graph between vertices u and v.
i) only one of (u, v) and (v, u) is present.
ii) both (u, v) and (v, u) are present.
An edge (u, v) is said to be directed from u to v if the pair (u, v) is ordered with u preceding v.
E.g. A Flight Route
An edge (u, v) is said to be undirected if the pair (u, v) is not ordered
An edge is said to be incident on a vertex if the vertex is one of the edge's endpoints.
The outgoing edges of a vertex are the directed edges whose origin is that vertex.
The incoming edges of a vertex are the directed edges whose destination is that vertex.
Graph Terminology
[Figure: example graph with vertices U, V, W, X, Y, Z and edges a to j]
Edge 'a' is incident on vertex V. Edge 'h' is incident on vertex Z. Edge 'g' is incident on vertex Y.
The outgoing edges of vertex W are the edges with vertex W as origin: {d, e, f}.
The incoming edges of vertex X are the edges with vertex X as destination: {b, e, g, i}.
The degree of a vertex v, denoted deg(v), is the number of edges incident on v.
The in-degree of a vertex v, denoted indeg(v), is the number of incoming edges of v.
The out-degree of a vertex v, denoted outdeg(v), is the number of outgoing edges of v.
In the example graph, the degree of vertex X is the number of edges incident on X: deg(X) = 5.
The in-degree of vertex X is the number of edges that have vertex X as a destination: indeg(X) = 4.
The out-degree of vertex X is the number of edges that have vertex X as an origin: outdeg(X) = 1.
Path: a sequence of alternating vertices and edges in which each edge is preceded and followed by its endpoints.
Simple Path: a path where all its edges and vertices are distinct.
We can see that P1 = {U, a, V, b, X, h, Z} is a simple path.
P2 = {U, c, W, e, X, g, Y, f, W, d, V} is not a simple path, as not all its edges and vertices are distinct (vertex W appears twice).
Depth First Search

Algorithm DFS(G)
  Input: graph G
  Output: labeling of the edges of G as discovery edges and back edges
  for all u in G.vertices()
    setLabel(u, Unexplored)
  for all e in G.edges()
    setLabel(e, Unexplored)
  for all v in G.vertices()
    if getLabel(v) = Unexplored
      DFS(G, v)
Algorithm DFS(G, v)
  Input: graph G and a start vertex v of G
  Output: labeling of the edges of G as discovery edges and back edges
  setLabel(v, Visited)
  for all e in G.incidentEdges(v)
    if getLabel(e) = Unexplored
      w <- opposite(v, e)
      if getLabel(w) = Unexplored
        setLabel(e, Discovery)
        DFS(G, w)
      else
        setLabel(e, BackEdge)
Depth First Search
[Figure legend: unexplored vertex, visited vertex, unexplored edge, discovery edge, back edge]
Breadth First Search

Algorithm BFS(G)
  Input: graph G
  Output: labeling of the edges and a partitioning of the vertices of G
  for all u in G.vertices()
    setLabel(u, Unexplored)
  for all e in G.edges()
    setLabel(e, Unexplored)
  for all v in G.vertices()
    if getLabel(v) = Unexplored
      BFS(G, v)
Algorithm BFS(G, s)
  L0 <- new empty list
  L0.insertLast(s)
  setLabel(s, Visited)
  i <- 0
  while not Li.isEmpty()
    Li+1 <- new empty list
    for all v in Li
      for all e in G.incidentEdges(v)
        if getLabel(e) = Unexplored
          w <- opposite(v, e)
          if getLabel(w) = Unexplored
            setLabel(e, Discovery)
            setLabel(w, Visited)
            Li+1.insertLast(w)
          else
            setLabel(e, Cross)
    i <- i + 1
Given a weighted graph G and two vertices u and v of G, we want to find a path between u and v that has minimum total weight, also known as a shortest path.
The length of a path is the sum of the weights of the path's edges.
Dijkstra's algorithm, step by step. (The figure slides show a weighted graph on vertices A to F; each vertex is labeled with its current distance d(v) from the start vertex A.)

1. Add the starting vertex to the cloud: A(0).
2. We store with each vertex v a label d(v) representing the distance of v from s in the subgraph consisting of the cloud and its adjacent vertices: B(8), C(2), D(4).
3. At each step we add to the cloud the vertex outside the cloud with the smallest distance label d(v): here C(2).
4. We then update the vertices adjacent to the new cloud vertex using d(z) = min{d(z), d(v) + weight(e)}: D(3), E(5), F(11).
5. Add the vertex with the smallest label, D(3), and update its neighbours: F(8).
6. Add E(5) and update its neighbours: B(7).
7. Add B(7), and finally F(8); every vertex is now in the cloud.