Data structures

This slide deck covers some of the important data structures, such as graphs, tries, suffix trees and hash tables.

Published in: Technology, Education

  1. Why do we need data structures?
     • Efficient and intuitive representation of data
       • Tree using arrays vs. tree using pointers
     • To solve real-life problems efficiently
       • Insertion
       • Deletion
       • Search
       • Sort
     • Applications
       • Social networks
       • Employee hierarchy
       • Recommended items
  2. Basic Operations
     1. Traverse
     2. Insert
     3. Delete
     4. Find
  3. Data Structures (Basic)
     • Arrays
     • Linked Lists
     • Stacks
     • Queues
     • Recursion
     • Trees – Basic
     • Practice Problems
  4. Arrays
  5. • Contiguous, fixed-size memory allocation (independent of language)
     • Random access and modification
     • A list of (index, value) pairs; the index is a non-negative integer, and all values in a given array are of the same data type
     • To hold values of various types, or to use non-numerical indices, use associative arrays/dictionaries (the Dictionary Problem?)
  6. • Arrays may also be:
       • 2-D: an array of 1-D arrays (a 1-D array is a data type in itself)
       • 3-D: an array of 2-D arrays (a 2-D array is a data type in itself)
     • Memory placement of multi-dimensional arrays:
       1. Row-major
       2. Column-major
     • Useful operations:
       a. Modify
       b. Access
       c. Swap
       d. In-place reverse
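The sketch below illustrates two of the ideas above. It assumes a 2-D array is stored in one contiguous 1-D buffer and computes the row-major and column-major offset of element (i, j). It also shows an in-place reverse that uses only swaps. The function names are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Offset of element (i, j) in a rows x cols array stored row by row.
std::size_t row_major(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}

// Offset of element (i, j) in a rows x cols array stored column by column.
std::size_t col_major(std::size_t i, std::size_t j, std::size_t rows) {
    return j * rows + i;
}

// In-place reverse: swap the two ends, then move both indices inward.
template <class T>
void reverse_in_place(std::vector<T>& a) {
    if (a.empty()) return;
    std::size_t lo = 0, hi = a.size() - 1;
    while (lo < hi) {
        std::swap(a[lo], a[hi]);
        ++lo;
        --hi;   // safe: the loop only runs while hi > lo >= 0
    }
}
```

Take a 3 × 4 array as an example. Element (1, 2) sits at offset 1·4 + 2 = 6 in row-major order and at offset 2·3 + 1 = 7 in column-major order. The reverse needs O(1) extra space.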
  7. Structure of an Array
       template <class T>
       class Array {
           int size;
           T *arr;
           void put();
           void get();
           // …….
       };
     Useful libraries: #include <vector>
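Here is a runnable sketch of the class skeleton above. It is a sketch only: the signatures `put(index, value)` and `get(index)` are assumptions, because the slide leaves them unspecified. Memory is allocated once, contiguously, at a fixed size, and copying is disabled to keep the example short.

```cpp
#include <cstddef>
#include <stdexcept>

// Fixed-size, contiguous array of T; the put/get signatures are hypothetical.
template <class T>
class Array {
    std::size_t size_;
    T* arr_;

public:
    explicit Array(std::size_t size) : size_(size), arr_(new T[size]()) {}  // value-initialized
    ~Array() { delete[] arr_; }
    Array(const Array&) = delete;             // copying omitted for brevity
    Array& operator=(const Array&) = delete;

    std::size_t size() const { return size_; }

    // Store value at index (bounds-checked).
    void put(std::size_t index, const T& value) {
        if (index >= size_) throw std::out_of_range("Array::put");
        arr_[index] = value;
    }

    // Read the value at index (bounds-checked).
    const T& get(std::size_t index) const {
        if (index >= size_) throw std::out_of_range("Array::get");
        return arr_[index];
    }
};
```

In practice, `std::vector` from `<vector>` provides the same contiguous storage and can also grow.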
  8. Irregular Arrays
     • Example: languages known to students at IITG
       1. 2-D array [figure: Student vs. Languages table]
       2. Irregular array [figure: one row per student, rows of varying length]
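This sketch shows an irregular (jagged) array in C++, assuming each student's languages are stored as one row. Because `std::vector<std::vector<std::string>>` lets each row have its own length, short rows need no padding, unlike a rectangular 2-D array. The type alias and function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One row per student; rows may have different lengths.
using Jagged = std::vector<std::vector<std::string>>;

// Cells actually stored by the jagged array (sum of row lengths).
std::size_t total_entries(const Jagged& a) {
    std::size_t n = 0;
    for (const auto& row : a) n += row.size();
    return n;
}

// Cells a rectangular 2-D array would need: rows * longest row.
std::size_t rectangular_cells(const Jagged& a) {
    std::size_t longest = 0;
    for (const auto& row : a)
        if (row.size() > longest) longest = row.size();
    return a.size() * longest;
}
```

Consider three (invented) students who know 2, 1 and 3 languages. The jagged form stores 6 entries, while the rectangular form needs 3 × 3 = 9 cells.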
  9. Special (Arrays ??)
     • Diagonal, upper/lower triangular, tridiagonal and symmetric/asymmetric matrices
     • Generally 2-D matrices, but 3-D or higher cases are also possible; generally square, but rectangular (non-square) matrices are also possible
     • More like functions: an element can be computed from its indices (i, j) instead of being stored
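  10. Special (Arrays ??)
        // no_cols: number of columns (assumed defined elsewhere);
        // each element equals its 1-based position in row-major order
        int spec_matrix(int i, int j) { return no_cols * i + j + 1; }
      • Performance ??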
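One common way to apply "more like functions" to a special matrix is sketched below; it illustrates the idea and is not necessarily the slide author's method. The approach stores only the entries that can be non-zero and computes their position with a function. For an n × n lower-triangular matrix packed row by row into n(n+1)/2 cells, element (i, j) with j ≤ i lives at index i(i+1)/2 + j, and every entry above the diagonal reads as 0. The class name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// n x n lower-triangular matrix stored in n*(n+1)/2 cells, row by row.
class LowerTriangular {
    std::size_t n_;
    std::vector<double> data_;

    // Rows 0..i-1 hold 1 + 2 + ... + i = i*(i+1)/2 cells, so row i starts there.
    static std::size_t index(std::size_t i, std::size_t j) { return i * (i + 1) / 2 + j; }

public:
    explicit LowerTriangular(std::size_t n) : n_(n), data_(n * (n + 1) / 2, 0.0) {}

    std::size_t stored_cells() const { return data_.size(); }

    // Assumes i < n; entries above the diagonal are always 0.
    double get(std::size_t i, std::size_t j) const {
        if (j > i) return 0.0;
        return data_[index(i, j)];
    }

    // Only positions on or below the diagonal may be set; returns false otherwise.
    bool set(std::size_t i, std::size_t j, double v) {
        if (i >= n_ || j > i) return false;
        data_[index(i, j)] = v;
        return true;
    }
};
```

Access stays O(1), and storage drops from n² to n(n+1)/2 cells; for example, a 4 × 4 matrix needs 10 cells.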
  11. One-Dimensional Sparse Array
        index:  0  1  2  3   4  5  6   7   8  9 10 11
        ary:    0  0  0  0  17  0  0  23  14  0  0  0
      Sparse representation, (index, value) pairs: (4, 17), (7, 23), (8, 14)
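The following sketches the 1-D sparse representation above. It keeps only the non-zero (index, value) pairs, sorted by index, so a lookup is a binary search (`std::lower_bound`) and any index that is absent reads as 0. The class name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// 1-D sparse array: only non-zero entries, kept sorted by index.
class SparseArray {
    std::vector<std::pair<std::size_t, int>> nz_;  // (index, value)

public:
    // Build from a dense array, keeping only the non-zero values.
    explicit SparseArray(const std::vector<int>& dense) {
        for (std::size_t i = 0; i < dense.size(); ++i)
            if (dense[i] != 0) nz_.push_back({i, dense[i]});
    }

    std::size_t nonzeros() const { return nz_.size(); }

    // Binary search on the index; missing indices read as 0.
    int get(std::size_t i) const {
        auto it = std::lower_bound(
            nz_.begin(), nz_.end(), i,
            [](const std::pair<std::size_t, int>& p, std::size_t key) { return p.first < key; });
        return (it != nz_.end() && it->first == i) ? it->second : 0;
    }
};
```

For the 12-element `ary` above, only the three pairs (4, 17), (7, 23) and (8, 14) are stored.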
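  12. Two-Dimensional Sparse Array
      [figure: a 6 × 6 matrix with non-zero entries (0, 5) = 12, (3, 1) = 8, (3, 5) = 33 and (4, 3) = 17, stored as an array of row headers 0–5, each pointing to a linked list of (row, col, value) nodes: row 0 → (0, 5, 12); row 3 → (3, 1, 8) → (3, 5, 33); row 4 → (4, 3, 17)]
      • Row elements can be accessed efficiently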
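Here is a sketch of the row-list idea: an array of row headers, each pointing to a singly linked list (`std::forward_list`) of that row's non-zero entries, kept sorted by column. As a simplification, each node here holds only (col, value), with the row implied by the list it belongs to. Reading a whole row touches only that row's non-zero entries. The names are hypothetical.

```cpp
#include <cstddef>
#include <forward_list>
#include <vector>

struct Entry {
    std::size_t col;
    int value;
};

// Sparse matrix: one linked list of non-zero entries per row, sorted by column.
class SparseMatrix {
    std::vector<std::forward_list<Entry>> rows_;

public:
    explicit SparseMatrix(std::size_t nrows) : rows_(nrows) {}

    // Insert or overwrite (r, c); assumes value != 0.
    void set(std::size_t r, std::size_t c, int value) {
        auto& row = rows_[r];
        auto prev = row.before_begin();
        for (auto it = row.begin(); it != row.end() && it->col <= c; prev = it, ++it) {
            if (it->col == c) { it->value = value; return; }
        }
        row.insert_after(prev, Entry{c, value});   // keeps the row sorted by column
    }

    int get(std::size_t r, std::size_t c) const {
        for (const Entry& e : rows_[r])
            if (e.col == c) return e.value;
        return 0;
    }

    // Sum of row r: visits only that row's non-zero entries.
    long row_sum(std::size_t r) const {
        long s = 0;
        for (const Entry& e : rows_[r]) s += e.value;
        return s;
    }
};
```

Column access would still need a scan of every row. Adding column headers, with each node linked into both its row list and its column list, fixes that at the cost of an extra pointer per node.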
  13. Two-Dimensional Sparse Array
      [figure: the same matrix with both row headers (rows 0–5) and column headers (cols 0–5); each (row, col, value) node is linked into its row list and its column list]
      • Efficient access to both row and column elements
  14. Efficient Representation
      [figure: row and column header arrays linking the nodes (0, 5, 12), (3, 1, 8), (3, 5, 33) and (4, 3, 17)]
  15. Linked Lists
  16. Why?
      • To store heterogeneous data
      • To store sparse data
      • Flexibility to grow or shrink in size; easy insertion and deletion of elements
      • Useful operations:
        • Insertion
        • Deletion
  17. The Structure
        template <class T>
        class node {
            T data;
            node<T> *next;  // extra bytes per node: sizeof(node<T>*), typically 4 on 32-bit and 8 on 64-bit systems
        };
        template <class T>
        class LinkedList {
            node<T> *head;
            int size;
            // ….. etc.
        };
      Useful libraries: #include <list>
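  18. Insertion/Deletion
      • Time complexity:
        • Insertion: O(1) at the head or after a known node / O(n) when the position must first be found
        • Deletion: O(1) given the previous node / O(n) when the node must first be searched for
      • Space complexity:
        • Insertion: O(1)
        • Deletion: O(1)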
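The minimal singly linked list below illustrates the costs above. Inserting at the head, or after a node you already hold, is O(1). Deleting by value first has to walk the list, so it is O(n). Both use O(1) extra space. Raw pointers mirror the slide's `node<T>`; the method names are illustrative, not a fixed API.

```cpp
#include <cstddef>

template <class T>
struct node {
    T data;
    node<T>* next;
};

template <class T>
class LinkedList {
    node<T>* head = nullptr;
    std::size_t size = 0;

public:
    LinkedList() = default;
    LinkedList(const LinkedList&) = delete;             // copying omitted for brevity
    LinkedList& operator=(const LinkedList&) = delete;
    ~LinkedList() {
        while (head) {
            node<T>* dead = head;
            head = head->next;
            delete dead;
        }
    }

    // O(1): the new node becomes the head.
    void push_front(const T& value) {
        head = new node<T>{value, head};
        ++size;
    }

    // O(1): insert after a node we already have a pointer to.
    void insert_after(node<T>* pos, const T& value) {
        pos->next = new node<T>{value, pos->next};
        ++size;
    }

    // O(n): search for the first node holding value, then unlink it.
    bool remove(const T& value) {
        node<T>** link = &head;   // the pointer that points at the current node
        while (*link && (*link)->data != value) link = &(*link)->next;
        if (!*link) return false;
        node<T>* dead = *link;
        *link = dead->next;
        delete dead;
        --size;
        return true;
    }

    node<T>* front() const { return head; }
    std::size_t length() const { return size; }
};
```

Walking a pointer-to-pointer (`link`) lets the head be removed by the same code as any other node, with no special case.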
  19. Tweak some more!
      • Doubly Linked Lists
        • An extra pointer per node (4 or 8 bytes) vs. better accessibility (traversal in both directions)
        • Insertion/deletion?
      • Circular Linked Lists
        • How do we find the end?
          • Tail pointer
          • Null ‘next’ pointer from the last node
          • Last node points to the first (circular)
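One answer to "how do we find the end?" is a circular singly linked list that keeps only a tail pointer, sketched below under that assumption. Because `tail->next` is the head, both ends are reachable in O(1) and no node has a null `next` pointer. The class name is hypothetical.

```cpp
#include <vector>

// Circular singly linked list storing only a tail pointer; tail->next is the head.
class CircularList {
    struct Node {
        int data;
        Node* next;
    };
    Node* tail = nullptr;

public:
    CircularList() = default;
    CircularList(const CircularList&) = delete;
    CircularList& operator=(const CircularList&) = delete;
    ~CircularList() {
        if (!tail) return;
        Node* cur = tail->next;   // head
        tail->next = nullptr;     // break the cycle so the loop terminates
        while (cur) {
            Node* dead = cur;
            cur = cur->next;
            delete dead;
        }
    }

    // O(1): the new node goes between tail and head.
    void push_front(int v) {
        Node* n = new Node{v, nullptr};
        if (!tail) {
            n->next = n;          // a single node points to itself
            tail = n;
        } else {
            n->next = tail->next;
            tail->next = n;
        }
    }

    // O(1): insert at the front, then advance tail onto the new node.
    void push_back(int v) {
        push_front(v);
        tail = tail->next;
    }

    // Walk once around the cycle, stopping on returning to the head.
    std::vector<int> to_vector() const {
        std::vector<int> out;
        if (!tail) return out;
        Node* cur = tail->next;
        do {
            out.push_back(cur->data);
            cur = cur->next;
        } while (cur != tail->next);
        return out;
    }
};
```

A traversal stops when it arrives back at `tail->next`, not when it meets a null pointer.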
  20. Practice (Linked List)
      Linked List 1)
      Linked List 2)
  21. Recursion
  22. • Solving a task using the task itself
        • the task should have a recursive nature
        • generally, it can be transformed into a smaller instance by tweaking some parts of the task
      • Example: the task of piling up n coins vs. picking up a suitcase
      • Let the task be a C function. The parts of the task are:
        1. The input it takes
        2. What it does
        3. The output it gives
  23. • A task is performed recursively when, generally, a large input can't be handled directly.
      • So recursion is all about simplifying the input at every step until it becomes trivial (the base case).
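This small sketch illustrates "simplify the input at every step until it becomes trivial" by summing an array recursively. Each call handles one element and passes a shorter range on, and the empty range is the base case. The function name is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Sum of a[from..end): each call shrinks the input by one element.
long sum_from(const std::vector<int>& a, std::size_t from) {
    if (from == a.size()) return 0;            // base case: empty range, handled directly
    return a[from] + sum_from(a, from + 1);    // smaller instance of the same task
}
```

Calling `sum_from({1, 2, 3, 4}, 0)` makes five calls in total: four that each peel off one element, and one for the empty base case.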
  24. Implementation – run-time stack
      • Activation Records (AR)
        • Store the state of a method:
          1. Input parameters
          2. Return values
          3. Local variables
          4. Return addresses
  25. Advantages/Disadvantages
      • Advantages:
        1. More readable/understandable; consistent with the definition
      • Disadvantages:
        1. Memory requirements increase due to the run-time stack
        2. Harder to trace and debug
  26. Types of Recursion
      • Tail (vs. loop?)
          int factn = 1;
          while (n > 0) factn *= n--;
      • Indirect
        • A() -> B() -> C() -> A()
      • Nested
        • h(n) = h(2 + h(n-1))
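The sketch below covers two of the types above. `fact_tail` is tail-recursive: the recursive call is the last action, and the running product travels in an accumulator, which is why it corresponds directly to the slide's loop (shown here with `factn` initialised). `is_even`/`is_odd` demonstrate indirect recursion, where two functions call each other (A() -> B() -> A()). The names are illustrative, and C++ compilers are not required to turn tail calls into loops.

```cpp
// Tail recursion: the recursive call is the last thing the function does.
unsigned long long fact_tail(unsigned n, unsigned long long acc = 1) {
    if (n == 0) return acc;
    return fact_tail(n - 1, acc * n);
}

// The equivalent loop.
unsigned long long fact_loop(unsigned n) {
    unsigned long long factn = 1;
    while (n > 0) factn *= n--;
    return factn;
}

// Indirect recursion: is_even calls is_odd, which calls is_even.
bool is_odd(unsigned n);
bool is_even(unsigned n) { return n == 0 ? true : is_odd(n - 1); }
bool is_odd(unsigned n) { return n == 0 ? false : is_even(n - 1); }
```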
  27. Hashes
  28. Why?
      • Want to store dictionaries or associative arrays?
        • arrays with non-numerical indices
      • String operations made easy
        • Ex: finding anagrams
        • Ex: counting the frequency of words in a string
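As a sketch of the word-frequency example, here is a version using the standard hash table `std::unordered_map`, assuming words are separated by whitespace. The function name is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Count how often each whitespace-separated word occurs in text.
std::unordered_map<std::string, int> word_frequency(const std::string& text) {
    std::unordered_map<std::string, int> freq;
    std::istringstream in(text);
    std::string word;
    while (in >> word) ++freq[word];   // operator[] inserts 0 for a new key, then increments
    return freq;
}
```

On average, each word costs one hash computation plus an O(1) table update.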
  29. Associative Arrays
      • (key, value) pairs where the key is not necessarily a non-negative integer; it can be a string, etc.
      • Ex: number of students in each department
        • “cse” => 68
        • “eee” => 120
        • “mech” => 70
        • “biotech” => 30
      • Duplicate keys are not allowed
        • Dict(“cse”) = “data structures”
        • Dict(“cse”) = “algorithms”
        • Store both values under one key instead: Dict(“cse”) = {“data structures”, “algorithms”}
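This sketch shows both cases from the slide with `std::unordered_map`. The department table maps each key to a single value. Where several values belong to one key, the key stays unique and the values are collected in a `std::vector`. The names `dept_students`, `MultiDict` and `add_value` are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// One value per key: students per department (values from the slide).
const std::unordered_map<std::string, int> dept_students{
    {"cse", 68}, {"eee", 120}, {"mech", 70}, {"biotech", 30}};

// Keys must be unique, so several values for one key live in one vector.
using MultiDict = std::unordered_map<std::string, std::vector<std::string>>;

void add_value(MultiDict& d, const std::string& key, const std::string& value) {
    d[key].push_back(value);   // operator[] creates an empty vector for a new key
}
```

After `add_value(d, "cse", "data structures")` and `add_value(d, "cse", "algorithms")`, the map still has one key, "cse", now holding two values.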
  30. Hash Functions
      1. HashTable: an array of fixed size
         • TableSize: preferably prime and large
      2. Hash function (maps a key to an index of the HashTable)
         • Techniques
           • use all characters
           • use aggregate properties: length, frequencies of characters
           • first 3 characters, odd characters
         • Evaluation
           • Uniform distribution; load factor λ = (number of elements) / TableSize
           • Utilize table space
           • Quickly computable
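The "use all characters" technique can be sketched as a polynomial string hash, evaluated with Horner's rule and reduced modulo a prime TableSize. The multiplier 37 and the table size in the usage note are assumptions chosen for illustration, not values from the slides.

```cpp
#include <cstddef>
#include <string>

// Polynomial hash over every character, reduced mod the (prime) table size.
// Horner's rule: h = (...((c0 * 37 + c1) * 37 + c2)...) mod tableSize.
std::size_t hash_string(const std::string& key, std::size_t tableSize) {
    std::size_t h = 0;
    for (unsigned char c : key) h = (h * 37 + c) % tableSize;   // reduce each step to avoid overflow
    return h;
}

// Load factor: stored elements divided by table size.
double load_factor(std::size_t elements, std::size_t tableSize) {
    return static_cast<double>(elements) / tableSize;
}
```

With a hypothetical TableSize of 10007, "ab" hashes to (97·37 + 98) mod 10007 = 3687. Every character contributes, so anagrams such as "ab" and "ba" usually land in different slots.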
  31. 3. Collision resolution
         1. Separate chaining
            • A linked list at each index
            • Insertion (at the head?)
            • Desired length of a chain: close to λ
            • Avg. time for a successful search ≈ 1 + λ/2 links traversed (plus the cost of computing the hash)
            • Disadvantages
              • slow?
              • different data structures: array + linked lists?
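Here is a sketch of separate chaining, with one `std::list` of (key, value) pairs per slot. New keys go at the head of their chain. The table size is a constructor parameter, and `std::hash<std::string>` stands in for whichever hash function the slides have in mind. The class name is hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hash table with separate chaining: one linked list per slot.
class ChainedHash {
    std::vector<std::list<std::pair<std::string, int>>> table_;
    std::size_t count_ = 0;

    std::size_t slot(const std::string& key) const {
        return std::hash<std::string>{}(key) % table_.size();
    }

public:
    explicit ChainedHash(std::size_t tableSize = 101) : table_(tableSize) {}

    // Update an existing key, or insert a new one at the head of its chain.
    void put(const std::string& key, int value) {
        auto& chain = table_[slot(key)];
        for (auto& kv : chain)
            if (kv.first == key) { kv.second = value; return; }
        chain.push_front({key, value});
        ++count_;
    }

    // Search only the chain the key hashes to; nullptr if absent.
    const int* find(const std::string& key) const {
        for (const auto& kv : table_[slot(key)])
            if (kv.first == key) return &kv.second;
        return nullptr;
    }

    double load_factor() const { return static_cast<double>(count_) / table_.size(); }
};
```

Because `put` checks for an existing key first, it costs O(1 + λ) on average rather than a strict O(1).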
  32. 2. Open addressing
         • A single table
         • Desired λ ≈ 0.5
         • Apply h0(x), h1(x), h2(x), …
           • hi(x) = (h(x) + f(i)) mod TableSize; f(0) = 0
         • 3 ways to choose f:
           1. Linear probing: f(i) is linear in i
              • f(i) = i (quickly computable vs. primary clustering?)
           2. Quadratic probing: f(i) is quadratic in i (e.g. f(i) = i²)
           3. Double hashing: f(i) = i · h2(x), i.e. hi(x) = (h(x) + i · h2(x)) mod TableSize
         • Rehashing
           • What if the table gets too full (70%, …, 100%)?
           • Create a new HashTable of about double the size and reinsert all elements
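The sketch below implements open addressing with linear probing, f(i) = i, together with rehashing. Before an insert would push λ above 0.5, the table grows and every key is reinserted. The growth rule `2 * size + 1` is an assumption: the slides only suggest doubling, and a fuller implementation would pick the next prime. Deletion is left out because it needs "deleted" markers so that probe sequences are not broken. The class name is hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Open addressing with linear probing: h_i(x) = (h(x) + i) mod TableSize.
class LinearProbeSet {
    std::vector<std::optional<std::string>> table_;
    std::size_t count_ = 0;

    // Probe h(x), h(x)+1, ... until the key or an empty cell is found.
    // Terminates because the load factor is kept <= 0.5, so empty cells exist.
    std::size_t probe(const std::string& key) const {
        std::size_t i = std::hash<std::string>{}(key) % table_.size();
        while (table_[i] && *table_[i] != key) i = (i + 1) % table_.size();
        return i;
    }

    // Rehash: allocate a larger table and reinsert every key.
    void rehash() {
        std::vector<std::optional<std::string>> old = std::move(table_);
        table_.assign(2 * old.size() + 1, std::nullopt);
        count_ = 0;
        for (const auto& cell : old)
            if (cell) insert(*cell);
    }

public:
    explicit LinearProbeSet(std::size_t tableSize = 11) : table_(tableSize) {}

    void insert(const std::string& key) {
        if (2 * (count_ + 1) > table_.size()) rehash();   // keep load factor <= 0.5
        std::size_t i = probe(key);
        if (!table_[i]) { table_[i] = key; ++count_; }
    }

    bool contains(const std::string& key) const { return table_[probe(key)].has_value(); }
    std::size_t capacity() const { return table_.size(); }
    std::size_t size() const { return count_; }
};
```

Starting from 11 slots, the sixth insert would push λ past 0.5, so the table grows to 23 slots.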
  33. Structure
        template <class T>
        class Hash {
            int TableSize;
            T *arr;
        };
      Useful libraries: #include <unordered_map>, #include <unordered_set> (C++11)
  34. Practice (Hashes)
      Trie 7)
  35. Graphs
  36. Representation
      1. Adjacency matrix (|V| × |V|)
      2. Adjacency list
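This sketch builds both representations for an undirected graph on vertices 0..n-1 from an edge list. The struct and function names are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Undirected graph on vertices 0..n-1 in both representations.
struct GraphReps {
    std::vector<std::vector<bool>> matrix;   // |V| x |V|: matrix[u][v] is true iff edge (u, v)
    std::vector<std::vector<int>> list;      // list[u] holds the neighbours of u
};

GraphReps build(int n, const std::vector<std::pair<int, int>>& edges) {
    GraphReps g{std::vector<std::vector<bool>>(n, std::vector<bool>(n, false)),
                std::vector<std::vector<int>>(n)};
    for (const auto& e : edges) {
        int u = e.first, v = e.second;
        g.matrix[u][v] = true;   // undirected: set both directions
        g.matrix[v][u] = true;
        g.list[u].push_back(v);
        g.list[v].push_back(u);
    }
    return g;
}
```

The matrix always takes O(|V|²) space but answers "is (u, v) an edge?" in O(1). The list takes O(|V| + |E|) space and enumerates a vertex's neighbours in time proportional to its degree, which is the access pattern BFT and DFT rely on.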
  37. Breadth-First Traversal (BFT)
      • Traverse the nodes level by level: nodes at depth 0 before nodes at depth 1 before nodes at depth 2 ...
      • Done using a queue
      • Ex (for the graph in the slide's figure): 1, 2, 3, 4, 5, 7, 8, 6
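Here is a sketch of BFT over an adjacency list using `std::queue`, assuming vertices are numbered 0..n-1. Each vertex is marked when it is enqueued, so it enters the queue at most once. The function name is hypothetical.

```cpp
#include <queue>
#include <vector>

// Breadth-first traversal from start: vertices come out in order of depth.
std::vector<int> bft(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;
    seen[start] = true;
    q.push(start);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        order.push_back(u);
        for (int v : adj[u])
            if (!seen[v]) {       // mark on enqueue so no vertex is queued twice
                seen[v] = true;
                q.push(v);
            }
    }
    return order;
}
```

Consider a made-up graph with edges 0-1, 0-2, 1-3 and 2-4. Starting from 0, the traversal visits 0, then 1 and 2 at depth 1, then 3 and 4 at depth 2.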
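  38. Depth-First Traversal (DFT)
      • Move on to the next child only after all nodes reachable through the current child have been marked (visited)
      • Done using a stack (explicitly, or implicitly via recursion)
      • Ex (for the graph in the slide's figure): a, b, c, d, e, h, f, g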
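This sketch implements DFT with an explicit `std::stack`. A vertex is marked when it is popped, and its neighbours are pushed in reverse, so they are explored in adjacency-list order, just as a recursive DFS would explore them. The function name is hypothetical.

```cpp
#include <stack>
#include <vector>

// Depth-first traversal with an explicit stack.
std::vector<int> dft(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> order;
    std::stack<int> st;
    st.push(start);
    while (!st.empty()) {
        int u = st.top();
        st.pop();
        if (seen[u]) continue;           // a vertex may have been pushed more than once
        seen[u] = true;
        order.push_back(u);
        // Push in reverse so the first neighbour is on top and explored first.
        for (auto it = adj[u].rbegin(); it != adj[u].rend(); ++it)
            if (!seen[*it]) st.push(*it);
    }
    return order;
}
```

On the made-up graph with edges 0-1, 0-2, 1-3 and 2-4, the traversal from 0 visits 0, 1, 3, 2, 4: it exhausts the branch through 1 before moving on to 2.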
  39. Trees (Advanced)
  40. Retrieval (Trie)
      • Stores the prefixes of a set of strings efficiently: strings with a common prefix share the nodes for that prefix
      • Used to store associative arrays/dictionaries
  41. How to create a Trie
      • Ex: tin, ten, ted, tea, to, i, in, inn
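The following sketch builds a trie from the example words. Each edge is one character, stored in a `std::map` of children, and a flag marks the nodes where a word ends. The class and method names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Trie: one node per distinct prefix; is_word marks the end of an inserted word.
class Trie {
    struct Node {
        std::map<char, std::unique_ptr<Node>> child;
        bool is_word = false;
    };
    Node root_;
    std::size_t nodes_ = 1;   // counts the root

    // Follow the characters of s from the root; nullptr if the path breaks off.
    const Node* walk(const std::string& s) const {
        const Node* cur = &root_;
        for (char c : s) {
            auto it = cur->child.find(c);
            if (it == cur->child.end()) return nullptr;
            cur = it->second.get();
        }
        return cur;
    }

public:
    void insert(const std::string& word) {
        Node* cur = &root_;
        for (char c : word) {
            auto& next = cur->child[c];
            if (!next) { next = std::make_unique<Node>(); ++nodes_; }   // new prefix
            cur = next.get();
        }
        cur->is_word = true;
    }

    bool contains(const std::string& w) const { const Node* n = walk(w); return n && n->is_word; }
    bool has_prefix(const std::string& p) const { return walk(p) != nullptr; }
    std::size_t node_count() const { return nodes_; }
};
```

Inserting tin, ten, ted, tea, to, i, in and inn creates 12 nodes (the root plus 11 distinct prefixes), because words such as ten, ted and tea share the path t → e. After this, "te" is a prefix in the trie but not a stored word.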
  42. Pairs of anagrams
      • Sort the characters of each string
        • acute -> acetu
        • obtuse -> beostu … etc.
      • Insert the sorted strings into the trie
      • Keep storing collisions, i.e. multiple values for each key
      • Each set of values gives a group of anagrams
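The anagram method above can be sketched as follows. Each word's characters are sorted to form its key, the key is inserted into a trie, and the original word is appended to the list stored at the key's end node, keeping collisions rather than overwriting them. Any node that collects two or more words holds a group of anagrams. The names, and the choice to report groups in key order, are assumptions.

```cpp
#include <algorithm>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct AnagramNode {
    std::map<char, std::unique_ptr<AnagramNode>> child;
    std::vector<std::string> words;   // all words whose sorted characters end here
};

// Depth-first, in key order: report every node holding two or more words.
void collect(const AnagramNode& n, std::vector<std::vector<std::string>>& out) {
    if (n.words.size() >= 2) out.push_back(n.words);
    for (const auto& kv : n.child) collect(*kv.second, out);
}

std::vector<std::vector<std::string>> anagram_groups(const std::vector<std::string>& words) {
    AnagramNode root;
    for (const std::string& w : words) {
        std::string key = w;
        std::sort(key.begin(), key.end());   // e.g. "acute" -> "acetu"
        AnagramNode* cur = &root;
        for (char c : key) {
            auto& next = cur->child[c];
            if (!next) next = std::make_unique<AnagramNode>();
            cur = next.get();
        }
        cur->words.push_back(w);             // keep the collision
    }
    std::vector<std::vector<std::string>> out;
    collect(root, out);
    return out;
}
```

For the input listen, silent, enlist, google, cat, act, the groups are {cat, act} (key "act") and {listen, silent, enlist} (key "eilnst"). "google" is alone under its key and is not reported.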
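  43. Suffix Tree/Patricia/Radix Tree
      • Stores all the suffixes of a string (a suffix tree is a compressed, Patricia/radix-style trie of the suffixes)
      • O(n) space and time to build (for a fixed alphabet)
      • Does not exist for all strings: a suffix that is a prefix of another suffix would not end at a leaf; add a special symbol $ at the end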
  44. Advantages of Suffix Trees
      • Store n suffixes in O(n) space
      • Improved string operations, e.g. substring lookup, longest common substring (generalized suffix trees?)
      Generalized Suffix Trees
      • Each string is terminated by a different special symbol
      • More space efficient
      • Have a different set of algorithms
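A linear-time suffix tree construction, such as Ukkonen's algorithm, is too long for a slide-sized example, so the sketch below substitutes a simpler structure: an uncompressed suffix trie. It inserts every suffix of s + '$' character by character, which takes O(n²) time and space rather than O(n). It still supports the same substring lookup, because p is a substring of s exactly when p spells a path from the root. The class name is hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Uncompressed suffix trie (O(n^2) nodes); a suffix tree compresses its chains.
class SuffixTrie {
    struct Node {
        std::map<char, std::unique_ptr<Node>> child;
    };
    Node root_;

public:
    explicit SuffixTrie(const std::string& s) {
        const std::string t = s + '$';                 // unique end marker
        for (std::size_t i = 0; i < t.size(); ++i) {   // insert suffix t[i..]
            Node* cur = &root_;
            for (std::size_t j = i; j < t.size(); ++j) {
                auto& next = cur->child[t[j]];
                if (!next) next = std::make_unique<Node>();
                cur = next.get();
            }
        }
    }

    // Every substring of s is a prefix of some suffix, i.e. a path from the root.
    bool has_substring(const std::string& p) const {
        const Node* cur = &root_;
        for (char c : p) {
            auto it = cur->child.find(c);
            if (it == cur->child.end()) return false;
            cur = it->second.get();
        }
        return true;
    }
};
```

After the trie is built, a lookup takes O(|p|) steps regardless of the length of s.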
  45. Longest Common Substring
      1. Build a "generalized suffix tree" for the (2?) strings.
      2. Traverse the tree to mark each internal node as 1, 2 or (1,2), depending on whether its subtree contains leaves for suffixes ending with the special symbol of string 1, of string 2, or of both.
      3. Find the deepest internal node (by number of characters from the root) marked (1,2); its path label is the longest common substring.
      • Pattern Matching?
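Here is a sketch of the three steps above, built on a simplified stand-in for the generalized suffix tree: an uncompressed generalized suffix trie holding every suffix of both strings, which takes O(n²) space. Each node records in a bitmask which strings pass through it (1, 2, or both), which is equivalent to asking which strings' leaves lie below it. The deepest node marked by both gives the longest common substring. The end markers '#' and '$' are assumed not to occur in the inputs, and the names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct GNode {
    std::map<char, std::unique_ptr<GNode>> child;
    unsigned mark = 0;   // bit 0: string 1 passes here, bit 1: string 2 passes here
};

// Insert every suffix of t, OR-ing bit into each node on the path (step 1 + 2).
void add_suffixes(GNode& root, const std::string& t, unsigned bit) {
    for (std::size_t i = 0; i < t.size(); ++i) {
        GNode* cur = &root;
        for (std::size_t j = i; j < t.size(); ++j) {
            auto& next = cur->child[t[j]];
            if (!next) next = std::make_unique<GNode>();
            cur = next.get();
            cur->mark |= bit;
        }
    }
}

// Step 3: depth-first search for the deepest node marked by both strings.
// A node's mark contains its descendants' marks, so subtrees not marked 3 are skipped.
void deepest_common(const GNode& n, std::string& path, std::string& best) {
    if (path.size() > best.size()) best = path;
    for (const auto& kv : n.child) {
        if (kv.second->mark != 3) continue;
        path.push_back(kv.first);
        deepest_common(*kv.second, path, best);
        path.pop_back();
    }
}

std::string longest_common_substring(const std::string& a, const std::string& b) {
    GNode root;
    add_suffixes(root, a + '#', 1);   // a different end symbol for each string
    add_suffixes(root, b + '$', 2);
    std::string path, best;
    deepest_common(root, path, best);
    return best;
}
```

For "xabxac" and "abcabxabcd", the deepest node marked by both strings spells "abxa". With a real generalized suffix tree, the same search runs in linear time.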
