Hash Table
  Code Review 1/20/11




Outline

• Hash Table Overview
• Hashing Overview
• Add Items
• Remove Items
• Search For Items
• Enumerate Items
Hash Table Overview

• Associative array
  • Stores key/value pairs
  • Like an array, but the index (the key) can be any comparable type
• Each key is unique, though different keys can map to the same value
• The key is mapped to an array index
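
The associative-array behaviour described above can be seen with Python's built-in dict, used here only as a stand-in for the hash table in these slides. The names and ages are made-up illustration values.

```python
# A dict is an associative array: keys are unique, values need not be.
ages = {}
ages["Jane"] = 30
ages["John"] = 30   # a different key may map to the same value
ages["Jane"] = 31   # reusing a key overwrites; it does not add a second entry

print(len(ages))     # number of distinct keys
print(ages["Jane"])  # latest value stored under "Jane"
```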
Hashed???

• Hashing derives a fixed-size result from an input
  • Every hash has the same size and type, whatever the input
• Stable
  • The same input ALWAYS generates the same output
• Uniform
  • Hash values should be uniformly distributed through the available
    space (though perfect uniformity is impossible)
• Efficient
  • The cost of generating a hash must be balanced against application needs
• Secure
  • The cost of finding data that produces a given hash is prohibitive
Hashing A String

• Naïve implementation
  • Sum the ASCII values of the characters

      f      o      o      sum
      102    111    111    324

• Pros
  • Stable
  • Efficient
• Cons
  • Not uniform: any reordering of the same characters collides
    • AdditiveHash(“foo”) = AdditiveHash(“oof”)
  • Not secure
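
A minimal sketch of the additive hash, written in Python rather than the C# style of the slides. The function name `additive_hash` mirrors the slide's AdditiveHash but is otherwise an assumption.

```python
def additive_hash(text):
    """Sum the character codes of the string (ASCII values for ASCII text)."""
    return sum(ord(ch) for ch in text)

print(additive_hash("foo"))                          # 102 + 111 + 111 = 324
print(additive_hash("foo") == additive_hash("oof"))  # True: anagrams collide
```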
Hashing A String

• Somewhat better
  • “Folds” the bytes of every four characters into a 32-bit integer and
    sums the results, wrapping on overflow

      Chunk         “lore”       “m ip”       “sum ”       “dolo”       “r”
      Value         1701998444   1885937773   544044403    1869377380   114
      Running sum   1701998444   -707031079   -162986676   1706390704   1706390818

• Pros
  • Stable, efficient, and better uniformity
• Cons
  • Not secure (can be treated essentially as additive)
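
A sketch of the folding hash that reproduces the numbers on this slide. It assumes the input is the lowercase string "lorem ipsum dolor", that each four-byte chunk is read as a little-endian integer, and that the running sum wraps like a signed 32-bit int. These assumptions are inferred from the slide's values, and the helper names are hypothetical.

```python
def to_signed32(n):
    """Wrap an integer to the signed 32-bit range, like C# int overflow."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def folding_hash(text):
    """Fold every four bytes into an integer and sum the chunks."""
    data = text.encode("ascii")
    total = 0
    for start in range(0, len(data), 4):
        chunk = data[start:start + 4]  # the last chunk may be shorter
        total = to_signed32(total + int.from_bytes(chunk, "little"))
    return total

print(folding_hash("lorem ipsum dolor"))  # 1706390818, the slide's final sum
```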
Hashing Functions

• There are lots of good hashing algorithms; you don’t have to write
  your own.
• Pick the right hash for the job at hand (all of these are available in the
  .NET Framework)

       Name        Stable      Uniform     Efficient   Secure
       Additive    ✔                       ✔
       Folding     ✔           ✔           ✔
       CRC32       ✔           ✔           ✔
       MD5         ✔           ✔
       SHA-2       ✔           ✔                       ✔
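
To illustrate using library hashes instead of writing your own, here is a sketch using Python's standard `zlib` and `hashlib` modules as a stand-in for the .NET classes the slide refers to. It shows the fixed-size and stable properties only; SHA-256 is used as one member of the SHA-2 family.

```python
import hashlib
import zlib

data = b"foo"

crc = zlib.crc32(data)                         # unsigned 32-bit checksum
md5_digest = hashlib.md5(data).digest()        # always 16 bytes
sha256_digest = hashlib.sha256(data).digest()  # always 32 bytes

# Fixed size: a much longer input still yields a 32-byte SHA-256 digest.
long_digest = hashlib.sha256(b"a" * 1000).digest()
print(len(md5_digest), len(sha256_digest), len(long_digest))

# Stable: hashing the same input again gives the same output.
print(hashlib.sha256(data).digest() == sha256_digest)
```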
Hash Table Overview

• Adding Jane
  • int index = GetIndex(Jane.Name);
  • _array[index] = Jane;
• What does GetIndex() do? It hashes the string and maps the hash to an index in the array
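
One plausible shape for GetIndex(), sketched in Python. The slides do not show its body, so the modulo step, the 8-slot array, and the use of the additive hash are all assumptions made for illustration.

```python
def additive_hash(key):
    """Naive hash from the earlier slide: sum of character codes."""
    return sum(ord(ch) for ch in key)

def get_index(key, slot_count):
    """Hash the key, then reduce the hash to a valid array index."""
    return additive_hash(key) % slot_count

array = [None] * 8            # hypothetical backing array with 8 slots
jane = {"name": "Jane"}       # stand-in for the Person object

index = get_index(jane["name"], len(array))
array[index] = jane
print(index)  # 74 + 97 + 110 + 101 = 382, and 382 % 8 = 6
```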
Simple Hash Examples
Handling Collisions

• A collision: two distinct items have the same hash value
  • Both items are assigned to the same index in the hash table
• Two common strategies
  • Open addressing
    • Move to the next index in the table
  • Chaining
    • Store the colliding items in a linked list
• Frequency of collisions depends on
  • # of slots in the array
  • # of filled slots in the array
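
The two strategies can be sketched as follows. This is an illustration under assumptions, not the reviewed code: it uses the additive hash so that "foo" and "oof" collide, linear probing for open addressing, and a Python list in place of a linked list for chaining.

```python
def additive_hash(key):
    return sum(ord(ch) for ch in key)

def add_open_addressing(slots, key, value):
    """Linear probing: step to the next index until a free slot is found."""
    index = additive_hash(key) % len(slots)
    for _ in range(len(slots)):
        if slots[index] is None or slots[index][0] == key:
            slots[index] = (key, value)
            return index
        index = (index + 1) % len(slots)  # wrap around at the end
    raise RuntimeError("table is full")

def add_chaining(buckets, key, value):
    """Each slot holds a list of every item that hashed to that index."""
    index = additive_hash(key) % len(buckets)
    if buckets[index] is None:
        buckets[index] = []
    bucket = buckets[index]
    for i, (existing_key, _) in enumerate(bucket):
        if existing_key == key:
            bucket[i] = (key, value)  # keys are unique: overwrite
            return index
    bucket.append((key, value))
    return index

slots = [None] * 8
first = add_open_addressing(slots, "foo", 1)   # 324 % 8 = 4
second = add_open_addressing(slots, "oof", 2)  # 4 is taken, so it lands in 5

buckets = [None] * 8
add_chaining(buckets, "foo", 1)
add_chaining(buckets, "oof", 2)                # both end up in bucket 4
```

The fuller the array, the longer the probe sequences and chains become, which is why the ratio of filled slots to total slots matters.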
Finding Items

• Items are found by key
  • Person p = HashTable.Find(“Jane”)
• Open addressing
  • Get the index of the key
  • If the value != null
    • If keys match, return the value
    • If keys don’t match, check the next index
  • If the value == null, the key is not in the table
• Chaining
  • Get the index of the key
  • Search the list at that index for the key
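
The lookup steps above can be sketched like this. The tables are filled in by hand to show a collision at index 4. Returning None for a missing key is a simplification chosen for the example, and the function names are hypothetical.

```python
def additive_hash(key):
    return sum(ord(ch) for ch in key)

def find_open_addressing(slots, key):
    """Probe from the key's index until the key or an empty slot is found."""
    index = additive_hash(key) % len(slots)
    for _ in range(len(slots)):
        entry = slots[index]
        if entry is None:
            return None              # empty slot: the key is not present
        if entry[0] == key:
            return entry[1]
        index = (index + 1) % len(slots)
    return None                      # probed every slot without a match

def find_chaining(buckets, key):
    """Search only the list stored at the key's index."""
    index = additive_hash(key) % len(buckets)
    for existing_key, value in buckets[index] or []:
        if existing_key == key:
            return value
    return None

slots = [None] * 8
slots[4] = ("foo", 1)
slots[5] = ("oof", 2)   # displaced from index 4 by the collision

buckets = [None] * 8
buckets[4] = [("foo", 1), ("oof", 2)]

print(find_open_addressing(slots, "oof"))  # 2, found one step past index 4
print(find_chaining(buckets, "oof"))       # 2
```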
Removing Items

• Items are removed by key
  • HashTable.Remove(“Jane”)
• Open addressing
  • Get the index of the key
  • If the value != null
    • If keys match, remove
    • If keys don’t match, check the next index
• Chaining
  • Get the index of the key
  • Remove the item from the linked list at that index
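
A sketch of removal for both strategies. One caveat the slide does not cover: with open addressing, resetting a slot to empty can cut off items that were probed past it. A common remedy, assumed here, is to leave a "tombstone" marker instead; lookups and inserts then also have to skip tombstones. All names are hypothetical.

```python
def additive_hash(key):
    return sum(ord(ch) for ch in key)

TOMBSTONE = object()  # marks a slot whose item was removed

def remove_open_addressing(slots, key):
    """Probe for the key; replace it with a tombstone if found."""
    index = additive_hash(key) % len(slots)
    for _ in range(len(slots)):
        entry = slots[index]
        if entry is None:
            return False                 # empty slot: key is not present
        if entry is not TOMBSTONE and entry[0] == key:
            slots[index] = TOMBSTONE     # keep the probe chain intact
            return True
        index = (index + 1) % len(slots)
    return False

def remove_chaining(buckets, key):
    """Delete the matching item from the list at the key's index."""
    index = additive_hash(key) % len(buckets)
    bucket = buckets[index] or []
    for i, (existing_key, _) in enumerate(bucket):
        if existing_key == key:
            del bucket[i]
            return True
    return False

slots = [None] * 8
slots[4] = ("foo", 1)
slots[5] = ("oof", 2)
removed = remove_open_addressing(slots, "foo")

buckets = [None] * 8
buckets[4] = [("foo", 1), ("oof", 2)]
removed_chained = remove_chaining(buckets, "oof")
```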
Enumerating Keys & Values

• Open addressing
  • foreach (var item in array)
      { if (item != null) yield return item; }
• Chaining
  • foreach (var list in array)
      { if (list != null)
        { foreach (var item in list) { yield return item; } }
      }
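
The same enumeration, sketched with Python generators. `yield` hands back one item at a time and then continues the loop, which is what lets the caller see every item rather than only the first. The table contents are made up for the example.

```python
def enumerate_open_addressing(slots):
    """Yield every occupied slot in array order."""
    for entry in slots:
        if entry is not None:
            yield entry

def enumerate_chaining(buckets):
    """Yield every item of every non-empty list in array order."""
    for bucket in buckets:
        if bucket is not None:
            yield from bucket

slots = [None, ("foo", 1), None, ("oof", 2)]
buckets = [None, [("foo", 1), ("oof", 2)], None, None]

print(list(enumerate_open_addressing(slots)))
print(list(enumerate_chaining(buckets)))
```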
