Open addressing hashing is an alternative to
resolving collisions with linked lists.
Separate chaining hashing has the
disadvantage of using linked lists.
This slows the algorithm down a bit because of the time
required to allocate new cells.
It also essentially requires the implementation of a
second data structure.
Cells h0(x), h1(x), h2(x), ... are tried in succession, where
hi(x) = (hash(x) + f(i)) mod TableSize, with f(0) = 0.
The function f is the collision resolution strategy.
The load factor should generally be kept below λ = 0.5.
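The probe formula can be sketched as a small generator. This is only an illustration: the name probe_sequence is hypothetical, and hash(x) = x mod TableSize for integer keys is an assumption, since the text does not fix a particular hash function.

```python
from itertools import islice


def probe_sequence(key, table_size, f):
    """Yield h_0(x), h_1(x), ... with h_i(x) = (hash(x) + f(i)) mod table_size."""
    home = key % table_size  # assumed hash(x) = x mod TableSize
    i = 0
    while True:
        yield (home + f(i)) % table_size
        i += 1


# With f(i) = i, the first five cells tried for key 49 in a table of 10 cells.
print(list(islice(probe_sequence(49, 10, lambda i: i), 5)))  # [9, 0, 1, 2, 3]
```

Passing a different f changes only the collision resolution strategy; the rest of the table logic stays the same.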
Linear probing uses the collision resolution strategy f(i) = i.
This amounts to trying cells sequentially (with wraparound) in
search of an empty cell.
Example: inserting keys {89, 18, 49, 58, 69}
into a hash table with 10 cells.
The first collision occurs when 49 is inserted;
it is put in the next available spot, namely spot 0, which is open.
Expected number of probes at load factor λ:
Unsuccessful search: ½(1 + 1/(1 - λ)^2)
Successful search: ½(1 + 1/(1 - λ))
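To get a feel for the two estimates, ½(1 + 1/(1 - λ)^2) for an unsuccessful search and ½(1 + 1/(1 - λ)) for a successful one, they can be evaluated at a few load factors. The function names below are hypothetical and the load factors are chosen only for illustration.

```python
def expected_probes_unsuccessful(load_factor):
    """Linear probing estimate: 1/2 * (1 + 1/(1 - lambda)^2)."""
    return 0.5 * (1 + 1 / (1 - load_factor) ** 2)


def expected_probes_successful(load_factor):
    """Linear probing estimate: 1/2 * (1 + 1/(1 - lambda))."""
    return 0.5 * (1 + 1 / (1 - load_factor))


for lam in (0.5, 0.75, 0.9):
    print(lam, expected_probes_unsuccessful(lam), expected_probes_successful(lam))
```

At λ = 0.5 the estimates are 2.5 and 1.5 probes; at λ = 0.9 the unsuccessful case grows to roughly 50 probes, which is why the table is usually kept at most half full.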
Linear probing: after checking spot h(k), try spot h(k)+1; if that is full, try h(k)+2, then h(k)+3, etc.
Exercise: insert 38, 19, 8, 109, 10, in this order, into an empty table with slots 0 to 9.
Open addressing hash table with linear probing, after each insertion:
Slot  Empty table  After 89  After 18  After 49  After 58  After 69
0                                      49        49        49
1                                                58        58
2                                                          69
3
4
5
6
7
8                            18        18        18        18
9                  89        89        89        89        89
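The linear probing insertions can be replayed with a short sketch. It assumes hash(x) = x mod TableSize, which agrees with the home positions of 89 and 18 in the table; insert_linear is a hypothetical helper name.

```python
def insert_linear(table, key):
    """Insert key with linear probing, f(i) = i, and return the slot used."""
    size = len(table)
    home = key % size  # assumed hash(x) = x mod TableSize
    for i in range(size):
        slot = (home + i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


table = [None] * 10
for key in (89, 18, 49, 58, 69):
    print(key, "->", insert_linear(table, key))
print(table)
```

The slots come out as 9, 8, 0, 1, 2: key 58 needs three extra probes and 69 needs three as well, which shows how occupied cells start to cluster.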
Quadratic probing: f(i) = i^2
Probe sequence:
0th probe = h(k) mod TableSize
1st probe = (h(k) + 1) mod TableSize
2nd probe = (h(k) + 4) mod TableSize
3rd probe = (h(k) + 9) mod TableSize
. . .
ith probe = (h(k) + i^2) mod TableSize
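Claim: if size is prime, the quadratic probes i = 0, 1, ..., size/2 land in distinct cells.
Show for all 0 ≤ i, j ≤ size/2 with i ≠ j:
(h(x) + i^2) mod size ≠ (h(x) + j^2) mod size
By contradiction: suppose that for some i ≠ j:
(h(x) + i^2) mod size = (h(x) + j^2) mod size
⇒ i^2 mod size = j^2 mod size
⇒ (i^2 - j^2) mod size = 0
⇒ [(i + j)(i - j)] mod size = 0
Because size is prime, (i - j) or (i + j) must be divisible by size, and
neither can be: i ≠ j and 0 ≤ i, j ≤ size/2 give 0 < |i - j| < size and 0 < i + j < size.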
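The claim can also be checked numerically for small table sizes. This is a sanity check rather than a proof; the helper name is hypothetical. Since h(x) cancels on both sides, it is enough to compare the offsets i^2 mod size.

```python
def first_half_probes_distinct(size):
    """True if the offsets i^2 mod size are distinct for i = 0 .. size // 2."""
    offsets = [(i * i) % size for i in range(size // 2 + 1)]
    return len(set(offsets)) == len(offsets)


for size in (7, 11, 13, 16):
    print(size, first_half_probes_distinct(size))
```

The prime sizes 7, 11 and 13 pass, while 16 fails (i = 0 and i = 4 give the same cell), which illustrates why the argument needs a prime table size.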
Open addressing hash table with quadratic probing, after each insertion:
Slot  Empty table  After 89  After 18  After 49  After 58  After 69
0                                      49        49        49
1
2                                                58        58
3                                                          69
4
5
6
7
8                            18        18        18        18
9                  89        89        89        89        89
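The quadratic probing insertions can be replayed the same way. The sketch assumes hash(x) = x mod TableSize, and insert_quadratic is a hypothetical helper name.

```python
def insert_quadratic(table, key):
    """Insert key with quadratic probing, f(i) = i^2, and return the slot used."""
    size = len(table)
    home = key % size  # assumed hash(x) = x mod TableSize
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free cell found along the probe sequence")


table = [None] * 10
for key in (89, 18, 49, 58, 69):
    print(key, "->", insert_quadratic(table, key))
print(table)
```

Key 58 collides at 8 and 9 and lands in slot 2 (offset 4); key 69 collides at 9 and 0 and lands in slot 3. A table of 10 cells is not prime, so an insertion is not guaranteed to succeed here in general, even when free cells remain.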
The last collision resolution method
we examine is double hashing.
For double hashing, f(i) = i⋅hash2(x).
We apply a second hash function to x and probe at a
distance hash2(x), 2hash2(x), ...
A function such as hash2(x) = R - (x mod
R), with R a prime smaller than TableSize,
works well.
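A minimal sketch of double hashing with hash2(x) = R - (x mod R) follows. The values R = 7, a table of 10 cells, hash(x) = x mod TableSize and the key set are assumptions chosen for illustration; insert_double is a hypothetical helper name.

```python
def insert_double(table, key, r=7):
    """Insert key with double hashing, f(i) = i * hash2(x); return the slot used."""
    size = len(table)
    home = key % size        # assumed hash(x) = x mod TableSize
    step = r - (key % r)     # hash2(x) = R - (x mod R), always in 1..R, never 0
    for i in range(size):
        slot = (home + i * step) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free cell found along the probe sequence")


table = [None] * 10
for key in (89, 18, 49, 58, 69):
    print(key, "->", insert_double(table, key))
print(table)
```

Each colliding key gets its own step size (7 for 49, 5 for 58, 1 for 69), so keys that share a home cell do not follow the same probe path.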
Double hashing: f(i) = i * g(k),
where g is a second hash function
Probe sequence:
0th probe = h(k) mod TableSize
1st probe = (h(k) + g(k)) mod TableSize
2nd probe = (h(k) + 2*g(k)) mod TableSize
3rd probe = (h(k) + 3*g(k)) mod TableSize
. . .
ith probe = (h(k) + i*g(k)) mod TableSize
Exercise: insert these values into the hash table in this
order, resolving any collisions with double hashing:
13, 28, 33, 147, 43
Hash functions:
H(K) = K mod M
H2(K) = 1 + ((K/M) mod (M-1)), where K/M is integer division
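One way to work through the exercise is sketched below. The table size M is not given in the text; M = 10 is assumed here purely for illustration, and K/M is read as integer division. The helper names are hypothetical.

```python
def h(k, m):
    return k % m


def h2(k, m):
    return 1 + ((k // m) % (m - 1))  # K/M taken as integer division


def insert_double_hashing(table, key):
    """Probe (H(K) + i*H2(K)) mod M for i = 0..M-1; return the slot or None."""
    m = len(table)
    for i in range(m):
        slot = (h(key, m) + i * h2(key, m)) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    return None  # every probed cell was occupied


table = [None] * 10  # assumed M = 10
for key in (13, 28, 33, 147, 43):
    print(key, "->", insert_double_hashing(table, key))
```

Under this assumption the first four keys go to slots 3, 8, 7 and 9, but 43 cannot be placed: H2(43) = 5 shares a factor with M = 10, so its probes alternate between slots 3 and 8, both occupied. A prime table size avoids this kind of short probe cycle.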
Rehashing: when the table gets too full, create a bigger
table (usually 2x as large) and hash all the
items from the original table into the new
table.
When to rehash:
1) as soon as the table is half full (λ = 0.5)
2) when an insertion fails
3) when the load factor reaches some other threshold
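A minimal sketch of rehashing with linear probing is shown below. The initial size 7, the new size 17 (a prime roughly twice as large), the threshold of 0.5 and hash(x) = x mod TableSize are assumptions for illustration; the text does not fix any of them. The helper names are hypothetical.

```python
def insert(table, key):
    """Linear probing insert; return True on success, False if the table is full."""
    size = len(table)
    for i in range(size):
        slot = (key % size + i) % size  # assumed hash(x) = x mod TableSize
        if table[slot] is None:
            table[slot] = key
            return True
    return False


def load_factor(table):
    return sum(key is not None for key in table) / len(table)


def rehash(table, new_size):
    """Build a table of new_size cells and re-insert every key from the old one."""
    new_table = [None] * new_size
    for key in table:
        if key is not None:
            insert(new_table, key)
    return new_table


table = [None] * 7
for key in (13, 15, 6, 24, 23):
    insert(table, key)
print(table, load_factor(table))

if load_factor(table) > 0.5:   # assumed threshold
    table = rehash(table, 17)  # assumed new size
print(table, load_factor(table))
```

Every key is hashed again because its home cell depends on the table size, so the old slot numbers cannot simply be copied over.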
A) Open addressing hash table with linear probing with input 13, 15, 6, 24
Slot  Key
0     6
1     15
2
3     24
4
5
6     13

B) Open addressing hash table with linear probing after 23 is inserted
Slot  Key
0     6
1     15
2     23
3     24
4
5
6     13

The same keys in a larger table:
Slot  Key
0
1
2
3     6
4     23
5     24
6
7
8
9     13
10
11    15
Extendible hashing accesses the data stored in
buckets indirectly, through an index that is
dynamically adjusted to reflect changes in the file.
A hash function applied to a key indicates
a position in the index, not in the file (or table
of keys). Values returned by such a hash function
are called pseudokeys.
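Extendible hashing, original data. The number in parentheses is the number of leading bits shared by all keys in the bucket.
Directory (2 bits): 00  01  10  11
Entry 00 -> bucket (2): 000100  001000  001010  001011
Entry 01 -> bucket (2): 010100  011000
Entry 10 -> bucket (2): 100000  101000  101100  101110
Entry 11 -> bucket (2): 111000  111001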
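After inserting 100100: the bucket for keys starting with 10 already holds four keys, so it splits and the directory doubles to 3 bits.
Directory (3 bits): 000  001  010  011  100  101  110  111
Entries 000, 001 -> bucket (2): 000100  001000  001010  001011
Entries 010, 011 -> bucket (2): 010100  011000
Entry 100 -> bucket (3): 100000  100100
Entry 101 -> bucket (3): 101000  101100  101110
Entries 110, 111 -> bucket (2): 111000  111001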
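After inserting 000000: the bucket for keys starting with 00 already holds four keys, so it splits; the directory stays at 3 bits.
Directory (3 bits): 000  001  010  011  100  101  110  111
Entry 000 -> bucket (3): 000000  000100
Entry 001 -> bucket (3): 001000  001010  001011
Entries 010, 011 -> bucket (2): 010100  011000
Entry 100 -> bucket (3): 100000  100100
Entry 101 -> bucket (3): 101000  101100  101110
Entries 110, 111 -> bucket (2): 111000  111001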
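The directory and bucket splits can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a definitive implementation: pseudokeys are 6-bit integers, the leading bits select the directory entry, and each bucket holds at most 4 keys (the largest bucket in the figures). The class and constant names are hypothetical.

```python
KEY_BITS = 6   # pseudokeys in the figures are 6 bits long
CAPACITY = 4   # assumed bucket capacity


class Bucket:
    def __init__(self, depth):
        self.depth = depth  # local depth: leading bits shared by all keys here
        self.keys = []


class ExtendibleHash:
    def __init__(self, global_depth=2):
        self.global_depth = global_depth
        self.directory = [Bucket(global_depth) for _ in range(2 ** global_depth)]

    def _index(self, key):
        # The leading global_depth bits of the pseudokey pick the directory entry.
        return key >> (KEY_BITS - self.global_depth)

    def insert(self, key):
        # Note: more than CAPACITY identical keys would split without end.
        bucket = self.directory[self._index(key)]
        while len(bucket.keys) >= CAPACITY:
            self._split(bucket)
            bucket = self.directory[self._index(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory: every old entry now appears twice.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        bit = KEY_BITS - bucket.depth  # position of the newly significant key bit
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (sibling if (k >> bit) & 1 else bucket).keys.append(k)
        # Entries whose newly significant index bit is 1 now point to the sibling.
        shift = self.global_depth - bucket.depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> shift) & 1:
                self.directory[i] = sibling


table = ExtendibleHash()
for bits in ("000100 001000 001010 001011 010100 011000 "
             "100000 101000 101100 101110 111000 111001").split():
    table.insert(int(bits, 2))
table.insert(0b100100)  # full bucket splits and the directory doubles
table.insert(0b000000)  # full bucket splits; the directory size is unchanged

for i, b in enumerate(table.directory):
    print(format(i, f"0{table.global_depth}b"), b.depth,
          [format(k, "06b") for k in b.keys])
```

Only the overflowing bucket is reorganised on a split; directory entries that still share a bucket (for example 010 and 011) simply keep pointing at the same object.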
Expandable hashing
Like extendible hashing, but a binary tree is used to store the index on the buckets.
Dynamic hashing
Multiple binary trees are used.
Outcome:
- To shorten the search.
- Based on the key, select which tree to search.
Larson method
- The index is simplified to a set of binary trees.
- The height of each tree is limited.
- h(x) is searched in all trees.
- Time: with m trees and at most k keys in each, overall m * lg k.
- Advantage: shorter search time in the index file.

Open addressing & rehashing, extendible hashing
