2. Sequential Search
Looks for the target from the first to the last
element of the list
The later in the list the target occurs the longer
it takes to find it
Does not assume anything about the order of the
elements in the list, so it can be used with an
unsorted list
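The scan described above is a straightforward loop; this is a minimal sketch (Python, not code from the slides) that returns the index of the target, or -1 on failure:

```python
def sequential_search(items, target):
    """Scan from the first element to the last; no ordering is assumed."""
    for i, value in enumerate(items):
        if value == target:
            return i          # found: i+1 comparisons were made
    return -1                 # all N comparisons made, target absent
```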
5. Worst-Case Analysis
If the target is in the last location, we look at all
of the elements to find it
If the target is not in the list, we need to look at
all of the elements to learn that
Therefore, the largest number of comparisons
we will do in this algorithm is N
6. Average-Case Analysis
If the search is always successful, there are N
places the target could be found
It will take 1 comparison to find the target in the
first location, 2 comparisons to find the target in
the second location, and so on
If each location is equally likely, we get:
A(N) = (1/N) · Σ(i=1 to N) i = (N + 1)/2
7. Average-Case Analysis
If the search can fail, there are N places the target
could be found and 1 possibility when it’s not
found
If the target is not found, we do N comparisons
If each of these N+1 possibilities is equally
likely, we get:
A(N) = (1/(N+1)) · ( Σ(i=1 to N) i + N ) = N/2 + N/(N+1) ≈ (N + 2)/2
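Both averages can be checked by brute force; a small sketch (the function names are mine, not the slides'):

```python
def avg_success(n):
    # i comparisons are needed to find the target in location i (1-based)
    return sum(range(1, n + 1)) / n

def avg_with_failure(n):
    # the failure case adds one more possibility costing n comparisons
    return (sum(range(1, n + 1)) + n) / (n + 1)
```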
8. Binary Search
Used with a sorted list
First check the middle list element
If the target matches the middle element, we are
done
If the target is less than the middle element, the
key must be in the first half
If the target is larger than the middle element,
the key must be in the second half
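The halving step above can be sketched as follows (a minimal Python version, not the slides' code):

```python
def binary_search(sorted_items, target):
    """Repeatedly check the middle element of the remaining range."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid                 # target matches the middle element
        elif target < sorted_items[mid]:
            high = mid - 1             # target must be in the first half
        else:
            low = mid + 1              # target must be in the second half
    return -1
```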
10. Algorithm Review
Each comparison eliminates about half of the
elements of the list from consideration
If we begin with N = 2^k − 1 elements in the list,
there will be 2^(k−1) − 1 elements on the second
pass, and 2^(k−2) − 1 elements on the third pass
11. Worst-Case Analysis
In the worst case, we will either find the target
on the last pass, or not find the target at all
The last pass will have only one element left to
compare, which happens when 2^1 − 1 = 1
If N = 2^k − 1, then there must be
k = lg(N+1) passes
12. Average-Case Analysis
If the search is always successful, there are N
places the target could be found
There is one place we check on the first pass,
two places we could check on the second pass,
and four places we could check on the third pass
14. Average-Case Analysis
In looking at the binary tree, we see that there
are i comparisons needed to find the 2^(i−1)
elements on level i of the tree
For a list with N = 2^k − 1 elements, there are k
levels in the binary tree
These two facts give us:
A(N) = (1/N) · Σ(i=1 to k) i·2^(i−1) ≈ lg(N + 1) − 1
15. Average-Case Analysis
If the search can fail sometimes, there are N
places the target could be found and N+1
possibilities when it is not found
In other words, if the missing key were added to
the list, it could be put at the beginning, between
any two elements, or at the end – a total of N+1
different places
16. Average-Case Analysis
The possibilities when the key is found are still
the same as before, and the new cases all take k
comparisons when N = 2^k − 1
This gives us:
A(N) = (1/(2N+1)) · ( Σ(i=1 to k) i·2^(i−1) + k·(N+1) ) ≈ lg(N + 1) − 1/2
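The successful-search average can be verified empirically by counting loop passes; a sketch (all names are mine) for a perfect tree with N = 2^10 − 1 elements, where the average should land near lg(N+1) − 1 = 9:

```python
def passes_to_find(a, target):
    """Count the passes a textbook binary search makes to find target."""
    low, high, passes = 0, len(a) - 1, 0
    while low <= high:
        passes += 1
        mid = (low + high) // 2
        if a[mid] == target:
            return passes
        elif target < a[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return passes

N = 2 ** 10 - 1                       # 1023 elements, k = 10 levels
a = list(range(N))
avg = sum(passes_to_find(a, x) for x in a) / N
```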
17. Any Alternative to Binary Search?
Have we used all the knowledge we have about
finding an item in an ordered array? The answer is
maybe not.
If you were looking for Mr. Alfred Aaron in the
telephone book, would you open the book in the
middle and see whether Aaron was in the first half
or second half of the book? I think not.
18. Any Alternative to Binary Search?
Given the additional information of the upper and
lower limits of the values in a list we can
improve on a binary search by estimating the
most likely position of an element in the list.
This is called an interpolation search.
19. Interpolation Search
It proceeds like a binary search only the list
is divided each time according to our
estimate of where the key is situated.
Given a uniform distribution of keys,
interpolation search has an average case time
complexity of only lg(lg n).
20. Interpolation Search
There is another type of information we
normally use when searching a phone book
which is not used by binary search but it is used
by interpolation search:
where would you open the phone book if
you were looking for Mr. Alfred Aaron?
21. Interpolation Search
If the following conditions are true then interpolation
search may be better than binary search:
Each access is very expensive compared to a typical instruction,
e.g. the array is stored on a disk and each comparison requires a
disk access.
The data are not only sorted but also fairly uniformly
distributed, e.g. a phone book is fairly uniformly distributed, an
input like: [1,2,3,4,5,6,7,8,16,32,355,...] is not.
22. Interpolation Search
In this situation we are willing to
spend more time to make an accurate
guess where the item may be (instead
of always picking the mid point):
23. Interpolation Search
For example:
Array of 1000 items
The lowest item in the range is 1000
The highest item in range is 1,000,000
We are looking for the item of value 12,000
Then we expect to find the item around the 12th
position (always under the assumption that the items
are uniformly distributed). This is expressed by the
formula:
next = low + (key − a[low]) × (high − low) / (a[high] − a[low])
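The estimate above can be sketched as a search routine (Python, my names; the array below mirrors the slide's example of 1000-ish uniformly spaced items between 1000 and 1,000,000):

```python
def interpolation_search(a, key):
    """Like binary search, but the split point is an interpolation
    estimate of where the key is likely to sit."""
    low, high = 0, len(a) - 1
    while low <= high and a[low] <= key <= a[high]:
        if a[high] == a[low]:                      # avoid division by zero
            return low if a[low] == key else -1
        # estimate the likely position by linear interpolation
        pos = low + (key - a[low]) * (high - low) // (a[high] - a[low])
        if a[pos] == key:
            return pos
        elif a[pos] < key:
            low = pos + 1
        else:
            high = pos - 1
    return -1

# uniformly spaced keys from 1000 to 1,000,000 (1001 items)
a = [1000 + i * 999 for i in range(1001)]
```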
25. Interpolation Search
Calculation is more costly than the binary search
calculation
It needs to be done using floating point operations.
One iteration may be slower than the complete binary
search.
If the cost of this calculation is insignificant compared to the
cost of accessing an item, we only care about the number of
iterations.
26. Interpolation Search
In the worst case, when the numbers are not uniformly
distributed, the running time could be linear and all the
items might be examined.
If the items are reasonably uniformly distributed, the
running time has been demonstrated to be O(log log N)
(apply the logarithm twice in succession).
For example, for N = 4 billion, log N is about 32 and
log log N is roughly 5.
28. Hash Tables
Hash tables are a common approach to the
storing/searching problem.
29. What is a Hash Table ?
The simplest kind of hash table
is an array of records.
This example has 701 records.
[Figure: an array of records, indices [0] through [700]]
30. What is a Hash Table ?
Each record has a special
field, called its key.
In this example, the key is a
long integer field called
Number.
[Figure: the record at index [4] has key Number = 506643548]
31. What is a Hash Table ?
The number might be a
person's identification
number, and the rest of the
record has information about
the person.
32. What is a Hash Table ?
When a hash table is in use,
some spots contain valid
records, and other spots are
"empty".
[Figure: valid records at a few indices (Numbers 506643548, 233667136, 281942902, 155778322); the other spots are empty]
33. Inserting a New Record
In order to insert a new record,
the key must somehow be
converted to an array index.
The index is called the hash
value of the key.
[Figure: a new record with Number = 580625685 waiting to be inserted]
34. Inserting a New Record
Typical way to create a hash
value:
(Number mod 701)
What is (580625685 mod 701) ?
35. Inserting a New Record
Typical way to create a hash
value: (Number mod 701)
(580625685 mod 701) = 3
36. Inserting a New Record
The hash value is used for the
location of the new record.
[Figure: the new record Number 580625685 is placed at index [3]]
38. Collisions
Here is another new record to
insert, with a hash value of 2.
[Figure: new record Number 701466868 arrives, announcing "My hash value is [2]."]
39. Collisions
This is called a collision,
because there is already
another valid record at [2].
[Figure: the table with a valid record already at index [2]]
When a collision occurs,
move forward until you
find an empty spot.
42. Collisions
This is called a collision,
because there is already
another valid record at [2].
[Figure: probing forward from [2]: [3] is occupied, so [4] is the first empty spot]
The new record goes
in the empty spot.
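The insertion rule above (hash, then move forward past occupied spots) can be sketched like this; a minimal Python version using the slides' 701-slot table, not the slides' own code:

```python
TABLE_SIZE = 701                         # matches the slides' example

def insert(table, record):
    """Place record at its hash value; on a collision, move forward
    until an empty spot (None) is found, wrapping past the last slot."""
    i = record["Number"] % TABLE_SIZE    # the hash value of the key
    while table[i] is not None:          # collision: slot occupied
        i = (i + 1) % TABLE_SIZE
    table[i] = record
    return i
```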
43. A Quiz
Where would you be placed in
this table, if there is no
collision? Use your social
security number or some other
favorite number.
[Figure: the table with the six records inserted so far]
44. Searching for a Key
The data that's attached to a
key can be found fairly
quickly.
[Figure: the table; we are searching for key Number 701466868]
45. Searching for a Key
Calculate the hash value.
Check that location of the array for
the key.
[Figure: start at index [2] - the record there is not the target ("Not me")]
46. Searching for a Key
Keep moving forward until you
find the key, or you reach an
empty spot.
[Figure: move forward to the next spot - still not the target ("Not me")]
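The search procedure is the mirror of insertion, following the same probe sequence; a sketch (Python, my names):

```python
TABLE_SIZE = 701

def search(table, number):
    """Start at the hash value; keep moving forward until the key
    is found or an empty spot (None) is reached."""
    i = number % TABLE_SIZE
    while table[i] is not None:
        if table[i]["Number"] == number:
            return table[i]              # found the key
        i = (i + 1) % TABLE_SIZE
    return None                          # empty spot: key not in table
```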
48. Searching for a Key
Keep moving forward until you
find the key, or you reach an
empty spot.
[Figure: at the next spot the key matches ("Yes!")]
49. Searching for a Key
When the item is found, the
information can be copied to the
necessary location.
[Figure: the matching record, found by probing forward from index [2]]
50. Deleting a Record
Records may also be deleted from a hash table.
[Figure: the record with Number 506643548 asks: "Please delete me."]
51. Deleting a Record
Records may also be deleted from a hash table.
But the location must not be left as an ordinary "empty
spot" since that could interfere with searches.
[Figure: the table with Number 506643548 removed, leaving its spot empty]
52. Deleting a Record
The location must be marked in some special way so that a
search can tell that the spot used to have something in it.
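One common way to mark such a spot is a "tombstone" sentinel that search skips over but does not stop at; a sketch under that assumption (the slides describe the idea but not this code):

```python
TABLE_SIZE = 701
DELETED = object()        # marker: "this spot used to have something in it"

def delete(table, number):
    i = number % TABLE_SIZE
    while table[i] is not None:
        if table[i] is not DELETED and table[i]["Number"] == number:
            table[i] = DELETED           # not an ordinary empty spot
            return True
        i = (i + 1) % TABLE_SIZE
    return False

def search(table, number):
    i = number % TABLE_SIZE
    while table[i] is not None:          # DELETED slots do NOT stop the probe
        if table[i] is not DELETED and table[i]["Number"] == number:
            return table[i]
        i = (i + 1) % TABLE_SIZE
    return None
```

If the slot were reset to plain `None` instead, the search for Number 701466868 below would stop early at the vacated slot and wrongly report the key missing.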
53. Rossella Lau Lecture 10, DCO20105, Semester A,2005-6
Hash Table
In the previous studies, all the searches had an
efficiency of at least O(log n)
Can it be faster?
For example, if a primary key contains values from 0 to
99, then a table (array) of size 100 would be enough for
each record to be directly located by the key value
which is the subscript of the table
If we can match all key values to different slots of a
table, we can make searching for a record very efficient
Hash Table: ideally to support search time O(1)
54. Hash function and hash key
Key values may not be numeric or may be very large, but
we may transform the key into a value within a range
E.g., suppose that there are at most m (10000) records in
the file. Even if the key has 8 digits, we may use a
function, e.g., key / 10000, to transform 8-digit keys
to a value from 0-9999
Such a function, which transforms a key into a value that
can be further transformed into a subscript of an array
of fixed length, is called a hash function
The key being transformed is called the hash key
55. Perfect hash function
An ideal (perfect) hash
function transforms all
different hash keys into
different subscripts of a table
When a file has a million
records, it is difficult to find
such a function
56. Hash Value and Hash Table index
A hash function transforms a key to a value which is
called hash value
This value may need to further be transformed to a
subscript of an array: hashValue%m where m is the
table size
The value which can map to a subscript of an array is
called hash table index
57. Hash collision (clash)
When two hash keys have the same hash value,
it is called a hash collision or a hash clash
E.g., given a hash function h(key) = key and hash table
size 1000: hi(h(1322)) = 1322 % 1000 = 322 = 2322 % 1000 = hi(h(2322))
That means both key 1322 and key 2322 may attempt to
insert their records into the same position
58. Resolving hash clashes
There are two basic techniques:
1. Chaining (Open hashing): Keys with the same hash
values will be linked together and a search process
should sequentially traverse all the items in the
linked list
2. Open Addressing (Closed Hashing) : Whenever
there is a clash, it will rehash – to find another slot
in the table
many techniques: e.g., linear probing, quadratic probing
60. Open Addressing: Linear probing
Place the record in the next available position in the
array, i.e., rh(i) = i+1. E.g., with table size 10 and
h(key) = key % 10 (input: 2822, 1615, 2813, 3553,
4288, 2125, 8232):

index: 0     1     2     3     4     5     6     7     8     9
key:   -     -     2822  2813  3553  1615  2125  8232  4288  -

3553: h(3553)=3, rh(1)=4
2125: h(2125)=5, rh(1)=6
8232: h(8232)=2,
rh(1)=3, rh(2)=4,
rh(3)=5, rh(4)=6, rh(5)=7
62. Hash table re-sizing
When a hash table is full or nearly full, it requires
re-sizing to increase the size of the hash table
One of the methods is to take the first prime which is
twice as large as the old table size
For the previous table size 10, the new table size is 23
and the new hash function is h(key) = key % 23

index: 0 .. 22; occupied slots after rehashing:
[5] 1615, [7] 2813, [9] 2125, [10] 4288, [11] 3553, [16] 2822, [21] 8232
(all other slots empty)
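The re-sizing rule can be sketched as follows (Python, my names; the rehash just reapplies key % new_size to every stored key):

```python
def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def next_table_size(old_size):
    """First prime at least twice the old table size."""
    n = 2 * old_size
    while not is_prime(n):
        n += 1
    return n

def rehash_all(keys, new_size):
    """Recompute every key's slot under h(key) = key % new_size."""
    return {key: key % new_size for key in keys}
```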
63. Load Factor
To determine if a hash table is full or
nearly full, the load factor is used
The value of the load factor is the ratio
of the number of elements (m) to the number
of slots (n) of the table: m/n
64. Acceptable ranges of load factor
For different addressing methods, the load factor
has different acceptable ranges
Closed addressing (chaining): about 2 to 4 – if key
values are well distributed in the table, every linked
list is expected to have a number of nodes close to the
load factor, i.e., every hit may require at most 4 to 6
visits
Open addressing: less than about 0.7 – it is the
percentage of slots being occupied – a larger percentage
may cause a key to be rehashed many times – no longer
O(1)
65. Exercises
Ford’s 12:15.a-b++
hf(x) = x, m=11, data: 1, 13, 12, 53, 77, 29, 31,22
a) Construct the hash table by using linear probe addressing
Construct the table again by using rehash function:
index = (index + 5) % 11
b) Construct the hash table by using chaining with separate
lists; and also
Determine the load factors of the tables.
Depict the hash table after resize, the one resulting from
linear probing.
66. Hash Functions for integer data
A hash function usually produces a non-negative
value
A common hash function for numeric data is simply
hash(x) = abs(x)
Ford's: hash(x) = x^2 / 256 % 65536
67. Hash Functions for real numbers
Ford's:
hash(x) = 0 if x = 0; otherwise
hashval = abs(2 * fabs(frexp(x, &exp)) - 1);
where frexp() is a C library function used to
decompose x into two parts: a mantissa between 0.5 and
1 (returned by the function) and an exponent returned
through exp; scientific notation works like this:
x = mantissa * (2 ^ exp)
(Reference: www.cppreference.com)
iCarnegie: hash(x) = floor(m * frac(x * r)), where
typically r is the Golden Ratio (sqrt(5) - 1)/2 and
m is the table size
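Both schemes can be sketched directly in Python, whose `math.frexp` behaves like the C function (returning the mantissa instead of storing it through a pointer); the function names are mine:

```python
import math

def hash_real(x):
    """Ford-style real hash: map x to [0, 1) via its mantissa."""
    if x == 0:
        return 0.0
    mantissa, _exp = math.frexp(x)        # 0.5 <= |mantissa| < 1
    return abs(2 * abs(mantissa) - 1)     # stretched onto [0, 1)

def hash_golden(x, m):
    """iCarnegie-style: floor(m * frac(x * r)), r the golden ratio."""
    r = (math.sqrt(5) - 1) / 2
    return math.floor(m * ((x * r) % 1.0))
```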
68. Hash functions for strings
It is quite easy to think of converting each
character to its ASCII value (65-90 and 97-122) and
then accumulating the sum as the hash value – but all
permutations of a word hash to the same slot!
Better: the value of a character at each position is
multiplied by a factor, and the results summed up –
treating the string like a number
when the factor is too small, position may not be significant
when the factor is too large, the resulting value would
overflow – only the last few characters remain
accountable!
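A position-weighted string hash can be sketched like this; the factor 31 is a conventional choice (my assumption, not from the slides), and reducing mod the table size at each step avoids the overflow problem in fixed-width languages:

```python
def string_hash(s, table_size):
    """Polynomial hash: position-weighted, so "abc" and "cab" differ.
    Reducing mod table_size each step keeps the value small."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % table_size
    return h
```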
69. Hash Table vs BST
Timing for searching
Ideally, hash table has the complexity of O(1) while BST has a
complexity of O(log n)
However, it may require more than O(log n) if many keys
clash to the same slot. Even with a good load factor, a hash table may
maintain optimal search time, but it takes a long time
when the hash table must be re-sized in order to maintain an
acceptable load factor
Sequential scan and range scan
The in-order traversal on a BST is a sequential scan, and range scan
is just a partial scan of the in-order traversal
Hash table does not easily support sequential scan on key values
unless the hash function maintains the order of the key values – such
a hash function may not distribute very well different key values into
different slots
70. Coalesced Hashing
Coalesced hashing is a collision resolution method that
uses pointers to connect the elements of a synonym
chain.
• A hybrid of separate chaining and open addressing.
• Linked lists within the hash table handle collisions.
• This strategy is effective, efficient and very easy to
implement.
71. Coalesced Hashing
Coalesced hashing obtains its name from what occurs when we attempt
to insert a record with a home address that is already occupied by a record
from a chain with a different home address.
This situation would occur, for example, if we attempted to insert
a record with a home address of s into the hash table.
What occurs is that the two chains with records having different
home addresses coalesce or grow together.
72. Coalesced Hashing
In the figure to the right, the records with
keys X, D, and Y were inserted in the given
order into the hash table. A, B, C, and D
form one set of synonyms and X and Y form
another set.
When X is inserted into the table with
coalescing, it must be inserted at the end of
the chain that it is coalescing with.
Instead of needing only one probe to retrieve
X, three are needed. The greater the
coalescing, the longer the probe chain will be,
and as a result, retrieval performance will be
degraded.
When record D is now added, it must be
inserted at the end of the coalesced chains;
we must move over record X from the other
chain to locate D.
[Figure: synonym chain with coalescing. The shaded portion indicates the
portion of the chain in which coalescing has occurred; the thin line
represents the insertions on the synonym chain with r as its home address;
the thick line represents the insertions on the chain with s as its home
address.]
73. Coalesced Hashing
Coalesced hashing originated with Williams [1] and is also
referred to as direct chaining.
Algorithm for Coalesced Hashing
75. Hash Tables
Hash table:
Given a table T and a record x, with key (= symbol) and
satellite data, we need to support:
• Insert (T, x)
• Delete (T, x)
• Search(T, x)
We want these to be fast, but don’t care about sorting the
records
In this discussion we consider all keys to be (possibly
large) natural numbers
76. Direct Addressing
Suppose:
The range of keys is 0..m-1
Keys are distinct
The idea:
Set up an array T[0..m-1] in which
• T[i] = x if x ∈ T and key[x] = i
• T[i] = NULL otherwise
This is called a direct-address table
• Operations take O(1) time!
77. The Problem With Direct Addressing
Direct addressing works well when the range m of
keys is relatively small
But what if the keys are 32-bit integers?
Problem 1: direct-address table will have
2^32 entries, more than 4 billion
Problem 2: even if memory is not an issue, the time to
initialize the elements to NULL may be prohibitive
Solution: map keys to a smaller range 0..m-1
This mapping is called a hash function
78. Hash Functions
Next problem: collision
[Figure: a universe of keys U, with actual keys K = {k1, ..., k5},
mapped by h into slots 0 .. m-1 of table T; k2 and k5 collide:
h(k2) = h(k5)]
79. Resolving Collisions
How can we solve the problem of collisions?
Solution 1: chaining
Solution 2: open addressing
80. Open Addressing
Basic idea
To insert: if slot is full, try another slot, …, until an open
slot is found (probing)
To search, follow same sequence of probes as would be
used when inserting the element
• If reach element with correct key, return it
• If reach a NULL pointer, element is not in table
Good for fixed sets (adding but no deletion)
Example: spell checking
Table needn’t be much bigger than n
81. Chaining
Chaining puts elements that hash to the same slot in a
linked list:
[Figure: table T of slots; chains such as k1 → k4, k5 → k2 → k3,
k8 → k6, and k7 alone; empty slots marked ——]
82. Chaining
How do we insert an element?
[Figure: the same chaining diagram as before]
83. Chaining
How do we delete an element?
Do we need a doubly-linked list for efficient delete?
[Figure: the same chaining diagram as before]
84. Chaining
How do we search for an element with a
given key?
[Figure: the same chaining diagram as before]
85. Analysis of Chaining
Assume simple uniform hashing: each key in the table is
equally likely to be hashed to any slot
Given n keys and m slots in the table, the
load factor α = n/m = average # keys per slot
What will be the average cost of an unsuccessful search
for a key? A: O(1 + α)
What will be the average cost of a successful search?
A: O(1 + α/2) = O(1 + α)
89. Analysis of Chaining Continued
So the cost of searching = O(1 + α)
If the number of keys n is proportional to the number of
slots in the table, what is α?
A: α = O(1)
In other words, we can make the expected cost of
searching constant if we make α constant
90. Choosing A Hash Function
Clearly choosing the hash function well is crucial
What will a worst-case hash function do?
What will be the time to search in this case?
What are desirable features of the hash function?
Should distribute keys uniformly into slots
Should not depend on patterns in the data
91. Hash Functions: The Division Method
h(k) = k mod m
In words: hash k into a table with m slots using the slot
given by the remainder of k divided by m
What happens to elements with adjacent
values of k?
What happens if m is a power of 2 (say 2^p)?
What if m is a power of 10?
Upshot: pick table size m = a prime number not too
close to a power of 2 (or 10)
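The trouble with m = 2^p can be seen directly: the hash keeps only the low-order bits of the key, so keys differing only in their high bits all collide. A small sketch (example keys are my own):

```python
def h_division(k, m):
    """Division method: slot = remainder of k divided by m."""
    return k % m

# With m = 256 (a power of 2), only the low byte of the key matters:
# these three keys all share the low byte 0x34 and collide.
low_bits_only = [h_division(k, 256) for k in (0x1234, 0xAB34, 0xFF34)]

# A prime table size such as 701 spreads the same keys apart.
prime_spread = [h_division(k, 701) for k in (0x1234, 0xAB34, 0xFF34)]
```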
92. Hash Functions: The Multiplication Method
For a constant A, 0 < A < 1:
h(k) = floor(m * (k*A - floor(k*A)))
What does the term (k*A - floor(k*A)) represent?
93. Hash Functions: The Multiplication Method
For a constant A, 0 < A < 1:
h(k) = floor(m * (k*A - floor(k*A)))
Choose m = 2^p
Choose A not too close to 0 or 1
Knuth: good choice for A = (sqrt(5) - 1)/2
(k*A - floor(k*A)) is the fractional part of k*A
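The method translates almost literally into code; a minimal sketch (my names) using Knuth's suggested constant:

```python
import math

def h_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """h(k) = floor(m * frac(k*A)); Knuth suggests A = (sqrt(5)-1)/2."""
    frac = (k * A) % 1.0          # fractional part of k*A
    return math.floor(m * frac)
```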
94. Hash Functions: Worst Case Scenario
Scenario:
You are given an assignment to implement hashing
You will self-grade in pairs, testing and grading your
partner’s implementation
In a blatant violation of the honor code, your partner:
• Analyzes your hash function
• Picks a sequence of “worst-case” keys, causing your
implementation to take O(n) time to search
What’s an honest CS student to do?
95. Hash Functions: Universal Hashing
As before, when attempting to foil a malicious
adversary: randomize the algorithm
Universal hashing: pick a hash function randomly in a
way that is independent of the keys that are actually
going to be stored
Guarantees good performance on average, no matter what
keys adversary chooses
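One classic universal family (the slides do not name a specific one, so this is an illustrative choice) picks h(k) = ((a*k + b) mod p) mod m with random a and b, where p is a prime larger than any key; a sketch:

```python
import random

P = 2_147_483_647        # the prime 2^31 - 1; assumes all keys are below P

def make_universal_hash(m, rng=random):
    """Randomly pick one member of the family, independent of the keys."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m
```

Because the function is chosen after (and independently of) any adversarial key sequence, no fixed input can force worst-case O(n) behaviour on average.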
96. Variants
Many suggestions have been made for reducing the
coalescing of probe chains and thereby lowering the number
of retrieval probes, which in turn improves performance.
The variants may be classified in three ways:
• The table organization (whether or not a separate
overflow area is used).
• The manner of linking a colliding item into a chain.
• The manner of choosing unoccupied locations.
97. Variants
Coalescing may be reduced by modifying the table organization.
Instead of allocating the entire table space for both overflow records and
home address records, the table is divided into a primary area and an
overflow area.
[Figure: table split into a Primary area and an Overflow (cellar) area]
• The primary area is the address space
that the hash function maps into.
• The overflow or cellar area contains
only overflow records.
• The address factor is the ratio of
primary area to the total table size –
Address Factor = primary area / total
table size
98. Variants
For a fixed amount of storage, as the address factor
decreases, the cellar size increases, which reduces the
coalescing; but because the primary area becomes smaller, it
increases the number of collisions.
More collisions mean more items requiring multiple retrieval
probes.
Vitter [2] determined that an address factor of 0.86 yields
nearly optimal retrieval performance for most load factors.
99. Variants: LISCH
The algorithm given in slide 6 is called Late Insertion
Standard Coalesced Hashing (LISCH) since new records are
inserted at the end of a probe chain.
The 'Standard' in the name refers to the lack of a cellar.
The variant of that algorithm that uses a cellar is called
LICH, Late Insertion Coalesced Hashing.
100. Variants
Another way of varying the insertion algorithm:
changing the way in which we choose an unoccupied location.
The unoccupied locations are always chosen from the bottom of the
storage area, but the number of collisions is increased in this way.
Hsiao [3] suggests REISCH ('R' stands for 'Random'), in which a random
unoccupied location is chosen for the new insertion.
REISCH gives only a 1% improvement over EISCH (Early Insertion
Standard Coalesced Hashing).
BLISCH ('B' signifies 'Bidirectional') is another method: the selection of the
overflow location for a colliding insertion alternates between the
top and bottom of the table.
In DCWC (Direct Chaining Without Coalescing), a record not stored at its home
address is moved.
101. Variants
[Table 1 (caption only): Mean number of probes for successful lookup
(n = 997) for variants of Coalesced Hashing]