Searching
Searching
1. Sequential
2. Binary
3. Interpolation
4. Hash table
Sequential Search
• Looks for the target from the first to the last element of the list
• The later in the list the target occurs, the longer it takes to find it
• Does not assume anything about the order of the elements in the list, so it can be used with an unsorted list
Sequential Search Example
Sequential Search Algorithm
for i = 1 to N do
    if (target == list[i])
        return i
    end if
end for
return 0
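A minimal Python sketch of the pseudocode above. To mirror the slide it returns a 1-based position and 0 when the target is absent; the function name is only illustrative.

```python
def sequential_search(items, target):
    """Return the 1-based position of target in items, or 0 if it is absent."""
    for position, value in enumerate(items, start=1):
        if value == target:
            return position
    return 0


print(sequential_search([7, 3, 9, 4], 9))
print(sequential_search([7, 3, 9, 4], 5))
```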
Worst-Case Analysis
• If the target is in the last location, we look at all of the elements to find it
• If the target is not in the list, we need to look at all of the elements to learn that
• Therefore, the largest number of comparisons we will do in this algorithm is N
Average-Case Analysis
• If the search is always successful, there are N places the target could be found
• It will take 1 comparison to find the target in the first location, 2 comparisons to find the target in the second location, and so on
• If each location is equally likely, we get:
$$A(N) = \frac{1}{N}\sum_{i=1}^{N} i = \frac{N+1}{2}$$
Average-Case Analysis
• If the search can fail, there are N places the target could be found and 1 possibility when it’s not found
• If the target is not found, we do N comparisons
• If each of these N+1 possibilities is equally likely, we get:
$$A(N) = \frac{1}{N+1}\left(\sum_{i=1}^{N} i + N\right) = \frac{N}{2} + 1 - \frac{1}{N+1} \approx \frac{N+2}{2}$$
Binary Search
• Used with a sorted list
• First check the middle list element
• If the target matches the middle element, we are done
• If the target is less than the middle element, the key must be in the first half
• If the target is larger than the middle element, the key must be in the second half
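The steps above can be sketched in Python as follows. This is an illustrative version, assuming a list sorted in ascending order; it returns a 0-based index, or -1 when the target is absent.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2           # check the middle element first
        if sorted_items[mid] == target:
            return mid                    # match: we are done
        if target < sorted_items[mid]:
            high = mid - 1                # target must be in the first half
        else:
            low = mid + 1                 # target must be in the second half
    return -1


print(binary_search([5, 10, 15, 20, 25, 30, 35, 40], 10))
```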
Binary Search Example
Algorithm Review
• Each comparison eliminates about half of the elements of the list from consideration
• If we begin with $N = 2^k - 1$ elements in the list, there will be $2^{k-1} - 1$ elements on the second pass, and $2^{k-2} - 1$ elements on the third pass
Worst-Case Analysis
• In the worst case, we will either find the target on the last pass, or not find the target at all
• The last pass will have only one element left to compare, which happens when $2^1 - 1 = 1$
• If $N = 2^k - 1$, then there must be $k = \lg(N+1)$ passes
Average-Case Analysis
• If the search is always successful, there are N places the target could be found
• There is one place we check on the first pass, two places we could check on the second pass, and four places we could check on the third pass
Average-Case Analysis
• We can represent binary search as a binary tree:
Average-Case Analysis
• In looking at the binary tree, we see that there are i comparisons needed to find the $2^{i-1}$ elements on level i of the tree
• For a list with $N = 2^k - 1$ elements, there are k levels in the binary tree
• These two facts give us:
$$A(N) = \frac{1}{N}\sum_{i=1}^{k} i \cdot 2^{i-1} \approx \lg(N+1) - 1$$
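As a quick numerical check of the approximation, the sum can be evaluated exactly for a few list sizes. This is only an illustrative sketch; the function name is an assumption, and it relies on $N = 2^k - 1$ as in the slide.

```python
import math


def average_successful_comparisons(k):
    """Exact average comparisons for a list of N = 2**k - 1 equally likely targets."""
    n = 2**k - 1
    # Level i of the search tree holds 2**(i-1) elements, each costing i comparisons.
    total = sum(i * 2 ** (i - 1) for i in range(1, k + 1))
    return total / n


for k in (4, 10, 20):
    n = 2**k - 1
    exact = average_successful_comparisons(k)
    approx = math.log2(n + 1) - 1
    print(n, exact, approx)
```

The exact value differs from $\lg(N+1) - 1$ by $k/(2^k - 1)$, which shrinks quickly as the list grows.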
Average-Case Analysis
• If the search can fail sometimes, there are N places the target could be found and N+1 possibilities when it is not found
• In other words, if the missing key were added to the list, it could be put at the beginning, between any two elements, or at the end, for a total of N+1 different places
Average-Case Analysis
• The possibilities when the key is found are still the same as before, and the new cases all take k comparisons when $N = 2^k - 1$
• This gives us:
$$A(N) = \frac{1}{2N+1}\left[\left(\sum_{i=1}^{k} i \cdot 2^{i-1}\right) + (N+1)\,k\right] \approx \lg(N+1) - \frac{1}{2}$$
Any Alternative to Binary Search?
• Have we used all the knowledge we have about finding an item in an ordered array? The answer is maybe not.
• If you were looking for Mr. Alfred Aaron in the telephone book, would you open the book in the middle and see whether Aaron was in the first half or second half of the book? I think not.
Any Alternative to Binary Search?
• Given the additional information of the upper and lower limits of the values in a list, we can improve on a binary search by estimating the most likely position of an element in the list.
• This is called an interpolation search.
Interpolation Search
It proceeds like a binary search, except that the list is divided each time according to our estimate of where the key is situated.
Given a uniform distribution of keys, interpolation search has an average-case time complexity of only O(lg lg n).
Interpolation Search
• There is another type of information we normally use when searching a phone book, which is not used by binary search but is used by interpolation search: where would you open the phone book if you were looking for Mr. Alfred Aaron?
Interpolation Search
• If the following conditions are true then interpolation search may be better than binary search:
  • Each access is very expensive compared to a typical instruction, e.g. the array is stored on a disk and each comparison requires a disk access.
  • The data are not only sorted but also fairly uniformly distributed, e.g. a phone book is fairly uniformly distributed, an input like [1,2,3,4,5,6,7,8,16,32,355,...] is not.
Interpolation Search
In this situation we are willing to
spend more time to make an accurate
guess where the item may be (instead
of always picking the mid point):
Interpolation Search
• For example:
  • Array of 1000 items
  • The lowest item in the range is 1000
  • The highest item in the range is 1,000,000
  • We are looking for the item of value 12,000
• Then we expect to find the item around the 12th position (always under the assumption that the items are uniformly distributed). This is expressed by the formula:
$$mid = first + \frac{(k - A[first])\,(last - first)}{A[last] - A[first]}$$
Worked comparison with n = 8, k = 10 and A = [5, 10, 15, 20, 25, 30, 35, 40] (first = 0, last = 7):
Binary Search:
$$mid = \left\lfloor \frac{first + last}{2} \right\rfloor = \left\lfloor \frac{0 + 7}{2} \right\rfloor = 3$$
Interpolation Search:
$$mid = 0 + \frac{(10 - 5)\,(7 - 0)}{40 - 5} = \frac{5 \cdot 7}{35} = 1$$
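A possible Python sketch of interpolation search using the position estimate described above. It assumes integer keys sorted in ascending order and uses integer division to round the estimate down; the guard against equal end values is an addition that avoids dividing by zero.

```python
def interpolation_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    first, last = 0, len(sorted_items) - 1
    while first <= last and sorted_items[first] <= target <= sorted_items[last]:
        if sorted_items[first] == sorted_items[last]:
            # All remaining values are equal, so the estimate would divide by zero.
            return first if sorted_items[first] == target else -1
        # Estimate the position from where target lies between the end values.
        mid = first + (target - sorted_items[first]) * (last - first) // (
            sorted_items[last] - sorted_items[first]
        )
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            first = mid + 1
        else:
            last = mid - 1
    return -1


a = [5, 10, 15, 20, 25, 30, 35, 40]
print(interpolation_search(a, 10))
```

With the array from the worked comparison, the first estimate for k = 10 is index 1, so the key is found on the first probe.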
Interpolation Search
• The calculation is more costly than the binary search calculation
  • It needs to be done using floating point operations.
  • One iteration may be slower than the complete binary search.
  • If the cost of this calculation is insignificant compared to the cost of accessing an item, we only care about the number of iterations.
Interpolation Search
• In the worst case, when the numbers are not uniformly distributed, the running time could be linear and all the items might be examined.
• If the items are reasonably uniformly distributed, the running time has been demonstrated to be O(log log N) (apply the logarithm twice in succession).
• For example, for N = 4 billion, log N is about 32 and log log N is roughly 5.
Hashing
• Hashing
• Hash functions
• Hash Tables
• STL’s hash_map
• Hash tables are a common approach to the storing/searching problem.
Hash Tables
What is a Hash Table ?
• The simplest kind of hash table is an array of records.
• This example has 701 records.
[Figure: an array of records with slots [0] through [700]]
What is a Hash Table ?
• Each record has a special field, called its key.
• In this example, the key is a long integer field called Number.
[Figure: the record in slot [4] has Number 506643548]
What is a Hash Table ?
• The number might be a person's identification number, and the rest of the record has information about the person.
What is a Hash Table ?
• When a hash table is in use, some spots contain valid records, and other spots are "empty".
[Figure: the array with four occupied slots, holding Numbers 506643548, 233667136, 281942902 and 155778322]
Inserting a New Record
• In order to insert a new record, the key must somehow be converted to an array index.
• The index is called the hash value of the key.
[Figure: a new record with Number 580625685 waiting to be inserted]
Inserting a New Record
• Typical way to create a hash value: (Number mod 701)
• What is (580625685 mod 701)? It is 3.
Inserting a New Record
• The hash value is used for the location of the new record.
[Figure: the record with Number 580625685 is placed in slot [3]]
Collisions
• Here is another new record to insert, with a hash value of 2.
[Figure: a new record with Number 701466868 says "My hash value is [2]."]
Collisions
• This is called a collision, because there is already another valid record at [2].
• When a collision occurs, move forward until you find an empty spot.
• The new record goes in the empty spot.
[Figure: the new record moves forward past the occupied slots and is stored in the first empty spot]
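The insertion steps on these slides can be sketched in Python as below. This is an illustrative version, not the deck's own code: the names `TABLE_SIZE`, `hash_value` and `insert` are assumptions, records are reduced to their Number key, and wrapping around from slot [700] to slot [0] is assumed.

```python
TABLE_SIZE = 701
EMPTY = None


def hash_value(number):
    """Convert a key to an array index: Number mod 701."""
    return number % TABLE_SIZE


def insert(table, number):
    """Store number using linear probing and return the slot that was used."""
    index = hash_value(number)
    for _ in range(TABLE_SIZE):
        if table[index] is EMPTY:
            table[index] = number
            return index
        index = (index + 1) % TABLE_SIZE  # collision: move forward, wrapping around
    raise RuntimeError("hash table is full")


table = [EMPTY] * TABLE_SIZE
for number in (506643548, 233667136, 281942902, 155778322, 580625685):
    insert(table, number)

slot = insert(table, 701466868)  # hash value 2, but slots 2, 3 and 4 are occupied
print(hash_value(701466868), slot)
```

With the Numbers from the slides, the last record hashes to 2 and ends up in slot 5.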
A Quiz
Where would you be placed in this table, if there is no collision? Use your social security number or some other favorite number.
Searching for a Key
• The data that's attached to a key can be found fairly quickly.
• Calculate the hash value.
• Check that location of the array for the key.
• Keep moving forward until you find the key, or you reach an empty spot.
• When the item is found, the information can be copied to the necessary location.
[Figure: searching for Number 701466868, whose hash value is [2]; three occupied slots answer "Not me." before the matching record answers "Yes!"]
Deleting a Record
• Records may also be deleted from a hash table.
• But the location must not be left as an ordinary "empty spot" since that could interfere with searches.
• The location must be marked in some special way so that a search can tell that the spot used to have something in it.
[Figure: the record with Number 506643548 says "Please delete me." and is removed from the array]
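One common way to mark a deleted spot is a special "tombstone" value. The sketch below is an assumption about how this could look, not the deck's implementation: the class name, the `DELETED` marker, and the choice to reuse tombstones on insert are all illustrative, and keys are assumed to be distinct.

```python
EMPTY = None          # slot that has never held a record
DELETED = object()    # tombstone: slot that used to hold a record


class LinearProbingTable:
    def __init__(self, size=701):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield slot indices starting at the hash value and moving forward."""
        size = len(self.slots)
        start = key % size
        for step in range(size):
            yield (start + step) % size

    def insert(self, key):
        # Assumes key is not already stored; tombstones may be reused.
        for index in self._probe(key):
            if self.slots[index] is EMPTY or self.slots[index] is DELETED:
                self.slots[index] = key
                return index
        raise RuntimeError("hash table is full")

    def search(self, key):
        for index in self._probe(key):
            if self.slots[index] is EMPTY:
                return -1      # a truly empty spot ends the search
            if self.slots[index] == key:
                return index   # a tombstone never equals a key, so it is skipped
        return -1

    def delete(self, key):
        index = self.search(key)
        if index != -1:
            self.slots[index] = DELETED
        return index


table = LinearProbingTable()
for number in (506643548, 233667136, 281942902, 155778322, 580625685, 701466868):
    table.insert(number)
table.delete(506643548)
print(table.search(701466868))
```

If the deleted slot were reset to an ordinary empty spot instead, the search for 701466868 would stop at that slot and wrongly report that the record is missing.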
Rossella Lau Lecture 10, DCO20105, Semester A,2005-6
Hash Table
In the previous studies, all the searches took at least O(log n) time
Can it be faster?
• For example, if a primary key contains values from 0 to 99, then a table (array) of size 100 would be enough for each record to be directly located by the key value, which is the subscript of the table
• If we can match all key values to different slots of a table, we can make searching for a record very efficient
=> Hash table: ideally supports a search time of O(1)
Hash function and hash key
• Key values may not be numeric or may be very large, but we may transform the key into a value within a range
• E.g., suppose that there are at most m (10000) records in the file. Even if the key has 8 digits, we may use a function, e.g., key / 10000, to transform keys with 8 digits to a value from 0-9999
• Such a function, which transforms a key into a fixed-length value that may be further transformed into a subscript of an array, is called a hash function
• The key being transformed is called the hash of key
Perfect hash function
An ideal (perfect) hash
function transforms all
different hash of keys into
different subscripts of a table
When a file has a million
records, it is difficult to have
such a function
Hash Value and Hash Table index
A hash function transforms a key to a value which is called the hash value
This value may need to be further transformed to a subscript of an array: hashValue % m, where m is the table size
The value which can map to a subscript of an array is called the hash table index
Hash collision (clash)
When two hash of keys have the same hash value, it is called a hash collision or a hash clash
E.g., given a hash function h(key) = key and a hash table size of 1000, the hash table indexes are hi(h(1322)) = 1322 % 1000 = 322 and hi(h(2322)) = 2322 % 1000 = 322
That means both key 1322 and key 2322 may attempt to insert their records into the same position
Resolving hash clashes
There are two basic techniques:
1. Chaining (open hashing): keys with the same hash value are linked together, and a search sequentially traverses the items in the linked list
2. Open addressing (closed hashing): whenever there is a clash, the key is rehashed to find another slot in the table
• many techniques: e.g., linear probing, quadratic probing
Chaining
Example: h(key) = key % 10
Input: 2822, 1615, 2813, 3553, 4288, 2125, 8232
0:
1:
2: 2822 -> 8232
3: 2813 -> 3553
4:
5: 1615 -> 2125
6:
7:
8: 4288
9:
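The chaining example can be reproduced with a short Python sketch. Python lists stand in for the linked lists here, and the function names are illustrative.

```python
def build_chained_table(keys, size=10):
    """Return a table in which slot i holds the chain of keys with key % size == i."""
    table = [[] for _ in range(size)]
    for key in keys:
        table[key % size].append(key)   # add the key to the end of its slot's chain
    return table


def chained_search(table, key):
    """Sequentially scan the one chain that the key hashes to."""
    return key in table[key % len(table)]


table = build_chained_table([2822, 1615, 2813, 3553, 4288, 2125, 8232])
for index, chain in enumerate(table):
    print(index, chain)
```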
Open Addressing: Linear probing
Place the record in the next available position in the array, i.e., after slot i try slot (i+1) % array_size. E.g., (input: 2822, 1615, 2813, 3553, 4288, 2125, 8232)
0:
1:
2: 2822
3: 2813
4: 3553
5: 1615
6: 2125
7: 8232
8: 4288
9:
Here rh(j) denotes the slot tried on the jth rehash:
3553: h(3553)=3, rh(1)=4
2125: h(2125)=5, rh(1)=6
8232: h(8232)=2, rh(1)=3, rh(2)=4, rh(3)=5, rh(4)=6, rh(5)=7
Open addressing -- quadratic rehash
The jth rehash is hj(key) = (h(key) + j^2) % array_size
E.g., (input: 2822, 1615, 2813, 3553, 4288, 2125, 8232)
0:
1: 8232
2: 2822
3: 2813
4: 3553
5: 1615
6: 2125
7:
8: 4288
9:
3553: h(3553)=3, h1=3+1=4
2125: h(2125)=5, h1=5+1=6
8232: h(8232)=2, h1=2+1=3, h2=2+4=6, h3=(2+9)%10=1
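The quadratic rehash trace above can be checked with a small Python sketch. The function name and the choice to return the list of probed slots are illustrative assumptions.

```python
def quadratic_insert(table, key):
    """Insert key using hj = (h + j*j) % size and return the slots that were tried."""
    size = len(table)
    home = key % size
    probes = []
    for j in range(size):
        index = (home + j * j) % size
        probes.append(index)
        if table[index] is None:
            table[index] = key
            return probes
    # Quadratic probing can miss free slots, so this may happen before the table is full.
    raise RuntimeError("no free slot found on this probe sequence")


table = [None] * 10
for key in (2822, 1615, 2813, 3553, 4288, 2125):
    quadratic_insert(table, key)

probes = quadratic_insert(table, 8232)
print(probes)
```

For key 8232 the slots tried are 2, 3, 6 and then 1, matching h, h1, h2 and h3 on the slide.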
Hash table re-sizing
When a hash table is full or nearly full, it must be re-sized to a larger table
One method is to take the first prime that is larger than twice the old table size
For the previous table size 10 => the new table size is 23 and the new hash function is h(key) = key % 23
[Figure: the new table with slots 0 to 22 holding the keys 2822, 1615, 2813, 3553, 4288, 2125 and 8232]
Applying h(key) = key % 23 to each key gives:
5: 1615
7: 2813
9: 2125
10: 4288
11: 3553
16: 2822
21: 8232
(all other slots are empty)
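A rough Python sketch of the re-sizing rule described above. The helper names are hypothetical, and resolving collisions in the new table with linear probing is an assumption made only for illustration.

```python
def next_prime_after(n):
    """Return the smallest prime strictly greater than n (for n >= 1)."""
    candidate = n + 1
    while any(candidate % d == 0 for d in range(2, int(candidate**0.5) + 1)):
        candidate += 1
    return candidate


def resize(old_table):
    """Rehash every key into a table whose size is the first prime above twice the old size."""
    new_size = next_prime_after(2 * len(old_table))
    new_table = [None] * new_size
    for key in old_table:
        if key is None:
            continue
        index = key % new_size
        while new_table[index] is not None:   # linear probing on a clash
            index = (index + 1) % new_size
        new_table[index] = key
    return new_table


old = [None, None, 2822, 2813, 3553, 1615, 2125, 8232, 4288, None]
new = resize(old)
print(len(new), {i: k for i, k in enumerate(new) if k is not None})
```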
Load Factor
To determine if a hash table is full or nearly full, the load factor is used
The value of the load factor is the ratio of the number of elements (m) to the number of slots (n) of the table: m/n
Acceptable ranges of load factor
For different addressing methods, the load factor has different acceptable ranges
• Closed addressing (chaining): about 2 to 4. If key values are well distributed in the table, every linked list is expected to hold about as many nodes as the load factor, possibly one or two more, i.e., every hit may require at most 4 to 6 visits
• Open addressing: less than about 0.7. Here the load factor is the fraction of slots being occupied; a larger fraction may cause a key to be rehashed many times, so searching is no longer O(1)
Exercises
Ford’s 12:15.a-b++
hf(x) = x, m = 11, data: 1, 13, 12, 53, 77, 29, 31, 22
• a) Construct the hash table by using linear probe addressing
• Construct the table again by using the rehash function: index = (index + 5) % 11
• b) Construct the hash table by using chaining with separate lists; and also
• Determine the load factors of the tables.
• Depict the hash table after resize, the one resulting from linear probing.
Hash Functions for integer data
A hash function usually produces a non-negative value
A common hash function of numeric data is simply hash(x) = abs(x)
Ford’s: hash(x) = x^2 / 256 % 65536
Hash Functions for real numbers
Ford’s:
• hash(x) = 0 if x = 0; otherwise
• hashval = abs(2 * fabs(frexp(x,&exp)) -1);
where frexp() is a C library function which decomposes x into two parts: a mantissa between 0.5 and 1 (returned by the function) and an exponent returned as exp, so that:
x = mantissa * (2 ^ exp)
(Reference: www.cppreference.com)
ICarnegie: hash(x) = floor(m * frac(x * r)), where typically r can be the reciprocal of the Golden Ratio, (sqrt(5) - 1)/2, and m is the table size
Hash functions for strings
It is tempting to convert each character to its ASCII value (65-90 and 97-122 for letters) and accumulate the sum as the hash value, but then all permutations of a word hash to the same slot!
Instead, the value of the character at each position is multiplied by a position-dependent factor before the results are summed, which treats the string like a number
• when the factor is too small, it may not be significant
• when the factor is too large, the resulting value would overflow, and only the last few characters would count!
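The difference between the two approaches can be illustrated in Python. The factor 31 and the table size 1009 are arbitrary choices for this sketch, not values from the slides; reducing modulo the table size at every step keeps the running value small, which is one way to sidestep the overflow concern in fixed-width arithmetic.

```python
def sum_hash(text, table_size):
    """Add up the character codes: every permutation of a word collides."""
    return sum(ord(ch) for ch in text) % table_size


def positional_hash(text, table_size, factor=31):
    """Weight each character by its position, treating the string like a number."""
    value = 0
    for ch in text:
        value = (value * factor + ord(ch)) % table_size
    return value


for word in ("listen", "silent"):
    print(word, sum_hash(word, 1009), positional_hash(word, 1009))
```

The two anagrams share a slot under `sum_hash` but land in different slots under `positional_hash`.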
Hash Table vs BST
• Timing for searching
  • Ideally, a hash table has a search complexity of O(1) while a BST has a complexity of O(log n)
  • However, a hash table search may require more than O(log n) if many keys clash into the same slot. Keeping the load factor in an acceptable range lets a hash table maintain near-optimal search time, but re-sizing the table to restore an acceptable load factor takes considerable time
• Sequential scan and range scan
  • The in-order traversal of a BST is a sequential scan, and a range scan is just a partial in-order traversal
  • A hash table does not easily support a sequential scan on key values unless the hash function preserves the order of the key values, and such a hash function may not distribute different key values well across different slots
Coalesced Hashing
• Coalesced hashing is a collision resolution method that uses pointers to connect the elements of a synonym chain.
• A hybrid of separate chaining and open addressing.
• Linked lists within the hash table handle collisions.
• This strategy is effective, efficient and very easy to implement.
Coalesced Hashing
• Coalesced hashing obtains its name from what occurs when we attempt to insert a record with a home address that is already occupied by a record from a chain with a different home address.
This situation would occur, for example, if we attempted to insert a record with a home address of s into the hash table.
What occurs is that the two chains with records having different home addresses coalesce or grow together.
Coalesced Hashing
• In the figure, the records with keys X, D, and Y were inserted in the given order into the hash table. A, B, C, and D form one set of synonyms and X and Y form another set.
• When X is inserted into the table with coalescing, it must be inserted at the end of the chain that it is coalescing with. Instead of needing only one probe to retrieve X, three are needed. The greater the coalescing, the longer the probe chain will be, and as a result, retrieval performance will be degraded.
• When record D is now added, it must be inserted at the end of the coalesced chains; we must then move over record X from the other chain to locate D.
Synonym chain: with coalescing
(The shaded portion indicates the portion of the chain in which coalescing has occurred, the thin line represents the insertions on the synonym chain with r as its home address. The thick line represents the insertions on the chain with s as its home address.)
Coalesced Hashing
Coalesced hashing originated with Williams [1] and is also
referred to as direct chaining.
Algorithm for Coalesced Hashing
Hash Tables
Hash table:
• Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
  • Insert (T, x)
  • Delete (T, x)
  • Search (T, x)
• We want these to be fast, but don’t care about sorting the records
• In this discussion we consider all keys to be (possibly large) natural numbers
Direct Addressing
Suppose:
• The range of keys is 0..m-1
• Keys are distinct
The idea:
• Set up an array T[0..m-1] in which
  • T[i] = x if x ∈ T and key[x] = i
  • T[i] = NULL otherwise
• This is called a direct-address table
  • Operations take O(1) time!
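A direct-address table as described above might be sketched in Python like this. The class and method names are illustrative, and `None` plays the role of NULL.

```python
class DirectAddressTable:
    """Array T[0..m-1] in which the key itself is the subscript."""

    def __init__(self, m):
        self.slots = [None] * m

    def insert(self, key, record):
        self.slots[key] = record      # O(1): the key is used directly as the index

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        return self.slots[key]        # None means no record has this key


t = DirectAddressTable(100)
t.insert(42, "record for key 42")
print(t.search(42), t.search(7))
```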
The Problem With
Direct Addressing
Direct addressing works well when the range m of keys is relatively small
But what if the keys are 32-bit integers?
• Problem 1: direct-address table will have 2^32 entries, more than 4 billion
• Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be
Solution: map keys to smaller range 0..m-1
This mapping is called a hash function
Hash Functions
Next problem: collision
[Figure: a hash function h maps the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, to slots 0..m-1 of table T; h(k2) = h(k5), so k2 and k5 collide]
Resolving Collisions
How can we solve the problem of collisions?
Solution 1: chaining
Solution 2: open addressing
Open Addressing
Basic idea
• To insert: if a slot is full, try another slot, and so on, until an open slot is found (probing)
• To search, follow the same sequence of probes as would be used when inserting the element
  • If we reach an element with the correct key, return it
  • If we reach a NULL pointer, the element is not in the table
Good for fixed sets (adding but no deletion)
• Example: spell checking
Table needn’t be much bigger than n
Chaining
Chaining puts elements that hash to the same slot in a linked list:
[Figure: table T whose slots point to linked lists of the actual keys K from the universe of keys U; for example k1 and k4 share a chain, as do k5, k2 and k3, and k8 and k6, while k7 is alone in its chain]
• How do we insert an element?
• How do we delete an element?
  • Do we need a doubly-linked list for efficient delete?
• How do we search for an element with a given key?
Analysis of Chaining
Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
Given n keys and m slots in the table, the load factor α = n/m = average # keys per slot
What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
What will be the average cost of a successful search? A: O(1 + α/2) = O(1 + α)
Analysis of Chaining Continued
So the cost of searching = O(1 + α)
If the number of keys n is proportional to the number of slots in the table, what is α?
• A: α = O(1)
  • In other words, we can make the expected cost of searching constant if we make α constant
Choosing A Hash Function
Clearly choosing the hash function well is crucial
• What will a worst-case hash function do?
• What will be the time to search in this case?
What are desirable features of the hash function?
• Should distribute keys uniformly into slots
• Should not depend on patterns in the data
Hash Functions:
The Division Method
h(k) = k mod m
• In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
What happens to elements with adjacent values of k?
What happens if m is a power of 2 (say 2^P)?
What if m is a power of 10?
Upshot: pick table size m = prime number not too close to a power of 2 (or 10)
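The questions above can be explored with a few lines of Python. The keys below are made up for illustration; they differ only in their high-order digits.

```python
def division_hash(k, m):
    """The division method: h(k) = k mod m."""
    return k % m


keys = [1200, 2200, 3200, 4200, 5200]   # hypothetical keys sharing their last two digits

power_of_ten = [division_hash(k, 100) for k in keys]   # only the last two digits matter
prime_size = [division_hash(k, 97) for k in keys]      # the high digits affect the slot

print(power_of_ten)
print(prime_size)
```

With m = 100 every key falls into the same slot, while with the prime m = 97 the five keys occupy five different slots. A power of 2 behaves like the power of 10, keeping only the low-order bits of k.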
Hash Functions:
The Multiplication Method
For a constant A, 0 < A < 1:
$$h(k) = \lfloor m\,(kA - \lfloor kA \rfloor) \rfloor$$
What does the term $kA - \lfloor kA \rfloor$ represent? The fractional part of kA
Choose m = 2^P
Choose A not too close to 0 or 1
Knuth: a good choice is A = (√5 - 1)/2
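A floating-point sketch of the multiplication method in Python, using Knuth's suggested constant. This is only illustrative: for very large keys the limited precision of floats would distort the fractional part, so practical implementations usually work with integer arithmetic instead.

```python
import math

A = (math.sqrt(5) - 1) / 2   # about 0.618, Knuth's suggested constant


def multiplication_hash(k, m):
    """h(k) = floor(m * (kA - floor(kA))) for a non-negative integer key k."""
    fractional = (k * A) % 1.0        # the fractional part of kA
    return math.floor(m * fractional)


m = 2**10
print([multiplication_hash(k, m) for k in range(1, 6)])
```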
Hash Functions:
Worst Case Scenario
Scenario:
• You are given an assignment to implement hashing
• You will self-grade in pairs, testing and grading your partner’s implementation
• In a blatant violation of the honor code, your partner:
  • Analyzes your hash function
  • Picks a sequence of “worst-case” keys, causing your implementation to take O(n) time to search
What’s an honest CS student to do?
Hash Functions:
Universal Hashing
As before, when attempting to foil a malicious adversary: randomize the algorithm
Universal hashing: pick a hash function randomly in a way that is independent of the keys that are actually going to be stored
• Guarantees good performance on average, no matter what keys the adversary chooses
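The slides do not name a particular family of functions. One common textbook construction, assumed here purely for illustration, picks random a and b and uses h(k) = ((a·k + b) mod p) mod m, where p is a prime larger than any key. The function name and the default prime 2^31 - 1 are choices made for this sketch.

```python
import random


def make_universal_hash(m, p=2_147_483_647, rng=random):
    """Return a randomly chosen h(k) = ((a*k + b) % p) % m; keys must be below the prime p."""
    a = rng.randrange(1, p)   # a is drawn from 1..p-1
    b = rng.randrange(0, p)   # b is drawn from 0..p-1
    return lambda k: ((a * k + b) % p) % m


h = make_universal_hash(701)
print([h(k) for k in (506643548, 233667136, 281942902)])
```

Because a and b are chosen after the adversary has committed to the keys, no fixed key sequence is bad for every function in the family.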
Variants
• Many suggestions have been made for reducing the coalescing of probe chains and thereby lowering the number of retrieval probes, which in turn improves performance.
The variants may be classified in three ways:
• The table organization (whether or not a separate overflow area is used).
• The manner of linking a colliding item into a chain.
• The manner of choosing unoccupied locations.
Variants
• Coalescing may be reduced by modifying the table organization.
• Instead of allocating the entire table space for both overflow records and home address records, the table is divided into a primary area and an overflow area (cellar).
• The primary area is the address space that the hash function maps into.
• The overflow or cellar area contains only overflow records.
• The address factor is the ratio of the primary area to the total table size: Address Factor = primary area / total table size
[Figure: the table split into a Primary area and an Overflow (cellar) area]
Variants
• For a fixed amount of storage, as the address factor decreases, the cellar size increases, which reduces the coalescing; but because the primary area becomes smaller, the number of collisions increases.
• More collisions mean more items requiring multiple retrieval probes.
• Vitter [2] determined that an address factor of 0.86 yields nearly optimal retrieval performance for most load factors.
Variants
LISCH
• The algorithm given in slide 6 is called Late Insertion Standard Coalesced Hashing (LISCH) since new records are inserted at the end of a probe chain.
• The ‘Standard’ in the name refers to the lack of a cellar.
• The variant of that algorithm that uses a cellar is called LICH, Late Insertion Coalesced Hashing.
Rossella Lau Lecture 10, DCO20105, Semester A,2005-6
 Another way of varying the insertion algorithm
Changing the way in which we choose a unoccupied location.
The unoccupied locations are always chosen from the bottom of the
storage area. But the no. of collisions is increased in this way.
 Hsaio [3] suggest REISCH (‘R’ stands for ‘Random’), in which a random
unoccupied location for the new insertion is chosen.
REISCH gives only 1% improvement over EISCH.
 BLISCH (‘B’ signifies ‘Bidirectional’) is another method of choosing the
overflow location for a collision insertion is to alternate the selection between the
top and bottom of the table.
 In DCWC (Direct Chaining Without Coalescing), a record not stored at its home
address is moved.
Variants
Table 1: Mean number of probes for successful lookup (n = 997) for variants of Coalesced Hashing

Search techniques and Hashing

  • 2. Sequential Search
      - Looks for the target from the first to the last element of the list
      - The later in the list the target occurs, the longer it takes to find it
      - Does not assume anything about the order of the elements in the list, so it can be used with an unsorted list
  • 4. Sequential Search Algorithm
        for i = 1 to N do
            if (target == list[i])
                return i
            end if
        end for
        return 0
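The pseudocode on slide 4 translates almost line for line into Python. This is a sketch; the function name is hypothetical, and it keeps the slide's convention of 1-based positions with 0 meaning "not found".

```python
def sequential_search(items, target):
    """Return the 1-based position of target in items, or 0 if it is absent."""
    for position, value in enumerate(items, start=1):
        if value == target:
            return position
    return 0


print(sequential_search([42, 7, 19], 19))
print(sequential_search([42, 7, 19], 5))
```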
  • 5. Worst-Case Analysis
      - If the target is in the last location, we look at all of the elements to find it
      - If the target is not in the list, we need to look at all of the elements to learn that
      - Therefore, the largest number of comparisons we will do in this algorithm is N
  • 6. Average-Case Analysis
      - If the search is always successful, there are N places the target could be found
      - It will take 1 comparison to find the target in the first location, 2 comparisons to find the target in the second location, and so on
      - If each location is equally likely, we get:
        $A(N) = \frac{1}{N} \sum_{i=1}^{N} i = \frac{N+1}{2}$
  • 7. Average-Case Analysis
      - If the search can fail, there are N places the target could be found and 1 possibility when it’s not found
      - If the target is not found, we do N comparisons
      - If each of these N+1 possibilities is equally likely, we get:
        $A(N) = \frac{1}{N+1} \left[ \left( \sum_{i=1}^{N} i \right) + N \right] \approx \frac{N+2}{2}$
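The two averages can be checked by listing the cost of every equally likely case and taking the mean. A small sketch with exact fractions; the function name is hypothetical.

```python
from fractions import Fraction


def average_comparisons(n, can_fail=False):
    """Mean comparisons of sequential search over equally likely outcomes."""
    costs = list(range(1, n + 1))   # target at position i costs i comparisons
    if can_fail:
        costs.append(n)             # an unsuccessful search costs n comparisons
    return Fraction(sum(costs), len(costs))


print(average_comparisons(9))                  # (N+1)/2 for N = 9
print(average_comparisons(9, can_fail=True))   # close to (N+2)/2 for N = 9
```

For N = 9 the successful-only average is exactly 5, and the average with failures is 27/5 = 5.4, slightly below the approximation (N+2)/2 = 5.5.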
  • 8. Binary Search
      - Used with a sorted list
      - First check the middle list element
      - If the target matches the middle element, we are done
      - If the target is less than the middle element, the key must be in the first half
      - If the target is larger than the middle element, the key must be in the second half
  • 10. Algorithm Review
      - Each comparison eliminates about half of the elements of the list from consideration
      - If we begin with N = 2^k – 1 elements in the list, there will be 2^(k–1) – 1 elements on the second pass, and 2^(k–2) – 1 elements on the third pass
  • 11. Worst-Case Analysis
      - In the worst case, we will either find the target on the last pass, or not find the target at all
      - The last pass will have only one element left to compare, which happens when 2^1 – 1 = 1
      - If N = 2^k – 1, then there must be k = lg(N+1) passes
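A sketch of binary search that also counts passes, so the worst-case bound of k = lg(N+1) passes can be observed directly. The function name and the returned `(index, passes)` pair are choices made for this illustration.

```python
def binary_search(items, target):
    """Return (index, passes); index is -1 when target is not in sorted items."""
    low, high, passes = 0, len(items) - 1, 0
    while low <= high:
        passes += 1
        mid = (low + high) // 2
        if items[mid] == target:
            return mid, passes
        if target < items[mid]:
            high = mid - 1          # target can only be in the first half
        else:
            low = mid + 1           # target can only be in the second half
    return -1, passes


items = list(range(1, 16))          # N = 15 = 2^4 - 1 sorted keys
print(binary_search(items, 8))      # middle element, found on the first pass
print(binary_search(items, 1))      # found on the last pass
print(binary_search(items, 16))     # not found
```

With N = 15, no search needs more than lg(16) = 4 passes, whether it succeeds or fails.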
  • 12. Average-Case Analysis
      - If the search is always successful, there are N places the target could be found
      - There is one place we check on the first pass, two places we could check on the second pass, and four places we could check on the third pass
  • 13. Average-Case Analysis
      - We can represent binary search as a binary tree
  • 14. Average-Case Analysis
      - In looking at the binary tree, we see that there are i comparisons needed to find the 2^(i–1) elements on level i of the tree
      - For a list with N = 2^k – 1 elements, there are k levels in the binary tree
      - These two facts give us:
        $A(N) = \frac{1}{N} \sum_{i=1}^{k} i \cdot 2^{i-1} \approx \lg(N+1) - 1$
  • 15. Average-Case Analysis
      - If the search can fail sometimes, there are N places the target could be found and N+1 possibilities when it is not found
      - In other words, if the missing key were added to the list, it could be put at the beginning, between any two elements, or at the end – a total of N+1 different places
  • 16. Average-Case Analysis
      - The possibilities when the key is found are still the same as before, and the new cases all take k comparisons when N = 2^k – 1
      - This gives us:
        $A(N) = \frac{1}{2N+1} \left[ \left( \sum_{i=1}^{k} i \cdot 2^{i-1} \right) + (N+1) \cdot k \right] \approx \lg(N+1) - \frac{1}{2}$
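The two sums can be evaluated exactly to see how close the approximations are. A sketch that assumes N = 2^k − 1, as the slides do; the function name is hypothetical.

```python
from fractions import Fraction


def average_passes(k, can_fail=False):
    """Average passes of binary search on N = 2**k - 1 keys, from the sums above."""
    n = 2**k - 1
    found = sum(i * 2**(i - 1) for i in range(1, k + 1))  # level i holds 2**(i-1) keys
    if can_fail:
        # N+1 gaps where a missing key could fall, each costing k passes
        return Fraction(found + (n + 1) * k, 2 * n + 1)
    return Fraction(found, n)


print(float(average_passes(20)))                  # compare with lg(N+1) - 1 = 19
print(float(average_passes(20, can_fail=True)))   # compare with lg(N+1) - 1/2 = 19.5
```

For small lists the approximation is looser: with k = 4 (N = 15) the successful-only average is 49/15, about 3.27, against lg(16) − 1 = 3.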
  • 17. Any Alternative to Binary Search?
      - Have we used all the knowledge we have about finding an item in an ordered array? The answer is maybe not.
      - If you were looking for Mr. Alfred Aaron in the telephone book, would you open the book in the middle and see whether Aaron was in the first half or second half of the book? I think not.
  • 18. Any Alternative to Binary Search?
      - Given the additional information of the upper and lower limits of the values in a list, we can improve on a binary search by estimating the most likely position of an element in the list.
      - This is called an interpolation search.
  • 19. Interpolation Search
      - It proceeds like a binary search, except that the list is divided each time according to our estimate of where the key is situated.
      - Given a uniform distribution of keys, interpolation search has an average-case time complexity of only O(lg lg n).
  • 20. Interpolation Search
      - There is another type of information we normally use when searching a phone book, which binary search ignores but interpolation search uses: where would you open the phone book if you were looking for Mr. Alfred Aaron?
  • 21. Interpolation Search
      - If the following conditions are true, then interpolation search may be better than binary search:
      - Each access is very expensive compared to a typical instruction, e.g. the array is stored on a disk and each comparison requires a disk access.
      - The data are not only sorted but also fairly uniformly distributed, e.g. a phone book is fairly uniformly distributed; an input like [1,2,3,4,5,6,7,8,16,32,355,...] is not.
  • 22. Interpolation Search
      - In this situation we are willing to spend more time to make an accurate guess where the item may be (instead of always picking the midpoint).
  • 23. Interpolation Search
      - For example:
      - Array of 1000 items
      - The lowest item in the range is 1000
      - The highest item in the range is 1,000,000
      - We are looking for the item of value 12,000
      - Then we expect to find the item around the 12th position (always under the assumption that the items are uniformly distributed). This is expressed by the formula on the next slide.
  • 24. Binary Search vs. Interpolation Search (n = 8, k = 10)
        index: [0] [1] [2] [3] [4] [5] [6] [7]
        A:       5  10  15  20  25  30  35  40
      - Binary search: $mid = \lfloor (first + last) / 2 \rfloor = \lfloor (0 + 7) / 2 \rfloor = 3$
      - Interpolation search: $mid = first + \frac{(k - A[first]) \cdot (last - first)}{A[last] - A[first]} = 0 + \frac{(10 - 5) \cdot (7 - 0)}{40 - 5} = \frac{5 \cdot 7}{35} = 1$
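A sketch of interpolation search built on the formula from slide 24. It uses integer floor division rather than floating point, which is a choice made for this example, and it guards the case where the first and last keys are equal. The function name and the probe counter are hypothetical.

```python
def interpolation_search(items, target):
    """Return (index, probes); index is -1 when target is not in sorted items."""
    first, last, probes = 0, len(items) - 1, 0
    while first <= last and items[first] <= target <= items[last]:
        probes += 1
        if items[first] == items[last]:
            mid = first                       # flat range: avoid dividing by zero
        else:
            mid = first + ((target - items[first]) * (last - first)
                           // (items[last] - items[first]))
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            first = mid + 1
        else:
            last = mid - 1
    return -1, probes


uniform = [5, 10, 15, 20, 25, 30, 35, 40]
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 355]
print(interpolation_search(uniform, 10))   # first estimate is index 1
print(interpolation_search(skewed, 32))    # estimates creep forward one slot at a time
```

On the uniformly spaced array the first estimate hits the key. On the skewed input the large last value drags every estimate toward the front, so the search degrades to a linear scan.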
  • 25. Interpolation Search
      - Calculation is more costly than the binary search calculation
      - It is typically done using floating point operations.
      - One iteration may be slower than the complete binary search.
      - If the cost of this calculation is insignificant compared to the cost of accessing an item, we only care about the number of iterations.
  • 26. Interpolation Search
      - In the worst case, when the numbers are not uniformly distributed, the running time could be linear and all the items might be examined.
      - If the items are reasonably uniformly distributed, the running time has been demonstrated to be O(log log N) (apply the logarithm twice in succession).
      - For example, for N = 4 billion, log N (base 2) is about 32 and log log N is roughly 5.
  • 27. Hashing
      - Hashing
      - Hash functions
      - Hash Tables
      - STL’s hash_map
  • 28. Hash Tables
      - Hash tables are a common approach to the storing/searching problem.
  • 29. What is a Hash Table ?
      - The simplest kind of hash table is an array of records.
      - This example has 701 records, indexed [0] to [700].
  • 30. What is a Hash Table ?
      - Each record has a special field, called its key.
      - In this example, the key is a long integer field called Number; the record at [4] has Number 506643548.
  • 31. What is a Hash Table ?
      - The number might be a person's identification number, and the rest of the record has information about the person.
  • 32. What is a Hash Table ?
      - When a hash table is in use, some spots contain valid records, and other spots are "empty".
      - [Figure: the table holding the records with Numbers 506643548, 233667136, 281942902 and 155778322]
  • 33. Inserting a New Record
      - In order to insert a new record, the key must somehow be converted to an array index.
      - The index is called the hash value of the key.
      - The new record in this example has Number 580625685.
  • 34. Inserting a New Record
      - Typical way to create a hash value: (Number mod 701)
      - What is (580625685 mod 701) ?
  • 35. Inserting a New Record
      - (580625685 mod 701) is 3.
  • 36. Inserting a New Record
      - The hash value is used for the location of the new record, so Number 580625685 goes in [3].
  • 38. Collisions
      - Here is another new record to insert, Number 701466868, with a hash value of 2.
  • 39. Collisions
      - This is called a collision, because there is already another valid record at [2].
      - When a collision occurs, move forward until you find an empty spot.
  • 42. Collisions
      - The new record goes in the empty spot.
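The insertion walk-through on slides 33 to 42 can be sketched as follows. The hash function (Number mod 701) and the record numbers come from the slides; the function name and the wrap-around from [700] back to [0] are assumptions made for this sketch.

```python
TABLE_SIZE = 701


def insert(table, number):
    """Place number at (number mod TABLE_SIZE), moving forward past occupied spots."""
    index = number % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] is None:
            table[index] = number
            return index
        index = (index + 1) % TABLE_SIZE   # assumed: wrap around at the end of the array
    raise OverflowError("hash table is full")


table = [None] * TABLE_SIZE
for number in (506643548, 233667136, 281942902, 155778322):
    insert(table, number)                  # these land at [4], [2], [1] and [700]

print(insert(table, 580625685))            # hash value 3, and [3] is empty
print(insert(table, 701466868))            # hash value 2, but [2], [3] and [4] are taken
```

The last record collides at [2], moves forward past [3] and [4], and is stored at [5].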
  • 43. A Quiz
      - Where would you be placed in this table, if there is no collision? Use your social security number or some other favorite number.
  • 44. Searching for a Key
      - The data that's attached to a key can be found fairly quickly.
      - In this example we search for Number 701466868.
  • 45. Searching for a Key
      - Calculate the hash value. For Number 701466868 it is [2].
      - Check that location of the array for the key. The record at [2] has a different Number ("Not me").
  • 46. Searching for a Key
      - Keep moving forward until you find the key, or you reach an empty spot.
  • 48. Searching for a Key
      - After further "Not me" answers, the record with Number 701466868 is reached ("Yes!").
  • 49. Searching for a Key
      - When the item is found, the information can be copied to the necessary location.
  • 50. Deleting a Record
      - Records may also be deleted from a hash table.
      - In this example, the record with Number 506643548 is deleted.
  • 51. Deleting a Record
      - But the location must not be left as an ordinary "empty spot" since that could interfere with searches.
  • 52. Deleting a Record
      - The location must be marked in some special way so that a search can tell that the spot used to have something in it.
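A sketch of searching and deleting with a special "used to hold something" marker, following the rules on slides 44 to 52. The marker object, the function names, and the wrap-around probing are assumptions of this sketch, not details taken from the slides.

```python
SIZE = 701
EMPTY = None
DELETED = object()          # marks a spot that used to hold a record


def probe(number):
    """Yield the spots to examine: the hash value first, then moving forward."""
    start = number % SIZE
    for step in range(SIZE):
        yield (start + step) % SIZE


def insert(table, number):
    """Store number (assumed not already present) in the first reusable spot."""
    for index in probe(number):
        if table[index] is EMPTY or table[index] is DELETED:
            table[index] = number
            return index
    raise OverflowError("hash table is full")


def find(table, number):
    """Return the index holding number, or None. Only a truly empty spot stops the search."""
    for index in probe(number):
        if table[index] is EMPTY:
            return None
        if table[index] == number:      # a DELETED marker never equals a key
            return index
    return None


def delete(table, number):
    """Replace the record with the DELETED marker and return its old index."""
    index = find(table, number)
    if index is not None:
        table[index] = DELETED
    return index


table = [EMPTY] * SIZE
for number in (506643548, 233667136, 281942902, 155778322, 580625685, 701466868):
    insert(table, number)

print(delete(table, 506643548))   # the record at [4] is removed
print(find(table, 701466868))     # still reachable at [5], past the marker at [4]
```

If [4] were reset to an ordinary empty spot instead, the search for 701466868 would start at [2], stop at [4], and wrongly report that the key is missing.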
  • 53. Rossella Lau Lecture 10, DCO20105, Semester A,2005-6 Hash Table In the previous studies, all the searches had an efficiency of at least O(logn) Can it be faster?  For example, if a primary key contains values from 0 to 99, then a table (array) of size 100 would be enough for each record to be directly located by the key value which is the subscript of the table  If we can match all key values to different slots of a table, we can make searching for a record very efficient   Hash Table: ideally to support search time O(1)
  • 54. Rossella Lau Lecture 10, DCO20105, Semester A,2005-6 Hash function and hash key  Key values may not be numeric or may be very large, but we may transform the key into a value within a range  E.g., suppose that there are at most m (10000) records in the file. Even if the key is in 8 digits, we may use a function, e.g., key / 10000 to transform keys with 8 digits to a value from 0-9999  Such a function which transforms a key into a value which may further transform to a subscript of an array, in a fixed length, is called hash function  The key being transformed is called the hash of key
  • 55. Rossella Lau Lecture 10, DCO20105, Semester A,2005-6 Perfect hash function An ideal (perfect) hash function transforms all different hash of keys into different subscripts of a table When a file has a million records, it is difficult to have such a function
  • 56. Rossella Lau Lecture 10, DCO20105, Semester A,2005-6 Hash Value and Hash Table index A hash function transforms a key to a value which is called hash value This value may need to further be transformed to a subscript of an array: hashValue%m where m is the table size The value which can map to a subscript of an array is called hash table index
  • 57. Rossella Lau Lecture 10, DCO20105, Semester A,2005-6 Hash collision (clash) When two hash of keys have the same hashed values, it is called a hash collision or a hash clash E.g., given a hash function h(key) = key and the hash table size 1000, ==> hash table size: hi(h(1322)) = 1322 % 1000 = hi(h(2322)) = 2322 % 1000 = 322 That means both key 1322 and 2322 may attempt to insert the record into the same position
  • 58. Rossella Lau Lecture 10, DCO20105, Semester A,2005-6 Resolving hash clashes There are two basic techniques: 1. Chaining (Open hashing): Keys with the same hash values will be linked together and a search process should sequentially traverse all the items in the linked list 2. Open Addressing (Closed Hashing) : Whenever there is a clash, it will rehash – to find another slot in the table  many techniques: e.g., linear probing, quadratic probing
  • 59. Chaining
      Example: h(key) = key % 10
      Input: 2822, 1615, 2813, 3553, 4288, 2125, 8232
      Slot  Chain
      0
      1
      2     2822 -> 8232
      3     2813 -> 3553
      4
      5     1615 -> 2125
      6
      7
      8     4288
      9
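A minimal Python sketch of the chaining example follows. It assumes Python lists as stand-ins for the linked lists, and the helper names `build_chained_table` and `chained_search` are hypothetical.

```python
def build_chained_table(keys, size=10):
    """Insert each key at the end of the chain for slot key % size."""
    table = [[] for _ in range(size)]
    for key in keys:
        table[key % size].append(key)
    return table


def chained_search(table, key):
    """Sequentially scan the one chain that the key hashes to."""
    return key in table[key % len(table)]


keys = [2822, 1615, 2813, 3553, 4288, 2125, 8232]
chained = build_chained_table(keys)
for slot, chain in enumerate(chained):
    print(slot, chain)
```

Only slots 2, 3, 5 and 8 end up non-empty, and the first three each hold two keys.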
  • 60. Open Addressing: Linear probing
      Place the record in the next available position in the array, i.e., rh(i) = i + 1
      E.g., input: 2822, 1615, 2813, 3553, 4288, 2125, 8232
      Slot:  0     1     2     3     4     5     6     7     8     9
      Key:               2822  2813  3553  1615  2125  8232  4288
      - 3553: h(3553)=3, rh(1)=4
      - 2125: h(2125)=5, rh(1)=6
      - 8232: h(8232)=2, rh(1)=3, rh(2)=4, rh(3)=5, rh(4)=6, rh(5)=7
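The linear-probing example can be replayed with the sketch below. The function name `insert_linear` is illustrative, and wrapping around at the end of the array (the `% size`) is an assumption the slide does not spell out.

```python
def insert_linear(table, key):
    """Place key in its home slot, or in the next free slot after it."""
    size = len(table)
    home = key % size
    for step in range(size):
        slot = (home + step) % size  # wrap-around is assumed here
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


linear_table = [None] * 10
for key in [2822, 1615, 2813, 3553, 4288, 2125, 8232]:
    insert_linear(linear_table, key)
print(linear_table)
```

Key 8232 needs five rehashes because slots 2 through 6 are already occupied when it arrives.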
  • 61. Open addressing: quadratic rehash
      The jth rehash is hj(key) = (h(key) + j^2) % array_size
      E.g., input: 2822, 1615, 2813, 3553, 4288, 2125, 8232
      Slot:  0     1     2     3     4     5     6     7     8     9
      Key:         8232  2822  2813  3553  1615  2125        4288
      - 3553: h(3553)=3, h1=3+1=4
      - 2125: h(2125)=5, h1=5+1=6
      - 8232: h(8232)=2, h1=2+1=3, h2=2+4=6, h3=(2+9)%10=1
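A comparable sketch for the quadratic rehash, again with an illustrative function name (`insert_quadratic`). Giving up after `size` attempts is an assumption of this sketch, since a quadratic probe sequence is not guaranteed to visit every slot.

```python
def insert_quadratic(table, key):
    """Try (h(key) + j*j) % size for j = 0, 1, 2, ... until a slot is free."""
    size = len(table)
    home = key % size
    for j in range(size):
        slot = (home + j * j) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found on the probe sequence")


quad_table = [None] * 10
for key in [2822, 1615, 2813, 3553, 4288, 2125, 8232]:
    insert_quadratic(quad_table, key)
print(quad_table)
```

Compared with linear probing, key 8232 lands in slot 1 after three rehashes instead of slot 7 after five.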
  • 62. Hash table re-sizing
      - When a hash table is full or nearly full, it must be re-sized to a larger table
      - One method is to take the first prime larger than twice the old table size
      - For the previous table of size 10, the new table size is 23 and the new hash function is h(key) = key % 23
      Slot:  5     7     9     10    11    16    21
      Key:   1615  2813  2125  4288  3553  2822  8232
      (all other slots from 0 to 22 are empty)
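The re-sizing rule can be sketched as follows, reading "first prime which is twice as large" as the first prime greater than twice the old size. The helper names `is_prime` and `next_table_size` are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for table sizes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_table_size(old_size: int) -> int:
    """First prime greater than twice the old table size."""
    candidate = 2 * old_size + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


new_size = next_table_size(10)
keys = [2822, 1615, 2813, 3553, 4288, 2125, 8232]
new_slots = {key % new_size: key for key in keys}  # rehash every key
print(new_size, sorted(new_slots.items()))
```

With the larger table, all seven keys of the running example land in distinct slots.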
  • 63. Load Factor
      - To determine whether a hash table is full or nearly full, the load factor is used
      - The load factor is the ratio of the number of elements (m) to the number of slots (n) in the table: m/n
  • 64. Acceptable ranges of load factor
      For different addressing methods, the load factor has different acceptable ranges
      - Closed addressing (chaining): about 2 to 4. If key values are well distributed in the table, each linked list is expected to hold only a node or two more than the load factor, i.e., every hit may require at most 4 to 6 visits
      - Open addressing: less than about 0.7. Here the load factor is the fraction of slots occupied; a larger fraction may cause a key to be rehashed many times, so operations are no longer O(1)
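As a small illustration, the load factor check might look like the sketch below. The function names are hypothetical, and the thresholds passed in are the approximate figures from the slide, not fixed constants.

```python
def load_factor(num_elements: int, num_slots: int) -> float:
    """Ratio of stored elements to table slots."""
    return num_elements / num_slots


def needs_resize(num_elements: int, num_slots: int, max_load: float) -> bool:
    """True once the load factor reaches the chosen threshold."""
    return load_factor(num_elements, num_slots) >= max_load


before = load_factor(7, 10)   # running example, table size 10
after = load_factor(7, 23)    # same keys after resizing to 23 slots
print(before, round(after, 3))
print(needs_resize(7, 10, 0.7), needs_resize(7, 23, 0.7))
```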
  • 65. Exercises
      Ford’s 12:15.a-b++
      hf(x) = x, m = 11, data: 1, 13, 12, 53, 77, 29, 31, 22
      - a) Construct the hash table using linear probe addressing
      - Construct the table again using the rehash function: index = (index + 5) % 11
      - b) Construct the hash table using chaining with separate lists
      - Determine the load factors of the tables
      - Depict the hash table, the one resulting from linear probing, after a resize
  • 66. Hash Functions for integer data
      - A hash function usually produces a non-negative value
      - A common hash function for numeric data is simply hash(x) = abs(x)
      - Ford’s: hash(x) = x^2 / 256 % 65536
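Both integer hash functions are easy to try out. In this sketch the function names are made up for illustration, and the division in Ford's formula is assumed to be integer division.

```python
def hash_abs(x: int) -> int:
    """hash(x) = abs(x): always non-negative."""
    return abs(x)


def hash_square(x: int) -> int:
    """hash(x) = x^2 / 256 % 65536, using integer division (assumed)."""
    return (x * x // 256) % 65536


print(hash_abs(-42))       # 42
print(hash_square(1000))   # 3906
```

Dividing by 256 drops the low 8 bits of the square and the modulus keeps the next 16, so the result always lies in 0..65535.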
  • 67. Hash Functions for real numbers
      - Ford’s: hash(x) = 0 if x = 0; otherwise hashval = abs(2 * fabs(frexp(x, &exp)) - 1), where frexp() is a C library function that decomposes x into two parts: a mantissa between 0.5 and 1 (returned by the function) and an exponent (returned in exp), such that x = mantissa * (2 ^ exp) (Reference: www.cppreference.com)
      - ICarnegie: hash(x) = floor(m * frac(x * r)), where r is typically the Golden Ratio (sqrt(5) - 1)/2 and m is the table size
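The two real-number formulas can be explored in Python, whose `math.frexp` returns the mantissa and exponent as a tuple rather than through a pointer argument. The sketch covers only what the slide states: `frexp_fraction` (a hypothetical name) computes the value 2*|mantissa| - 1, and any further scaling to an integer is left out because the slide does not show it.

```python
import math

GOLDEN = (math.sqrt(5) - 1) / 2  # r in the ICarnegie formula


def frexp_fraction(x: float) -> float:
    """0 for x == 0, otherwise |2*|mantissa| - 1|, a value in [0, 1)."""
    if x == 0:
        return 0.0
    mantissa, _exponent = math.frexp(x)  # x == mantissa * 2**_exponent
    return abs(2 * abs(mantissa) - 1)


def golden_ratio_index(x: float, m: int) -> int:
    """floor(m * frac(x * r)) for a table of size m."""
    product = x * GOLDEN
    frac = product - math.floor(product)
    # min() guards against frac rounding up to 1.0 (a sketch-level precaution)
    return min(math.floor(m * frac), m - 1)


print(math.frexp(6.0))         # (0.75, 3)
print(frexp_fraction(6.0))     # 0.5
print(golden_ratio_index(3.25, 11))
```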
  • 68. Hash functions for strings
      - An easy approach is to convert each character to its ASCII value (65-90 and 97-122 for letters) and accumulate the sum as the hash value, but then all permutations of a word hash to the same slot!
      - Alternatively, multiply the value of each character by a factor that depends on its position, then sum the results, treating the string like a number
      - When the factor is too small, the position may not be significant
      - When the factor is too large, the resulting value overflows and only the last few characters count!
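Both pitfalls can be demonstrated in a short sketch. The function names, the factor 31 and the 32-bit word size are assumptions chosen for illustration; the modulus emulates fixed-width integer overflow, which Python integers do not have on their own.

```python
def additive_hash(s: str) -> int:
    """Sum of character codes: ignores character order."""
    return sum(ord(c) for c in s)


def polynomial_hash(s: str, factor: int = 31, bits: int = 32) -> int:
    """Treat the string as a number in base `factor`, truncated to `bits` bits."""
    h = 0
    for c in s:
        h = (h * factor + ord(c)) % (1 << bits)
    return h


# Permutations collide under the additive hash but not the positional one.
print(additive_hash("listen") == additive_hash("silent"))      # True
print(polynomial_hash("listen") == polynomial_hash("silent"))  # False

# With a factor of 2**16 and 32-bit overflow, only the last two characters matter.
print(polynomial_hash("abcxy", 65536) == polynomial_hash("zzzxy", 65536))  # True
```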
  • 69. Hash Table vs BST
      - Timing for searching
        - Ideally, a hash table search is O(1) while a BST search is O(log n)
        - However, a hash table search may take more than O(log n) if many keys clash into the same slot. Keeping the load factor in range lets a hash table maintain near-optimal search time, but re-sizing the table to keep an acceptable load factor takes considerable time
      - Sequential scan and range scan
        - An in-order traversal of a BST is a sequential scan, and a range scan is just a partial in-order traversal
        - A hash table does not easily support a sequential scan on key values unless the hash function preserves the order of the key values, and such a hash function may not distribute different key values well across the slots
  • 70. Coalesced Hashing
      - Coalesced hashing is a collision resolution method that uses pointers to connect the elements of a synonym chain
      - A hybrid of separate chaining and open addressing
      - Linked lists within the hash table handle collisions
      - This strategy is effective, efficient and very easy to implement
  • 71. Coalesced Hashing
      - Coalesced hashing obtains its name from what occurs when we attempt to insert a record with a home address that is already occupied by a record from a chain with a different home address
      - This situation would occur, for example, if we attempted to insert a record with a home address of s into the hash table
      - What occurs is that the two chains with records having different home addresses coalesce, or grow together
  • 72. Coalesced Hashing
      - In the figure, the records with keys X, D, and Y were inserted in the given order into the hash table. A, B, C, and D form one set of synonyms, and X and Y form another set
      - When X is inserted into the table with coalescing, it must be inserted at the end of the chain that it is coalescing with. Instead of needing only one probe to retrieve X, three are needed. The greater the coalescing, the longer the probe chain will be, and as a result retrieval performance will be degraded
      - When record D is now added, it must be inserted at the end of the coalesced chains; we must then move over record X from the other chain to locate D
      [Figure: synonym chain with coalescing. The shaded portion indicates the portion of the chain in which coalescing has occurred; the thin line represents the insertions on the synonym chain with r as its home address; the thick line represents the insertions on the chain with s as its home address.]
  • 73. Coalesced Hashing
      - Coalesced hashing originated with Williams [1] and is also referred to as direct chaining
      Algorithm for Coalesced Hashing
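The slide's algorithm itself did not survive in this transcript, so the following is only a sketch of late-insertion coalesced hashing without a cellar, not the original listing. It assumes h(key) = key % size, takes free slots from the bottom (highest index) of the table, and does not check for duplicate keys; the class and method names are hypothetical.

```python
class CoalescedTable:
    def __init__(self, size: int):
        self.keys = [None] * size
        self.links = [None] * size  # index of the next record in the chain
        self.free = size - 1        # scan position for empty slots, from the bottom

    def insert(self, key: int) -> int:
        slot = key % len(self.keys)
        if self.keys[slot] is None:          # home address is free
            self.keys[slot] = key
            return slot
        while self.links[slot] is not None:  # walk to the end of the chain
            slot = self.links[slot]
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1                   # find the lowest-placed empty slot
        if self.free < 0:
            raise RuntimeError("table is full")
        self.keys[self.free] = key
        self.links[slot] = self.free         # late insertion: link at chain end
        return self.free

    def probes(self, key: int):
        """Number of probes to find key, or None if it is absent."""
        slot = key % len(self.keys)
        count = 0
        while slot is not None and self.keys[slot] is not None:
            count += 1
            if self.keys[slot] == key:
                return count
            slot = self.links[slot]
        return None


t = CoalescedTable(7)
for key in [0, 7, 6, 14]:
    t.insert(key)
print(t.keys)                     # [0, None, None, None, 14, 6, 7]
print(t.probes(6), t.probes(14))  # 2 4
```

Key 6 has home address 6, but that slot already holds 7 from the chain that started at slot 0, so the two chains coalesce: retrieving 6 takes two probes instead of one, and 14 takes four instead of three.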
  • 74. Hash Tables
  • 75. Hash Tables
      - Hash table: given a table T and a record x, with key (= symbol) and satellite data, we need to support:
        - Insert(T, x)
        - Delete(T, x)
        - Search(T, x)
      - We want these to be fast, but don’t care about sorting the records
      - In this discussion we consider all keys to be (possibly large) natural numbers
  • 76. Direct Addressing
      - Suppose:
        - The range of keys is 0..m-1
        - Keys are distinct
      - The idea: set up an array T[0..m-1] in which
        - T[i] = x if x ∈ T and key[x] = i
        - T[i] = NULL otherwise
      - This is called a direct-address table
        - Operations take O(1) time!
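A direct-address table is little more than an array indexed by key. The sketch below uses a hypothetical class name and stores (key, data) pairs as the records.

```python
class DirectAddressTable:
    """Keys are assumed to be distinct integers in 0..m-1."""

    def __init__(self, m: int):
        self.slots = [None] * m  # T[i] = NULL for every i initially

    def insert(self, key: int, data) -> None:
        self.slots[key] = (key, data)

    def delete(self, key: int) -> None:
        self.slots[key] = None

    def search(self, key: int):
        return self.slots[key]


dat = DirectAddressTable(8)
dat.insert(3, "three")
dat.insert(5, "five")
dat.delete(5)
print(dat.search(3), dat.search(5))  # (3, 'three') None
```

Each operation is a single array access, which is where the O(1) bound comes from; the cost is one slot for every possible key.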
  • 77. The Problem With Direct Addressing
      - Direct addressing works well when the range m of keys is relatively small
      - But what if the keys are 32-bit integers?
        - Problem 1: the direct-address table will have 2^32 entries, more than 4 billion
        - Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be
      - Solution: map keys to a smaller range 0..m-1
      - This mapping is called a hash function
  • 78. Hash Functions
      - Next problem: collision
      [Figure: keys k1..k5 from K (actual keys), a subset of U (universe of keys), are mapped by h into table T with slots 0..m-1; h(k2) = h(k5) fall in the same slot.]
  • 79. Resolving Collisions
      - How can we solve the problem of collisions?
      - Solution 1: chaining
      - Solution 2: open addressing
  • 80. Open Addressing
      - Basic idea:
        - To insert: if a slot is full, try another slot, and so on, until an open slot is found (probing)
        - To search: follow the same sequence of probes as would be used when inserting the element
          - If we reach an element with the correct key, return it
          - If we reach a NULL pointer, the element is not in the table
      - Good for fixed sets (adding but no deletion)
        - Example: spell checking
      - Table needn’t be much bigger than n
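The insert and search rules share one probe sequence, which the sketch below makes explicit. Linear probing with h(key) = key % size is assumed purely for illustration, and the function names are hypothetical.

```python
def probe_sequence(key: int, size: int):
    """Slots to try, in order (linear probing assumed)."""
    home = key % size
    for step in range(size):
        yield (home + step) % size


def oa_insert(table, key):
    for slot in probe_sequence(key, len(table)):
        if table[slot] is None:  # first open slot wins
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


def oa_search(table, key):
    for slot in probe_sequence(key, len(table)):
        if table[slot] is None:  # reached an empty slot: key is absent
            return None
        if table[slot] == key:
            return slot
    return None


oa_table = [None] * 10
for key in [12, 22, 32]:
    oa_insert(oa_table, key)
print(oa_search(oa_table, 32), oa_search(oa_table, 42))  # 4 None
```

Stopping at the first empty slot is only valid while nothing is deleted, which is one reason the scheme suits fixed sets.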
  • 81. Chaining
      - Chaining puts elements that hash to the same slot in a linked list
      [Figure: keys k1..k8 from K (actual keys) within U (universe of keys) are hashed into table T; keys that collide are linked in a per-slot list, and empty slots hold NULL.]
  • 82. Chaining
      - How do we insert an element?
      [Figure: chained hash table T holding keys k1..k8.]
  • 83. Chaining
      - How do we delete an element?
        - Do we need a doubly-linked list for efficient delete?
      [Figure: chained hash table T holding keys k1..k8.]
  • 84. Chaining
      - How do we search for an element with a given key?
      [Figure: chained hash table T holding keys k1..k8.]
  • 88. Analysis of Chaining
      - Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
      - Given n keys and m slots in the table, the load factor α = n/m = average # keys per slot
      - What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
      - What will be the average cost of a successful search? A: O(1 + α/2) = O(1 + α)
  • 89. Analysis of Chaining Continued
      - So the cost of searching = O(1 + α)
      - If the number of keys n is proportional to the number of slots in the table, what is α?
        - A: α = O(1)
        - In other words, we can make the expected cost of searching constant if we make α constant
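The reasoning behind these bounds can be written out briefly. This is a sketch under the simple uniform hashing assumption, with α = n/m; the exact form of the successful-search cost is the standard textbook result rather than something stated on the slides.

```latex
% Unsuccessful search: compute the hash (cost 1), then scan an entire
% chain whose expected length is alpha.
\[
  T_{\text{unsuccessful}} = \Theta(1 + \alpha), \qquad \alpha = \frac{n}{m}
\]
% Successful search: on average about half of the chain is examined
% before the key is found.
\[
  T_{\text{successful}} = \Theta\!\left(1 + \frac{\alpha}{2} - \frac{\alpha}{2n}\right)
                        = \Theta(1 + \alpha)
\]
% If n is proportional to m, say n = c m for a constant c, then
\[
  \alpha = \frac{n}{m} = c = O(1)
  \quad\Longrightarrow\quad
  T_{\text{search}} = O(1).
\]
```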
  • 90. Choosing A Hash Function
      - Clearly, choosing the hash function well is crucial
        - What will a worst-case hash function do?
        - What will be the time to search in this case?
      - What are desirable features of the hash function?
        - Should distribute keys uniformly into slots
        - Should not depend on patterns in the data
  • 91. Hash Functions: The Division Method
      - h(k) = k mod m
        - In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
      - What happens to elements with adjacent values of k?
      - What happens if m is a power of 2 (say 2^p)?
      - What if m is a power of 10?
      - Upshot: pick table size m = a prime number not too close to a power of 2 (or 10)
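The question about powers of 2 can be answered by experiment. In this sketch the keys are chosen (as an assumption, to make the effect visible) to be multiples of 16: with m = 2^4 the hash depends only on the low 4 bits, so every key collides, while a prime m spreads them out.

```python
def division_hash(k: int, m: int) -> int:
    """h(k) = k mod m"""
    return k % m


keys = [16, 32, 48, 64, 80]

power_of_two = [division_hash(k, 16) for k in keys]
prime = [division_hash(k, 13) for k in keys]
print(power_of_two)  # [0, 0, 0, 0, 0]
print(prime)         # [3, 6, 9, 12, 2]

# k mod 2^p is just the low p bits of k.
print(all(k % 16 == (k & 0b1111) for k in range(1000)))  # True
```

The same effect appears with m a power of 10 when keys share their trailing decimal digits.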
  • 92. Hash Functions: The Multiplication Method
      - For a constant A, 0 < A < 1: h(k) = ⌊m (kA - ⌊kA⌋)⌋
      - What does the term kA - ⌊kA⌋ represent?
  • 93. Hash Functions: The Multiplication Method
      - For a constant A, 0 < A < 1: h(k) = ⌊m (kA - ⌊kA⌋)⌋, where kA - ⌊kA⌋ is the fractional part of kA
      - Choose m = 2^p
      - Choose A not too close to 0 or 1
      - Knuth: a good choice is A = (√5 - 1)/2
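A floating-point sketch of the multiplication method follows. The function name is illustrative and the sample key and table size are assumptions picked for the demonstration; a production version would more likely use fixed-point integer arithmetic.

```python
import math

A = (math.sqrt(5) - 1) / 2  # Knuth's suggested constant, about 0.618


def multiplication_hash(k: int, m: int) -> int:
    """h(k) = floor(m * (k*A - floor(k*A)))"""
    product = k * A
    fractional_part = product - math.floor(product)
    return math.floor(m * fractional_part)


m = 2 ** 14
slot = multiplication_hash(123456, m)
print(slot)  # 67
```

Here 123456 * A is about 76300.0041, so the fractional part is about 0.0041 and multiplying by 16384 gives 67.4, which floors to slot 67.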
  • 94. Hash Functions: Worst Case Scenario
      - Scenario:
        - You are given an assignment to implement hashing
        - You will self-grade in pairs, testing and grading your partner’s implementation
        - In a blatant violation of the honor code, your partner:
          - Analyzes your hash function
          - Picks a sequence of “worst-case” keys, causing your implementation to take O(n) time to search
      - What’s an honest CS student to do?
  • 95. Hash Functions: Universal Hashing
      - As before, when attempting to foil a malicious adversary: randomize the algorithm
      - Universal hashing: pick a hash function randomly, in a way that is independent of the keys that are actually going to be stored
        - Guarantees good performance on average, no matter what keys the adversary chooses
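The slides do not name a particular family of hash functions. One well-known universal family, used here as an assumption, is h(k) = ((a*k + b) mod p) mod m with p a prime larger than any key and a, b drawn at random. The function name and the choice p = 2^31 - 1 are illustrative.

```python
import random


def make_universal_hash(m: int, p: int = 2_147_483_647, rng=None):
    """Draw one function from the family ((a*k + b) mod p) mod m.

    p = 2**31 - 1 is prime; keys are assumed to lie in 0..p-1.
    """
    rng = rng or random.Random()
    a = rng.randrange(1, p)  # a in 1..p-1
    b = rng.randrange(0, p)  # b in 0..p-1

    def h(k: int) -> int:
        return ((a * k + b) % p) % m

    return h


h1 = make_universal_hash(101, rng=random.Random(1))
h2 = make_universal_hash(101, rng=random.Random(2))
print([h1(k) for k in (10, 20, 30)])
print([h2(k) for k in (10, 20, 30)])
```

Because a and b are chosen after the adversary has fixed the keys (or at least without the adversary knowing them), no single key sequence is bad for every function in the family.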
  • 96. Variants
      - Many suggestions have been made for reducing the coalescing of probe chains and thereby lowering the number of retrieval probes, which in turn improves performance. The variants may be classified in three ways:
        - The table organization (whether or not a separate overflow area is used)
        - The manner of linking a colliding item into a chain
        - The manner of choosing unoccupied locations
  • 97. Variants
      - Coalescing may be reduced by modifying the table organization
      - Instead of allocating the entire table space for both overflow records and home address records, the table is divided into a primary area and an overflow area (cellar)
        - The primary area is the address space that the hash function maps into
        - The overflow or cellar area contains only overflow records
        - The address factor is the ratio of the primary area to the total table size: Address Factor = primary area / total table size
  • 98. Variants
      - For a fixed amount of storage, as the address factor decreases, the cellar size increases, which reduces the coalescing; but because the primary area becomes smaller, the number of collisions increases
      - More collisions mean more items requiring multiple retrieval probes
      - Vitter [2] determined that an address factor of 0.86 yields nearly optimal retrieval performance for most load factors
  • 99. Variants: LISCH
      - The algorithm given in slide 6 is called Late Insertion Standard Coalesced Hashing (LISCH), since new records are inserted at the end of a probe chain. The ‘Standard’ in the name refers to the lack of a cellar
      - The variant of that algorithm that uses a cellar is called LICH, Late Insertion Coalesced Hashing
  • 100. Variants
      - Another way of varying the insertion algorithm is to change the way in which we choose an unoccupied location. The unoccupied locations are always chosen from the bottom of the storage area, but the number of collisions is increased in this way
      - Hsaio [3] suggests REISCH (‘R’ stands for ‘Random’), in which a random unoccupied location is chosen for the new insertion. REISCH gives only a 1% improvement over EISCH
      - BLISCH (‘B’ signifies ‘Bidirectional’) is another method of choosing the overflow location for a collision insertion: it alternates the selection between the top and bottom of the table
      - In DCWC (Direct Chaining Without Coalescing), a record not stored at its home address is moved
  • 101. Variants
      Table 1: Mean number of probes for successful lookup (n = 997) for variants of Coalesced Hashing