Searching and Hashing
• Hash Table
• Hash Functions
• Collision Resolution Strategies
• Hash Table Implementation
Hashing
• The search time of linear and binary search
depends on the number of elements.
• Hashing is a search technique whose search time, on average, does not
depend on the number of elements.
• In hashing, the required
record is located by using a function. Search time is independent
of the position of the record in the file. The function used to
locate the record is called the hash function.
• A hash function h transforms a key K into a table index L at which
the record with key K is placed; h(K) is called the hash of key
K.
h(K) = L
• Hash functions
• The main criteria for the selection of a hash function are
– it should be easy and quick to compute
– it should produce an even distribution of keys across the range of indices
– it should, as far as possible, produce distinct indices for distinct keys
• There are several basic methods that can be used to
build a hash function.
• Division
An integer key is divided by the table size and the
remainder is taken as the hash value.
Hash value = (key) mod (table_size)
or
Hash value = (key) mod (table_size) + 1
The second form starts the hash values from 1 instead of 0.
The best distribution is usually obtained when table_size is a
prime.
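The two forms of the division method can be sketched in Python as follows. This is only an illustration; the function name `division_hash` and the `one_based` flag are assumptions made for the example and do not come from the slides.

```python
def division_hash(key: int, table_size: int, one_based: bool = False) -> int:
    """Division method: the remainder of key divided by table_size.

    With one_based=True the result lies in 1..table_size
    instead of 0..table_size - 1.
    """
    index = key % table_size
    return index + 1 if one_based else index


# A prime table size (here 7) is used, as the text recommends.
print(division_hash(10, 7))                  # zero-based index
print(division_hash(10, 7, one_based=True))  # one-based index
```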
• Truncation
Part of the key is ignored and the remaining portion is
used as the index. The method, though simple, often fails to
give a uniform distribution.
Ex: Given a seven-digit key, the first, fourth and
seventh digits can form the hash value, so that the key
2345678 maps to 258.
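The truncation example above can be reproduced with a short Python sketch. The function name `truncation_hash` and its parameters (`positions`, `width`) are hypothetical and chosen only for illustration; digit positions are counted from 1, as in the example.

```python
def truncation_hash(key: int, positions=(1, 4, 7), width: int = 7) -> int:
    """Keep only the digits at the given 1-based positions of the key."""
    digits = str(key).zfill(width)  # pad short keys with leading zeros
    return int("".join(digits[p - 1] for p in positions))


print(truncation_hash(2345678))  # keeps the 1st, 4th and 7th digits
```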
• Folding
The key is divided into several parts and the parts are
combined in a convenient way to get the index. Often
addition or multiplication is used for combining the parts.
This process, termed folding, makes use of all the
information in the key and hence can produce better
distribution of the indices.
Ex: A seven-digit key can be divided into groups of
three, two and two digits. The groups are added, and the
result can be used as is or processed further, as
required.
2345678 maps to 234 + 56 + 78 = 368.
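A minimal Python sketch of folding by addition, using the same grouping as the example. The name `folding_hash` and the `group_sizes` parameter are assumptions for illustration; a real table would typically reduce the sum further, for instance with the division method.

```python
def folding_hash(key: int, group_sizes=(3, 2, 2)) -> int:
    """Split the key's digits into groups and add the groups together."""
    digits = str(key).zfill(sum(group_sizes))  # pad short keys with zeros
    total, start = 0, 0
    for size in group_sizes:
        total += int(digits[start:start + size])
        start += size
    return total


print(folding_hash(2345678))  # 234 + 56 + 78
```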
• Midsquare method
The key is multiplied by itself and the middle few digits of
the square are taken as the index. The number of middle digits
taken depends on the number of digits allowed in
the index. Since the middle digits of a square depend
on all the digits in the key, the chances of different keys hashing
to the same index are expected to be small.
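The midsquare method can be sketched as below. The slides do not say exactly which digits count as the "middle" ones, so this sketch assumes the window is centred in the decimal form of the square, leaning left when it cannot be centred exactly. The name `midsquare_hash` and the parameter `index_digits` are illustrative.

```python
def midsquare_hash(key: int, index_digits: int = 3) -> int:
    """Square the key and return the middle index_digits digits."""
    squared = str(key * key).zfill(index_digits)
    # Assumed convention: centre the window, rounding the start down.
    start = (len(squared) - index_digits) // 2
    return int(squared[start:start + index_digits])


# 1234 * 1234 = 1522756; the middle three digits are 227.
print(midsquare_hash(1234))
```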
• Hash collision
• Given a set of keys k1, k2, …, kn, a perfect hash function is defined as
one for which the hash value of ki is not equal to the hash value of kj for all
distinct i and j.
• Sometimes two or more distinct keys give the same hash value.
This is called a hash collision or hash clash. It can be resolved
in several ways.
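To make the idea of a collision concrete, the following Python sketch groups keys by their division-method hash value and reports the indices shared by more than one key. The helper name `find_collisions` and the sample keys are made up for illustration.

```python
from collections import defaultdict


def find_collisions(keys, table_size):
    """Return {index: [keys]} for every index hit by two or more keys."""
    groups = defaultdict(list)
    for key in keys:
        groups[key % table_size].append(key)
    return {index: ks for index, ks in groups.items() if len(ks) > 1}


# 10 and 21 both leave remainder 10 when divided by 11, so they collide.
print(find_collisions([10, 21, 33, 45], 11))
```

A hash function is perfect for a given key set exactly when this helper returns an empty dictionary.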
• Linear probing (linear open addressing)
• The simplest method of resolving hash clashes is to search the table
sequentially for the desired key or an empty location. The search
starts from the location where the collision occurs. The colliding record is
placed in the next available space. The storage space is treated
as circular, so that when the last location is reached
the search continues at the first location. The method is called linear
probing because of the linear nature of the search.
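Linear probing as described above can be sketched in Python like this. The class name `LinearProbingTable`, the use of the division method as the hash function, and the omission of deletion are all simplifying assumptions made for the example.

```python
class LinearProbingTable:
    """Open-addressing table of integer keys with linear probing."""

    def __init__(self, size: int):
        self.size = size
        self.slots = [None] * size

    def _probe(self, key: int):
        # Visit every slot once, starting at the home slot and wrapping
        # from the last location back to the first.
        start = key % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key: int) -> int:
        for index in self._probe(key):
            if self.slots[index] is None or self.slots[index] == key:
                self.slots[index] = key
                return index
        raise OverflowError("hash table is full")

    def search(self, key: int):
        for index in self._probe(key):
            if self.slots[index] is None:
                return None  # an empty slot ends the search
            if self.slots[index] == key:
                return index
        return None


table = LinearProbingTable(7)
for k in (10, 17, 24, 6, 13):
    table.insert(k)
print(table.slots)
```

Here 10, 17 and 24 all hash to slot 3, so 17 and 24 move to slots 4 and 5; 13 hashes to the occupied slot 6 and wraps around to slot 0.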
• Rehashing or double hashing
• In the method called rehashing, a secondary
hash function is applied when a collision occurs. The
current hash value is used as input to the rehash
function and a new hash value is computed.
The rehash function is applied successively until
an empty location is found.
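One common way to realise rehashing is double hashing, sketched below. The slides do not specify the functions, so the choices here are assumptions: `h1` is the division method, `h2` gives a key-dependent step between 1 and size - 1, and `rehash` adds that step to the current hash value. A prime table size is assumed so that repeated rehashing can reach every location.

```python
def h1(key: int, size: int) -> int:
    """Primary hash: division method."""
    return key % size


def h2(key: int, size: int) -> int:
    """Secondary hash: a step in 1..size-1 (never zero)."""
    return 1 + key % (size - 1)


def rehash(index: int, key: int, size: int) -> int:
    """Compute a new hash value from the current one."""
    return (index + h2(key, size)) % size


def insert(table: list, key: int) -> int:
    size = len(table)
    index = h1(key, size)
    for _ in range(size):
        if table[index] is None or table[index] == key:
            table[index] = key
            return index
        index = rehash(index, key, size)  # collision: rehash and retry
    raise OverflowError("no free location found")


table = [None] * 7
for k in (10, 17, 24):
    insert(table, k)
print(table)
```

All three keys start at location 3, but because their steps differ (6 for 17, 1 for 24) they end up at locations 3, 2 and 4 rather than in one contiguous run.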
[Figure: worked rehashing example. Surviving annotations: "Put on 3rd pos. from 5th pos." and "Put on 5th pos. from 6th pos."]
• Quadratic probing
• This approach tries to correct the clustering problem
of linear probing by introducing a quadratic
increment function. Probing is done at locations
given by
(hash_value + j^2) mod (table_size), with
j = 1, 2, 3, …
• Quadratic probing reduces clustering considerably,
but not all locations are probed by this method.
When table_size is a prime, about half of the
locations are probed. But if table_size is a power
of two, relatively few locations are probed.
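The claim about how many locations quadratic probing reaches can be checked with a small Python sketch. The function name `quadratic_probe_locations` is illustrative. It collects the distinct locations visited for j = 1 to table_size - 1; larger j only repeat these, because (j + table_size)^2 and j^2 leave the same remainder.

```python
def quadratic_probe_locations(hash_value: int, table_size: int) -> set:
    """Distinct locations (hash_value + j*j) mod table_size, j >= 1."""
    return {(hash_value + j * j) % table_size for j in range(1, table_size)}


for size in (11, 16):
    reached = quadratic_probe_locations(0, size)
    print(size, sorted(reached), len(reached))
```

With the prime size 11 the probes reach 5 of the 11 locations, while with size 16 they reach only 4 of the 16.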
• Hashing with buckets
• In this approach multiple keys are hashed to a single
location. Each location has slots for more than
one key and is called a
bucket. A bucket can hold multiple entries
up to a fixed capacity, so several keys can
hash to the same location. When a bucket is full,
further collisions must be handled by another method.
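A minimal Python sketch of hashing with buckets follows. The slides leave open how a full bucket is handled; this sketch assumes overflow goes to the next bucket that has room (linear probing over buckets). The class name `BucketHashTable` and its parameters are hypothetical, and deletion is not supported.

```python
class BucketHashTable:
    """Each location is a bucket holding up to bucket_capacity keys."""

    def __init__(self, num_buckets: int, bucket_capacity: int):
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]

    def _probe(self, key: int):
        n = len(self.buckets)
        home = key % n
        for step in range(n):
            yield (home + step) % n

    def insert(self, key: int) -> int:
        for index in self._probe(key):
            bucket = self.buckets[index]
            if key in bucket:
                return index
            if len(bucket) < self.capacity:
                bucket.append(key)
                return index
            # Bucket full: assumed fallback is the next bucket.
        raise OverflowError("all buckets are full")

    def search(self, key: int):
        for index in self._probe(key):
            bucket = self.buckets[index]
            if key in bucket:
                return index
            if len(bucket) < self.capacity:
                return None  # key would have been stored here
        return None


table = BucketHashTable(num_buckets=5, bucket_capacity=2)
for k in (3, 8, 13):
    table.insert(k)
print(table.buckets)
```

Keys 3 and 8 share bucket 3 without any probing; only the third colliding key, 13, overflows into bucket 4.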
• Chaining
• In the method called chaining, a linked list of all items
whose keys hash to the same value is built. During a
search, the hash function is first applied to the key and then
the linked list, called a chain, is searched sequentially for
the target key. In this technique an extra link field is
added to each table position.
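Chaining can be sketched in Python with an explicit linked list per table position. The names `Node` and `ChainedHashTable`, the division-method hash and the choice to insert new keys at the head of the chain are assumptions made for this example.

```python
class Node:
    """One chain entry: a key plus the link field to the next entry."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size: int):
        self.heads = [None] * size  # one chain head per table position

    def _index(self, key: int) -> int:
        return key % len(self.heads)

    def search(self, key: int):
        node = self.heads[self._index(key)]
        while node is not None:  # sequential search of the chain
            if node.key == key:
                return node
            node = node.next
        return None

    def insert(self, key: int) -> None:
        if self.search(key) is None:
            index = self._index(key)
            self.heads[index] = Node(key, self.heads[index])

    def chain(self, index: int) -> list:
        """Keys stored in the chain at the given table position."""
        keys, node = [], self.heads[index]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedHashTable(5)
for k in (3, 8, 13, 4):
    table.insert(k)
print(table.chain(3), table.chain(4))
```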
• This approach has several advantages.
• Considerable space is saved when the records are large.
Since the hash table is an array and the array space is allocated
at compile time, a considerable amount of space is
wasted if some array elements are not occupied. As the
space required for pointers is small, little space is
wasted even if some table positions remain empty.
• Adding a link to each record and organizing all the records
with the same hash address as a linked list handles collisions.
A good hash function gives short linked lists, enabling quick
search. Clustering is prevented, as keys with distinct hash
addresses go to different lists.
• The average length of the linked lists remains small, so the
sequential search of the lists stays efficient.
• Deletion becomes easy and quick in a chained hash table.
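The last point can be illustrated with a short Python sketch: deleting from a chain only requires unlinking one node, and no other record has to move. The names `Node` and `delete_key` are hypothetical, and the table is represented simply as a list of chain heads indexed by the division-method hash.

```python
class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


def delete_key(heads: list, key: int) -> bool:
    """Unlink key from its chain; return True if it was present."""
    index = key % len(heads)
    previous, node = None, heads[index]
    while node is not None:
        if node.key == key:
            if previous is None:
                heads[index] = node.next   # key was at the head
            else:
                previous.next = node.next  # bypass the node
            return True
        previous, node = node, node.next
    return False


heads = [None] * 5
heads[3] = Node(13, Node(8, Node(3)))  # chain for hash address 3
print(delete_key(heads, 8), heads[3].key, heads[3].next.key)
```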
• The chained hash table method also has disadvantages.
• When the records are small, the space used for
links becomes considerable in comparison with
the space required for storing the records.
• When the hash table is small, there are more
collisions, making some of the chains long. This
slows down searching.
• However, a good hash function minimizes
collisions and spreads the records uniformly
throughout the file. The larger the range of the hash
function, the smaller the chance of hash clashes. This
involves a trade-off between time and space.
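The time and space trade-off can be made concrete by counting chain lengths for the same keys under two table sizes. This is only a sketch: the helper name `chain_lengths`, the division-method hash and the sample keys 0 to 99 are assumptions for the example.

```python
def chain_lengths(keys, table_size: int) -> list:
    """Number of keys that land on each table position."""
    lengths = [0] * table_size
    for key in keys:
        lengths[key % table_size] += 1
    return lengths


keys = range(100)
for size in (10, 50):
    lengths = chain_lengths(keys, size)
    print(size, max(lengths), sum(lengths) / size)
```

With 10 positions every chain holds 10 keys; with 50 positions every chain holds 2, at the cost of a table five times larger.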
• Hashing facilitates direct access to a table. For
this reason the scheme is often preferable to other
search techniques. Its biggest drawback
is that the records in a hash table are
not stored in the sorted order of keys.
• Hash functions that do not minimize collisions
cannot access a record directly from its key,
thus defeating the basic purpose of hashing.
• In terms of speed, hash methods compare
favourably with other search methods when the size
of the file is large.
