Dic hash

Dictionaries
[Figure: binary search tree with keys 6, 2, 9, 1, 4, 8; the marks <, > and = trace the search for key 4]

© 2004 Goodrich, Tamassia Dictionaries 1

Dictionary ADT
The dictionary ADT models a searchable collection of key-element items
The main operations of a dictionary are searching, inserting, and deleting items
Multiple items with the same key are allowed
Applications:
• address book
• credit card authorization
• mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)
Dictionary ADT methods:
• findElement(k): if the dictionary has an item with key k, returns its element, else, returns the special element NO_SUCH_KEY
• insertItem(k, o): inserts item (k, o) into the dictionary
• removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element, else returns the special element NO_SUCH_KEY
• size(), isEmpty()
• keys(), elements()


Log File
A log file is a dictionary implemented by means of an unsorted sequence
• We store the items of the dictionary in a sequence (based on a doubly-linked list or a circular array), in arbitrary order
Performance:
• insertItem takes O(1) time since we can insert the new item at the beginning or at the end of the sequence
• findElement and removeElement take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)
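
The log file described above can be sketched in a few lines. This is a minimal illustration, not the slides' own code: it assumes a Python list as the unsorted sequence (so appending is O(1) amortized rather than strictly O(1)), and the names LogFile, insert_item, find_element and remove_element are hypothetical stand-ins for the ADT methods.

```python
NO_SUCH_KEY = object()  # sentinel standing in for the special element


class LogFile:
    """Dictionary kept as an unsorted sequence of (key, element) pairs."""

    def __init__(self):
        self._items = []

    def insert_item(self, k, o):
        # Insert at the end of the sequence; no search is needed.
        self._items.append((k, o))

    def find_element(self, k):
        # Worst case: the whole sequence is scanned, O(n).
        for key, elem in self._items:
            if key == k:
                return elem
        return NO_SUCH_KEY

    def remove_element(self, k):
        # Same linear scan, then the matching item is removed.
        for i, (key, elem) in enumerate(self._items):
            if key == k:
                del self._items[i]
                return elem
        return NO_SUCH_KEY

    def size(self):
        return len(self._items)


log = LogFile()
log.insert_item("alice", "09:00")
log.insert_item("bob", "09:05")
```

Every lookup pays for the cheap insertions with a full scan, which is why this layout suits insert-heavy workloads.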


Lookup Table
A lookup table is a dictionary implemented by means of a sorted sequence
• We store the items of the dictionary in an array-based sequence, sorted by key
• We use an external comparator for the keys
Performance:
• findElement takes O(log n) time, using binary search
• insertItem takes O(n) time since we may have to shift up to n items (n/2 on average) to make room for the new item
• removeElement takes O(n) time since we may have to shift up to n items (n/2 on average) to compact the items after the removal
The lookup table is effective only for dictionaries of small size or for dictionaries on which searches are the most common operations, while insertions and removals are rarely performed (e.g., credit card authorizations)
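
A lookup table can be sketched with Python's standard bisect module, which performs the binary search. This is an illustration under assumptions: keys are compared with Python's built-in ordering instead of an external comparator, and the class and method names are hypothetical.

```python
import bisect

NO_SUCH_KEY = object()  # sentinel standing in for the special element


class LookupTable:
    """Dictionary kept as two parallel lists sorted by key."""

    def __init__(self):
        self._keys = []
        self._elems = []

    def _index(self, k):
        # Binary search, O(log n); returns -1 when k is absent.
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            return i
        return -1

    def find_element(self, k):
        i = self._index(k)
        return self._elems[i] if i >= 0 else NO_SUCH_KEY

    def insert_item(self, k, o):
        # Locating the slot is O(log n); list.insert shifts items, O(n).
        i = bisect.bisect_right(self._keys, k)
        self._keys.insert(i, k)
        self._elems.insert(i, o)

    def remove_element(self, k):
        i = self._index(k)
        if i < 0:
            return NO_SUCH_KEY
        self._keys.pop(i)            # pop shifts later items left, O(n)
        return self._elems.pop(i)


table = LookupTable()
for key, elem in [(30, "c"), (10, "a"), (20, "b")]:
    table.insert_item(key, elem)
```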


Binary Search Tree
A binary search tree is a binary tree storing keys (or key-element pairs) at its internal nodes and satisfying the following property:
• Let u, v, and w be three nodes such that u is in the left subtree of v and w is in the right subtree of v. We have key(u) ≤ key(v) ≤ key(w)
External nodes do not store items
An inorder traversal of a binary search tree visits the keys in increasing order
[Figure: binary search tree with root 6, children 2 and 9, and keys 1, 4 and 8 on the level below]

Search
To search for a key k, we trace a downward path starting at the root
The next node visited depends on the outcome of the comparison of k with the key of the current node
If we reach a leaf, the key is not found and we return NO_SUCH_KEY
Example: findElement(4)

Algorithm findElement(k, v)
    if T.isExternal(v)
        return NO_SUCH_KEY
    if k < key(v)
        return findElement(k, T.leftChild(v))
    else if k = key(v)
        return element(v)
    else { k > key(v) }
        return findElement(k, T.rightChild(v))

[Figure: search path for findElement(4) in the tree with keys 6, 2, 9, 1, 4, 8: 4 < 6, then 4 > 2, then 4 = 4]
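
The search algorithm translates almost line for line into Python. In this sketch the external (empty) nodes are represented by None, which is an assumption of the example and not the slides' tree interface; the Node class and the element strings are illustrative.

```python
NO_SUCH_KEY = object()  # sentinel standing in for the special element


class Node:
    def __init__(self, key, element, left=None, right=None):
        self.key = key
        self.element = element
        self.left = left      # None plays the role of an external node
        self.right = right


def find_element(k, v):
    if v is None:                        # reached an external node
        return NO_SUCH_KEY
    if k < v.key:
        return find_element(k, v.left)
    elif k == v.key:
        return v.element
    else:                                # k > v.key
        return find_element(k, v.right)


# The example tree: 6 at the root, 2 and 9 below it, then 1, 4 and 8.
root = Node(6, "six",
            Node(2, "two", Node(1, "one"), Node(4, "four")),
            Node(9, "nine", Node(8, "eight")))
```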


Insertion
To perform operation insertItem(k, o), we search for key k
Assume k is not already in the tree, and let w be the leaf reached by the search
We insert k at node w and expand w into an internal node
Example: insert 5
[Figure: the search for 5 in the tree with keys 6, 2, 9, 1, 4, 8 goes left at 6, right at 2 and right at 4, reaching leaf w; after the insertion, w stores 5]
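
A possible sketch of the insertion: the search walks down until it falls off the tree, and the empty spot (leaf w) becomes a new internal node. As an assumption of this example, external nodes are represented by None, and the helper inorder is only there to check the result.

```python
class Node:
    def __init__(self, key, element):
        self.key = key
        self.element = element
        self.left = None
        self.right = None


def insert_item(v, k, o):
    """Insert (k, o) into the subtree rooted at v and return its root."""
    if v is None:
        # Leaf w reached: expand it into an internal node storing (k, o).
        return Node(k, o)
    if k < v.key:
        v.left = insert_item(v.left, k, o)
    else:
        v.right = insert_item(v.right, k, o)
    return v


def inorder(v):
    """Keys of the subtree rooted at v, in inorder."""
    if v is None:
        return []
    return inorder(v.left) + [v.key] + inorder(v.right)


root = None
for key in [6, 2, 9, 1, 4, 8]:
    root = insert_item(root, key, str(key))
root = insert_item(root, 5, "5")   # the example: insert 5
```

After the last call, 5 sits in the right child of the node storing 4, and the inorder traversal still lists the keys in increasing order.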


Deletion
To perform operation removeElement(k), we search for key k
Assume key k is in the tree, and let v be the node storing k
If node v has a leaf child w, we remove v and w from the tree with operation removeAboveExternal(w)
Example: remove 4
[Figure: before the removal, node v stores 4 and has a leaf child w and a child storing 5; after the removal, the node storing 5 takes the place of v]


Deletion (cont.)
We consider the case where the key k to be removed is stored at a node v whose children are both internal
• we find the internal node w that follows v in an inorder traversal
• we copy key(w) into node v
• we remove node w and its left child z (which must be a leaf) by means of operation removeAboveExternal(z)
Example: remove 3
[Figure: before the removal, v stores 3 with children 2 and 8, and w stores 5, the key that follows 3 in inorder, with leaf child z; after the removal, v stores 5]
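
Both deletion cases can be sketched in one function. This is an illustration under assumptions: external nodes are represented by None (so removeAboveExternal becomes "return the other child"), nodes store only keys for brevity, and the names are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def remove_key(v, k):
    """Remove key k from the subtree rooted at v and return its new root."""
    if v is None:
        return None                      # key not present
    if k < v.key:
        v.left = remove_key(v.left, k)
    elif k > v.key:
        v.right = remove_key(v.right, k)
    elif v.left is None:
        return v.right                   # v has a leaf child: splice v out
    elif v.right is None:
        return v.left
    else:
        # Both children are internal: find w, the node after v in inorder.
        w = v.right
        while w.left is not None:
            w = w.left
        v.key = w.key                    # copy key(w) into v
        v.right = remove_key(v.right, w.key)  # w's left child is a leaf
    return v


def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)


# Tree from the example: 1 at the root, 3 below it with children 2 and 8,
# 8 with children 6 and 9, and 5 as the left child of 6.
root = Node(1, right=Node(3, Node(2), Node(8, Node(6, Node(5)), Node(9))))
root = remove_key(root, 3)
```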


Performance
Consider a dictionary with n items implemented by means of a binary search tree of height h
• the space used is O(n)
• methods findElement, insertItem and removeElement take O(h) time
The height h is O(n) in the worst case and O(log n) in the best case
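
The gap between the two cases is easy to observe. The sketch below is only an illustration: it represents external nodes by None and measures height as the number of internal nodes on the longest downward path, which may differ by a constant from other definitions.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(v, k):
    if v is None:
        return Node(k)
    if k < v.key:
        v.left = insert(v.left, k)
    else:
        v.right = insert(v.right, k)
    return v


def height(v):
    """Number of internal nodes on the longest path from v downward."""
    return 0 if v is None else 1 + max(height(v.left), height(v.right))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


# Same seven keys, two insertion orders.
sorted_height = height(build([1, 2, 3, 4, 5, 6, 7]))      # degenerates into a chain
balanced_height = height(build([4, 2, 6, 1, 3, 5, 7]))    # perfectly balanced
```

Inserting keys in sorted order produces a chain of n nodes, so each operation costs O(n); the second order gives a height proportional to log n.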

Ordered Dictionaries
Keys are assumed to come from a total order.
New operations:
• first(): first entry in the dictionary ordering
• last(): last entry in the dictionary ordering
• successors(k): iterator of entries with keys greater than or equal to k; increasing order
• predecessors(k): iterator of entries with keys less than or equal to k; decreasing order
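
One way to picture these operations is on top of a sorted array, using binary search to find where the iteration starts. The class below is a hypothetical sketch (a fixed set of entries, Python's built-in key ordering), not an implementation taken from the slides.

```python
import bisect


class SortedDictionary:
    """Read-only ordered dictionary over a sorted list of (key, value) entries."""

    def __init__(self, entries):
        self._entries = sorted(entries, key=lambda e: e[0])
        self._keys = [k for k, _ in self._entries]

    def first(self):
        return self._entries[0]

    def last(self):
        return self._entries[-1]

    def successors(self, k):
        # Entries with key >= k, in increasing order.
        i = bisect.bisect_left(self._keys, k)
        return iter(self._entries[i:])

    def predecessors(self, k):
        # Entries with key <= k, in decreasing order.
        i = bisect.bisect_right(self._keys, k)
        return iter(reversed(self._entries[:i]))


d = SortedDictionary([(5, "e"), (1, "a"), (3, "c"), (8, "h")])
```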

Hash Tables
0 ∅
1 025-612-0001
2 981-101-0002
3 ∅
4 451-229-0004


Recall the Map ADT
Map ADT methods:
• get(k): if the map M has an entry with key k, return its associated value; else, return null
• put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, return old value associated with k
• remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null
• size(), isEmpty()
• keys(): return an iterator of the keys in M
• values(): return an iterator of the values in M


Hash Functions and Hash Tables
A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1]
Example: h(x) = x mod N is a hash function for integer keys
The integer h(x) is called the hash value of key x

A hash table for a given key type consists of
• Hash function h
• Array (called table) of size N

When implementing a map with a hash table, the goal is to store item (k, o) at index i = h(k)

Example
We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer
Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x

0 ∅
1 025-612-0001
2 981-101-0002
3 ∅
4 451-229-0004
…
9997 ∅
9998 200-751-9998
9999 ∅
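
The example can be reproduced with a plain array and h(x) = x mod 10,000, which is the same as taking the last four digits. In this sketch the keys are the numbers shown in the table with the dashes removed, and the names "A" to "D" are placeholders, since the example does not list any names.

```python
N = 10_000


def h(x):
    # Last four digits of x.
    return x % N


table = [None] * N

# Keys from the table with dashes removed; the names are hypothetical.
entries = [
    (256120001, "A"),     # 025-612-0001
    (9811010002, "B"),    # 981-101-0002
    (4512290004, "C"),    # 451-229-0004
    (2007519998, "D"),    # 200-751-9998
]
for ssn, name in entries:
    table[h(ssn)] = (ssn, name)
```

No two keys in this example share their last four digits, so each entry gets its own cell.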


Hash Functions

A hash function is usually specified as the composition of two functions:
Hash code: h1: keys → integers
Compression function: h2: integers → [0, N − 1]
The hash code is applied first, and the compression function is applied next on the result, i.e., h(x) = h2(h1(x))
The goal of the hash function is to “disperse” the keys in an apparently random way

Hash Codes
Memory address:
• We reinterpret the memory address of the key object as an integer (default hash code of all Java objects)
• Good in general, except for numeric and string keys
Integer cast:
• We reinterpret the bits of the key as an integer
• Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java)
Component sum:
• We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
• Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java)
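
The component sum can be sketched for a 64-bit key split into two 32-bit components. Python integers do not overflow, so the sketch models "ignoring overflows" by masking the running total to the component width; that masking, the function name and the parameter names are assumptions of this example.

```python
def component_sum(key, key_bits=64, component_bits=32):
    """Sum the fixed-length components of a non-negative integer key."""
    mask = (1 << component_bits) - 1
    total = 0
    for shift in range(0, key_bits, component_bits):
        component = (key >> shift) & mask
        total = (total + component) & mask   # drop any overflow bits
    return total


low_high = component_sum(0x0000000100000002)   # components 2 and 1
wrapped = component_sum(0xFFFFFFFF00000001)    # 1 + 0xFFFFFFFF wraps around
```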


Hash Codes (cont.)
Polynomial accumulation:
• We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits) $a_0 a_1 \dots a_{n-1}$
• We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \dots + a_{n-1} z^{n-1}$ at a fixed value z, ignoring overflows
• Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)
Polynomial p(z) can be evaluated in O(n) time using Horner’s rule:
• The following polynomials are successively computed, each from the previous one in O(1) time
$p_0(z) = a_{n-1}$
$p_i(z) = a_{n-i-1} + z\,p_{i-1}(z) \quad (i = 1, 2, \dots, n-1)$
We have $p(z) = p_{n-1}(z)$
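
Horner's rule amounts to one multiplication and one addition per component. The sketch below assumes 8-bit components taken from the UTF-8 bytes of a string and models "ignoring overflows" by keeping only the low 32 bits; both choices, and the function name, are assumptions of this example.

```python
def polynomial_hash(s, z=33, bits=32):
    """Evaluate p(z) = a_0 + a_1*z + ... + a_{n-1}*z^(n-1) with Horner's rule."""
    mask = (1 << bits) - 1
    p = 0
    # Start from a_{n-1} and work back to a_0: p_i = a_{n-i-1} + z * p_{i-1}.
    for a in reversed(s.encode("utf-8")):
        p = (a + z * p) & mask
    return p


def polynomial_hash_direct(s, z=33, bits=32):
    """Same polynomial, evaluated term by term, for comparison."""
    mask = (1 << bits) - 1
    return sum(a * z**i for i, a in enumerate(s.encode("utf-8"))) & mask
```

Both functions return the same value, but the Horner version never computes a power of z explicitly.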

Compression Functions
Division:
• h2(y) = y mod N
• The size N of the hash table is usually chosen to be a prime
• The reason has to do with number theory and is beyond the scope of this course
Multiply, Add and Divide (MAD):
• h2(y) = (ay + b) mod N
• a and b are nonnegative integers such that a mod N ≠ 0
• Otherwise, every integer would map to the same value b mod N
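
Both compression functions fit in a few lines. The function names and the sample values N = 11, a = 3, b = 5 are illustrative choices for this sketch; the check on a mirrors the condition a mod N ≠ 0.

```python
def division(y, n):
    return y % n


def mad(y, n, a, b):
    """Multiply, Add and Divide: (a*y + b) mod n, with a mod n != 0."""
    if a % n == 0:
        raise ValueError("a mod N must be nonzero")
    return (a * y + b) % n


# With a = 22 and N = 11, a mod N is 0 and every y collapses to b mod N.
collapsed = {(22 * y + 5) % 11 for y in range(100)}
```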


Example (ideal) hash function
Suppose our hash function gave us the following values:
hashCode("apple") = 5
hashCode("watermelon") = 3
hashCode("grapes") = 8
hashCode("cantaloupe") = 7
hashCode("kiwi") = 0
hashCode("strawberry") = 9
hashCode("mango") = 6
hashCode("banana") = 2

0 kiwi
1
2 banana
3 watermelon
4
5 apple
6 mango
7 cantaloupe
8 grapes
9 strawberry


Collisions
When two values hash to the same array location, this is called a collision
Collisions are normally treated as “first come, first served”: the first value that hashes to the location gets it
We have to find something to do with the second and subsequent values that hash to this same location


Collision Handling
Collisions occur when different elements are mapped to the same cell
Separate Chaining: let each cell in the table point to a linked list of entries that map there
Separate chaining is simple, but requires additional memory outside the table

0 ∅
1 025-612-0001
2 ∅
3 ∅
4 451-229-0004 → 981-101-0004
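
A map with separate chaining might look like the sketch below. It assumes non-negative integer keys hashed with k mod N, and it uses a Python list per cell as a stand-in for the linked list; the class name is hypothetical, and the methods follow the get, put and remove behaviour of the Map ADT.

```python
class ChainedHashMap:
    def __init__(self, n=11):
        # One chain per cell; the chains live outside the table itself.
        self._buckets = [[] for _ in range(n)]

    def _bucket(self, k):
        return self._buckets[k % len(self._buckets)]

    def get(self, k):
        for key, value in self._bucket(k):
            if key == k:
                return value
        return None

    def put(self, k, v):
        bucket = self._bucket(k)
        for i, (key, old) in enumerate(bucket):
            if key == k:
                bucket[i] = (k, v)   # replace and report the old value
                return old
        bucket.append((k, v))
        return None

    def remove(self, k):
        bucket = self._bucket(k)
        for i, (key, value) in enumerate(bucket):
            if key == k:
                del bucket[i]
                return value
        return None


m = ChainedHashMap(10_000)
m.put(4512290004, "first")    # both keys end in 0004,
m.put(9811010004, "second")   # so they share cell 4
```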


Linear probing
A simple open addressing collision handling strategy is called linear probing. In this scheme, if we try to insert an item (k, e) into a bucket A[i] that is already occupied, where i = h(k), then we try next at A[(i+1) mod N]. If A[(i+1) mod N] is also occupied, then we try A[(i+2) mod N], and so on, until we find an empty bucket in A that can accept the new item.


Example
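Keys inserted in order: 26, 5, 21, 16, 13, 37 (table of size N = 11; the positions shown are consistent with h(k) = k mod 11)

index   0   1   2   3   4   5   6   7   8   9  10
key     ∅   ∅  13   ∅  26   5  16  37   ∅   ∅  21

New element with key = 15 to be inserted

index   0   1   2   3   4   5   6   7   8   9  10
key     ∅   ∅  13   ∅  26   5  16  37  15   ∅  21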
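
The example can be replayed with a short sketch of linear probing. The hash function h(k) = k mod 11 is an assumption inferred from the positions in the example, and the function name is hypothetical.

```python
def insert_linear_probing(table, k):
    """Insert key k, probing A[i], A[(i+1) mod N], ... and return the index used."""
    n = len(table)
    i = k % n                          # assumed hash function: k mod N
    for step in range(n):
        j = (i + step) % n
        if table[j] is None:           # first empty bucket accepts the key
            table[j] = k
            return j
    raise RuntimeError("table is full")


table = [None] * 11
for key in [26, 5, 21, 16, 13, 37]:
    insert_linear_probing(table, key)

slot = insert_linear_probing(table, 15)
```

Key 15 hashes to index 4, finds indices 4 through 7 occupied, and lands at index 8.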

