Core Java Equals and hash code
Equals and Hash Code
Introduction
The Java superclass java.lang.Object defines two very important methods:
• public boolean equals(Object obj)
• public int hashCode()
These methods matter whenever objects of user-defined classes interact with other Java
classes, for example when such objects are added to collections.
public boolean equals(Object obj)
This method checks whether the object passed to it as an argument is equal to the
object on which the method is invoked. The default implementation in the Object
class simply checks whether two object references x and y refer to the same object,
i.e. it checks whether x == y. This comparison is also known as "shallow
comparison". Classes that provide their own implementation of the equals method,
however, are expected to perform a "deep comparison" by actually comparing the
relevant data members. Since the Object class has no data members that define its
state, it simply performs the shallow comparison.
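The difference between the default reference comparison and an overriding value comparison can be seen in a small sketch. The class name Token is hypothetical and exists only for this illustration; it deliberately does not override equals.

```java
class DefaultEqualsDemo {
    // Hypothetical class used only for illustration; it inherits Object.equals
    static final class Token {
        final String text;

        Token(String text) {
            this.text = text;
        }
    }

    public static void main(String[] args) {
        Token a = new Token("x");
        Token b = new Token("x");
        Token c = a;

        // Object.equals compares references, so two distinct instances differ
        System.out.println(a.equals(b)); // false
        System.out.println(a.equals(c)); // true

        // String overrides equals and compares the characters instead
        String s1 = new String("x");
        String s2 = new String("x");
        System.out.println(s1 == s2);      // false
        System.out.println(s1.equals(s2)); // true
    }
}
```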
This is what the JDK API documentation says about the equals method
of the Object class:
Indicates whether some other object is "equal to" this one.
The equals method implements an equivalence relation:
• It is reflexive: for any reference value x, x.equals(x) should return
true.
• It is symmetric: for any reference values x and y, x.equals(y) should
return true if and only if y.equals(x) returns true.
• It is transitive: for any reference values x, y, and z, if x.equals(y)
returns true and y.equals(z) returns true, then x.equals(z) should
return true.
• It is consistent: for any reference values x and y, multiple
invocations of x.equals(y) consistently return true or consistently
return false, provided no information used in equals comparisons on the
object is modified.
• For any non-null reference value x, x.equals(null) should return false.
The equals method for class Object implements the most discriminating
possible equivalence relation on objects; that is, for any reference values x
and y, this method returns true if and only if x and y refer to the same
object (x==y has the value true).
Note that it is generally necessary to override the hashCode method whenever
this method is overridden, so as to maintain the general contract for the
hashCode method, which states that equal objects must have equal hash codes.
The contract of the equals method states precisely what it requires. Once you
understand it completely, a correct implementation becomes relatively easy.
Let's look at what each of these requirements really means.
1. Reflexive - The object must be equal to itself, which it always is
unless you intentionally override the equals method to behave otherwise.
2. Symmetric - If an object of one class is equal to an object of another class,
then the second object must also be equal to the first. In other words, one
object cannot unilaterally decide whether it is equal to another object; the two
objects, and consequently the classes to which they belong, must decide
bilaterally whether they are equal. They BOTH must agree.
Hence, it is incorrect for the equals method of your own class to report
equality with an object of java.lang.String, or of any other built-in Java class
for that matter, because the built-in class's equals method will never return
true for an instance of your class. It is very important to understand this
requirement properly, because a naive implementation of the equals method can
easily violate it, with undesired consequences.
3. Transitive - If the first object is equal to the second object and the
second object is equal to the third object, then the first object must be equal
to the third object. In other words, if two objects agree that they are equal and
follow the symmetry principle, one of them cannot decide to have a similar
contract with an object of a different class. All three must agree and follow
the symmetry principle for every permutation of the three classes.
Consider this example: A, B and C are three classes. A and B both implement
the equals method so that it compares objects of class A and class B. If the
author of class B now modifies its equals method so that it also provides
equality comparison with class C, the transitivity principle is violated,
because no proper equals comparison exists between objects of class A and
class C.
4. Consistent - If two objects are equal, they must remain equal as
long as they are not modified. Likewise, if they are not equal, they must remain
non-equal as long as they are not modified. The modification may take place in
either one of them or in both of them.
5. null comparison - No object is equal to null, so your implementation of
the equals method must return false when null is passed to it as an argument.
6. Equals & Hash Code relationship - The last note from the API
documentation is very important: it states the relationship required between
these two methods. If two objects are equal, then they must have the same hash
code; the converse, however, is NOT required. This is discussed in detail later
in this article.
The two methods are interrelated, and how to override them correctly is
discussed later in this article.
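The symmetry requirement from point 2 is the easiest one to break by accident. The following is a minimal sketch, not taken from the original article: BrokenName and Name are hypothetical classes, the first of which compares itself with java.lang.String while the second only compares with its own class.

```java
import java.util.Objects;

class SymmetryDemo {
    // Hypothetical class whose equals (incorrectly) also accepts a String
    static final class BrokenName {
        private final String value;

        BrokenName(String value) {
            this.value = Objects.requireNonNull(value);
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof BrokenName) {
                return value.equals(((BrokenName) o).value);
            }
            // One-sided comparison: String.equals will never agree with this
            return o instanceof String && value.equals(o);
        }

        @Override
        public int hashCode() {
            return value.hashCode();
        }
    }

    // Hypothetical corrected variant: only compares with its own class
    static final class Name {
        private final String value;

        Name(String value) {
            this.value = Objects.requireNonNull(value);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;                  // reflexive
            if (!(o instanceof Name)) return false;      // also false for null
            return value.equals(((Name) o).value);
        }

        @Override
        public int hashCode() {
            return value.hashCode();
        }
    }

    public static void main(String[] args) {
        BrokenName broken = new BrokenName("duke");
        System.out.println(broken.equals("duke")); // true
        System.out.println("duke".equals(broken)); // false: symmetry is violated

        Name name = new Name("duke");
        System.out.println(name.equals("duke"));           // false
        System.out.println(name.equals(new Name("duke"))); // true
        System.out.println(name.equals(null));             // false
    }
}
```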
public int hashCode()
This method returns the hash code value, as an integer, for the object on which it
is invoked. It is supported for the benefit of hashing-based collection classes such
as Hashtable, HashMap and HashSet. It must be overridden in every class that
overrides the equals method.
This is what the JDK 1.4 API documentation says about the hashCode method
of the Object class:
Returns a hash code value for the object. This method is supported for the
benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
• Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. This integer need not remain
consistent from one execution of an application to another execution of
the same application.
• If two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce the
same integer result.
• It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results. However,
the programmer should be aware that producing distinct integer results
for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class
Object does return distinct integers for distinct objects. (This is typically
implemented by converting the internal address of the object into an integer,
but this implementation technique is not required by the Java(TM) programming
language.)
Compared with the general contract of the equals method, the contract of the
hashCode method is relatively simple and easy to understand. It states two
important requirements that must be met when implementing the hashCode method;
the third point of the contract is in fact an elaboration of the second.
Let's look at what this contract really means.
1. Consistency during same execution - The hash code returned by the hashCode
method must be the same across multiple invocations during one execution of
the application, as long as no information used by the equals method is
modified.
2. Hash Code & Equals relationship - The second requirement of the contract is
the hashCode counterpart of the requirement specified by the equals method. It
simply emphasizes the same relationship - equal objects must produce the same
hash code. However, the third point elaborates that unequal objects need
not produce distinct hash codes.
After reviewing the general contracts of these two methods, it is clear that the
relationship between them can be summed up in the following statement:
Equal objects must produce the same hash code as long as
they are equal; however, unequal objects need not produce
distinct hash codes.
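Both halves of this statement can be checked in a few lines. The sketch below assumes a hypothetical Point class written for this illustration; the second half relies on the fact that the strings "Aa" and "BB" happen to share a hash code under String.hashCode().

```java
import java.util.Objects;

class HashContractDemo {
    // Hypothetical value class overriding equals and hashCode together
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        // Built from exactly the fields that equals compares
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);

        // Equal objects must produce the same hash code
        System.out.println(p.equals(q));                  // true
        System.out.println(p.hashCode() == q.hashCode()); // true

        // Unequal objects need not produce distinct hash codes
        System.out.println("Aa".equals("BB"));                  // false
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
    }
}
```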
Hash Function:
Problem caused by hashCode()
The problem arises when a class overrides equals() but leaves the method
"hashCode()" un-overridden. The contract between equals() and hashCode() is that:
1. If two objects are equal, then they must have the same hash code.
2. If two objects have the same hash code, they may or may not be equal.
The idea behind a Map is to be able to find an object faster than a linear search. Using
hashed keys to locate objects is a two-step process. Internally the Map stores objects
as an array of arrays. The index into the first array is derived from the hashCode()
value of the key. This locates the second array, which is searched linearly by using
equals() to determine whether the object is found.
The default implementation of hashCode() in the Object class typically returns distinct
integers for different objects. Therefore, for a class that does not override hashCode(),
different objects (even equal ones of the same type) have different hash codes.
A hash code is like a row of garages used for storage: different stuff can be stored in
different garages. It is more efficient to spread stuff across different garages than to
put everything in the same one, so it is good practice to distribute hashCode values
evenly. (That is not the main point here, though.)
The solution is to add a hashCode method to the class. Here I just use the color string's
length for demonstration.
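The class that this passage refers to is not reproduced in this section, so the following is only a sketch under assumptions: a hypothetical Apple class with a single color field, an equals method comparing that field, and the hashCode described above (the length of the color string).

```java
import java.util.HashMap;
import java.util.Map;

class AppleDemo {
    // Hypothetical class, reconstructed only from the description in the text
    static final class Apple {
        private final String color;

        Apple(String color) {
            this.color = color;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Apple)) return false;
            return color.equals(((Apple) obj).color);
        }

        // Deliberately crude, for demonstration only: equal colors have equal
        // lengths, so equal apples are guaranteed to share a hash code
        @Override
        public int hashCode() {
            return color.length();
        }
    }

    // Looks up an equal-but-not-identical key in a small map
    static Integer lookup(String color) {
        Map<Apple, Integer> stock = new HashMap<>();
        stock.put(new Apple("green"), 10);
        stock.put(new Apple("red"), 20);
        return stock.get(new Apple(color));
    }

    public static void main(String[] args) {
        System.out.println(lookup("green")); // 10
        // "black" has the same hash code as "green" (length 5) but is not equal
        System.out.println(lookup("black")); // null
    }
}
```

Without the hashCode override, the lookup with a fresh Apple("green") would normally miss, because the two instances would almost certainly land in different buckets.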
How does Java HashMap work internally
What is Hashing?
Hashing, in its simplest form, is a way of assigning a code to any variable/object by applying
a formula/algorithm to its properties. Ideally the code is unique, but that is not required.
A true hashing function must follow this rule:
A hash function should return the same hash code each and every time it is applied to the same
or equal objects. In other words, two equal objects must produce the same hash code consistently.
Note: All objects in Java inherit a default implementation of the hashCode() function defined in the
Object class. This function typically produces the hash code by converting the internal address of the
object into an integer, thus producing different hash codes for different objects.
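HashMap is an array of Entry objects:
Consider HashMap as just an array of objects.
Have a look at what this object is:
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;       // the key of this mapping
    V value;           // the value currently mapped to the key
    Entry<K,V> next;   // next Entry in the same bucket, or null
    final int hash;    // cached hash of the key
    ...
}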
Each Entry object represents a key-value pair. The field next refers to another Entry object if a
bucket holds more than one Entry.
Sometimes the hash codes of 2 different objects are the same. In this case the 2 objects
are saved in one bucket and are organized as a linked list. The entry point is the most recently
added object. This object refers to the next object through its next field, and so on. The last
entry's next field is null.
When you create a HashMap with the default constructor
HashMap hashMap = new HashMap();
an array of size 16 is created, with the default load factor of 0.75.
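To make the two default values concrete: the load factor determines how full the array may get before it is resized. A small sketch of the arithmetic follows; the helper name threshold is an assumption made for this example, while the two-argument HashMap constructor is part of the public API.

```java
import java.util.HashMap;
import java.util.Map;

class CapacityDemo {
    // Number of entries beyond which a table of the given capacity is resized
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Both maps use an initial capacity of 16 and a load factor of 0.75
        Map<String, Integer> defaults = new HashMap<>();
        Map<String, Integer> explicit = new HashMap<>(16, 0.75f);
        defaults.put("a", 1);
        explicit.put("a", 1);

        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```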
Adding a new key-value pair
1. Calculate the hash code for the key
2. Calculate the position hash & (arrayLength - 1) where the element should be placed (the
bucket number). Because the array length is a power of two, this selects the low bits of the hash.
3. If you add a value with a key which has already been saved in the HashMap, the value
gets overwritten.
4. Otherwise the element is added to the bucket. If the bucket already has at least one element,
the new one is placed in the first position in the bucket, and its next field refers to the old
element.
Deletion:
1. Calculate the hash code for the given key
2. Calculate the bucket number (hash & (arrayLength - 1))
3. Get a reference to the first Entry object in the bucket and iterate over all entries in the
bucket, comparing keys by means of the equals method, until the matching Entry is found and
unlinked from the bucket. If the desired element is not found, return null.
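The insertion and deletion steps above can be sketched as a toy hash map. This is a simplified illustration, not the JDK implementation: the names ToyHashMap, Node and bucketOf are made up for this example, the table has a fixed length of 16, and resizing is left out.

```java
import java.util.Objects;

// Hypothetical, simplified chained hash map; not the JDK implementation
class ToyHashMap<K, V> {
    // Simplified stand-in for the Entry class described in the text
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed power-of-two length; resizing is omitted to keep the sketch short
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    // Steps 1 and 2: hash code of the key, then the bucket number
    private int bucketOf(Object key) {
        int hash = (key == null) ? 0 : key.hashCode();
        return hash & (table.length - 1);
    }

    V put(K key, V value) {
        int i = bucketOf(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { // step 3: existing key, overwrite
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // step 4: new head of bucket
        return null;
    }

    V get(Object key) {
        for (Node<K, V> n = table[bucketOf(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }

    V remove(Object key) {
        int i = bucketOf(key);
        Node<K, V> prev = null;
        for (Node<K, V> n = table[i]; n != null; prev = n, n = n.next) {
            if (Objects.equals(n.key, key)) {
                if (prev == null) table[i] = n.next; // unlink the head
                else prev.next = n.next;             // unlink from the middle or end
                return n.value;
            }
        }
        return null; // desired element not found
    }

    public static void main(String[] args) {
        ToyHashMap<String, Integer> map = new ToyHashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa", so same bucket
        System.out.println(map.get("Aa"));    // 1
        System.out.println(map.put("Aa", 3)); // 1 (the previous value)
        System.out.println(map.remove("Aa")); // 3
        System.out.println(map.get("Aa"));    // null
        System.out.println(map.get("BB"));    // 2
    }
}
```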
What put() method actually does:
Before going into the put() method's implementation, it is very important to learn that instances of
the Entry class are stored in an array. The HashMap class defines this variable as:
/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry[] table;
Now look at the code implementation of the put() method:
/**
 * Associates the specified value with the specified key in this map. If the
 * map previously contained a mapping for the key, the old value is
 * replaced.
 *
 * @param key
 *            key with which the specified value is to be associated
 * @param value
 *            value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or <tt>null</tt>
 *         if there was no mapping for <tt>key</tt>. (A <tt>null</tt> return
 *         can also indicate that the map previously associated
 *         <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    // A null key is handled separately
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // Walk the bucket looking for an entry with an equal key
    for (Entry<K, V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No equal key found: add a new entry to bucket i
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
Let's note down the steps one by one:
Step1- First of all, the key object is checked for null. If the key is null, the value is stored at
position table[0], because the hash code for null is always 0.
Step2- Next, a hash value is calculated from the key's hash code, obtained by calling
its hashCode() method. This hash value is used to calculate the index in the array for storing the
Entry object. The JDK designers assumed that there might be some poorly written hashCode() functions
whose values differ only in bits that the index calculation ignores. To defend against this, they
introduced another hash() function and pass the object's hash code through it, which mixes the
higher bits into the lower ones so that entries spread more evenly across the array.
Step3- Now the indexFor(hash, table.length) function is called to calculate the exact index position
for storing the Entry object; this is the step that brings the hash value into the range of valid
array indexes.
Step4- Here comes the main part. As we know, two unequal objects can have the same hash code
value, so how are two different objects stored in the same array location (called a bucket)?
The answer is a linked list. If you remember, the Entry class had an attribute "next". This attribute
always points to the next object in the chain. This is exactly the behavior of a linked list.
So, in case of a collision, Entry objects are stored in linked-list form. When an Entry object needs to
be stored at a particular index, HashMap checks whether there is already an entry there. If there is
none, the Entry object is stored in this location.
If there is already an object sitting at the calculated index, the chain is walked through the next
attribute until next evaluates to null, comparing each existing key with the new one. If no equal key
is found, the put() code shown above calls addEntry(), and the new Entry object is linked into the
same chain (at its first position, as described earlier, with its next attribute referring to the
previous first entry).
What if we add another value object with the same key as entered before? Logically, it should replace
the old value. How is that done? Well, after determining the index position of the Entry object, while
iterating over the linked list at the calculated index, HashMap calls the equals method on the key
object for each Entry object. All the Entry objects in this linked list map to the same index (and may
even have the same hash code), but the equals() method tests for true equality. If key.equals(k) is
true, both keys are treated as the same key object, and only the value object inside the Entry object
is replaced.
In this way, HashMap ensures the uniqueness of keys.
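Steps 2 and 3 above can be made concrete with a short sketch. The two helper bodies below are modelled on the JDK 6-era HashMap that the put() listing appears to come from; treat the exact shift constants as an assumption, since the spreading function differs between JDK versions.

```java
class HashIndexDemo {
    // Supplemental hash: mixes higher bits into the lower ones.
    // Assumption: constants as in the JDK 6-era HashMap; newer JDKs differ.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Maps a hash to a bucket index; valid only when length is a power of two
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        // Two hash codes that differ only in bits above the low four
        int a = 0x10000;
        int b = 0x20000;

        // Without the supplemental hash both land in bucket 0
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16)); // 0 0

        // With it, the higher bits influence the index and the keys separate
        System.out.println(indexFor(hash(a), 16) + " " + indexFor(hash(b), 16)); // 1 2
    }
}
```

The mask in indexFor only works because the table length is always a power of two, which is why the comment on the table field insists on it.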
How get() method works internally
Now we have an idea of how key-value pairs are stored in HashMap. The next big question is: what
happens when an object is passed to the get method of HashMap? How is the value object determined?
The answer we should already know: the same logic that determines key uniqueness in the put()
method is applied in the get() method. The moment HashMap identifies an exact match for the key
object passed as argument, it simply returns the value object stored in the current Entry object.
If no match is found, the get() method returns null.
Let's have a look at the code:
/**
 * Returns the value to which the specified key is mapped, or {@code null}
 * if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key {@code k} to a
 * value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise it returns
 * {@code null}. (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i> indicate that
 * the map contains no mapping for the key; it's also possible that the map
 * explicitly maps the key to {@code null}. The {@link #containsKey
 * containsKey} operation may be used to distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    // A null key is handled separately
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    // Walk the bucket until an entry with an equal key is found
    for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}
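The behaviour documented in the two Javadoc comments can be observed from the outside with the public java.util.HashMap API. In the sketch below, the helper name describe is hypothetical and only illustrates how containsKey separates a missing key from a key mapped to null.

```java
import java.util.HashMap;
import java.util.Map;

class GetPutDemo {
    // Distinguishes "no mapping" from "mapped to null", which get() alone cannot
    static String describe(Map<String, String> map, String key) {
        if (!map.containsKey(key)) return "absent";
        String value = map.get(key);
        return value == null ? "mapped to null" : value;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();

        System.out.println(map.put("k", "v1")); // null: no previous mapping
        System.out.println(map.put("k", "v2")); // v1: the old value is returned
        System.out.println(map.get("k"));       // v2

        map.put("n", null);
        System.out.println(map.get("n"));       // null
        System.out.println(map.get("missing")); // null as well

        System.out.println(describe(map, "n"));       // mapped to null
        System.out.println(describe(map, "missing")); // absent

        // HashMap also accepts a single null key
        map.put(null, "for-null-key");
        System.out.println(map.get(null)); // for-null-key
    }
}
```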