Table of Contents
1. Introduction
2. MD5 (Message-Digest 5)
2.1. Algorithm
2.2. Applications
2.3. MD5 collision
2.4. Implementation of MD5 in Java
3. Attacks on MD5
3.1. Rainbow tables
3.2. Definition
3.3. Time-Memory Trade-Off
3.4. How do rainbow tables work?
3.5. An example of a reduction function
4. Conclusion
References
1 Introduction
Cryptographic hash functions are important cryptographic primitives that map an input message of arbitrary length to a short, fixed-length string.
A cryptographic hash function has three fundamental properties:
- It must be easy to convert digital information (a message) into a fixed-length value.
- It must be computationally infeasible to derive any information about the input message from the hash alone.
- It must be computationally infeasible to find two different messages that have the same hash.
2 MD5 (Message-Digest 5)
The MD5 hash function was designed in 1991 by cryptographer Ronald Rivest as a stronger alternative to his MD4 algorithm, which was published as RFC 1320 in 1992 [3]. The MD5 algorithm breaks a file into 512-bit input blocks. Each block is run through a series of functions to produce a 128-bit hash value for the file. Changing just one bit in any of the input blocks should have a cascading effect that completely alters the resulting hash.
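The cascading (avalanche) effect can be observed directly. The sketch below uses the standard java.security.MessageDigest API; the class name Avalanche and the input "hello world" are illustrative choices, not part of MD5. It flips a single input bit and counts how many of the 128 output bits change; for a well-behaved hash, roughly half of them should.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Avalanche {
    // MD5 digest of a byte array: always 16 bytes (128 bits), whatever the input length.
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Number of bit positions in which two equal-length digests differ.
    static int differingBits(byte[] a, byte[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            count += Integer.bitCount((a[i] ^ b[i]) & 0xff);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] original = "hello world".getBytes(StandardCharsets.US_ASCII);
        byte[] flipped = original.clone();
        flipped[0] ^= 0x01;                  // one-bit change: 'h' (0x68) becomes 'i' (0x69)
        byte[] h1 = md5(original);
        byte[] h2 = md5(flipped);
        System.out.println("digest size: " + h1.length * 8 + " bits");
        System.out.println("bits changed: " + differingBits(h1, h2) + " of 128");
    }
}
```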
2.1 Algorithm
MD5 processes a variable-length message into a fixed-length output of 128 bits. The input message is broken up into 512-bit blocks; to make this possible, the message is first padded so that its length is divisible by 512. The padding works as follows: first a single 1 bit is appended to the end of the message. This is followed by as many zeros as are required to bring the length of the message up to 64 bits fewer than a multiple of 512. The remaining 64 bits are filled with a 64-bit integer representing the length of the original message, in bits.
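The padding rule can be written out in a few lines. This is a sketch under two stated assumptions: the message is a whole number of bytes (so the single 1 bit followed by zeros becomes the byte 0x80), and, as RFC 1321 specifies for MD5, the 64-bit length is stored little-endian. The class and method names are illustrative.

```java
import java.util.Arrays;

public class Md5Padding {
    // Pads a byte-aligned message: 0x80 (a 1 bit followed by seven 0 bits), then zero
    // bytes until the length is 56 mod 64 bytes (448 mod 512 bits), then the original
    // length in bits as a 64-bit little-endian integer.
    static byte[] pad(byte[] message) {
        long bitLength = (long) message.length * 8;
        // Smallest multiple of 64 bytes that leaves room for 0x80 plus 8 length bytes.
        int paddedLength = ((message.length + 8) / 64 + 1) * 64;
        byte[] out = Arrays.copyOf(message, paddedLength);   // extra bytes start as zero
        out[message.length] = (byte) 0x80;
        for (int i = 0; i < 8; i++) {
            out[paddedLength - 8 + i] = (byte) (bitLength >>> (8 * i));
        }
        return out;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 0, 3, 55, 56, 64 }) {
            System.out.println(n + " bytes -> " + pad(new byte[n]).length + " bytes after padding");
        }
    }
}
```

A 55-byte message still fits in one 512-bit block, while a 56-byte message needs a second block, because the 0x80 byte and the 8 length bytes no longer fit after it.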
Figure 1. Hashing: a one-way operation
The main MD5 algorithm operates on a 128-bit state, divided into four 32-bit words, denoted A, B, C and D. These are initialized to certain fixed constants. The main algorithm then operates on each 512-bit message block in turn, each block modifying the state. The processing of a message block consists of four similar stages, termed rounds; each round is composed of 16 similar operations based on a non-linear function, modular addition and left rotation. There are four possible functions, and a different one is used in each round; each takes three 32-bit words as input and produces one 32-bit word as output:
F(X,Y,Z) = (X ∧ Y) ∨ (¬X ∧ Z)
G(X,Y,Z) = (X ∧ Z) ∨ (Y ∧ ¬Z)
H(X,Y,Z) = X ⊕ Y ⊕ Z
I(X,Y,Z) = Y ⊕ (X ∨ ¬Z)
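To make the round structure concrete, here is a compact, self-contained MD5 written from the description above and checked against Java's built-in MessageDigest. It is a teaching sketch, not a replacement for the library. The details the text does not spell out, namely the 64 additive constants K[i] = floor(|sin(i + 1)| × 2^32), the per-operation rotation amounts S, the message-word order in each round and the initial values of A, B, C and D, follow RFC 1321; the class name Md5Sketch is illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class Md5Sketch {
    // Left-rotation amount for each of the 64 operations (16 per round).
    private static final int[] S = {
        7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
        5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
        4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
        6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21 };
    // Additive constants K[i] = floor(|sin(i + 1)| * 2^32), as defined in RFC 1321.
    private static final int[] K = new int[64];
    static {
        for (int i = 0; i < 64; i++) {
            K[i] = (int) (long) (Math.abs(Math.sin(i + 1)) * 4294967296.0);
        }
    }

    // The four non-linear round functions F, G, H and I from the text.
    static int F(int x, int y, int z) { return (x & y) | (~x & z); }
    static int G(int x, int y, int z) { return (x & z) | (y & ~z); }
    static int H(int x, int y, int z) { return x ^ y ^ z; }
    static int I(int x, int y, int z) { return y ^ (x | ~z); }

    static byte[] md5(byte[] message) {
        // Padding: 0x80, zero bytes, then the bit length as a 64-bit little-endian integer.
        int padded = ((message.length + 8) / 64 + 1) * 64;
        byte[] m = Arrays.copyOf(message, padded);
        m[message.length] = (byte) 0x80;
        long bits = (long) message.length * 8;
        for (int j = 0; j < 8; j++) m[padded - 8 + j] = (byte) (bits >>> (8 * j));

        // 128-bit state: four 32-bit words A, B, C, D with their fixed initial values.
        int a0 = 0x67452301, b0 = 0xefcdab89, c0 = 0x98badcfe, d0 = 0x10325476;
        int[] w = new int[16];
        for (int off = 0; off < padded; off += 64) {          // each 512-bit block
            for (int j = 0; j < 16; j++) {                      // 16 little-endian words
                int p = off + 4 * j;
                w[j] = (m[p] & 0xff) | (m[p + 1] & 0xff) << 8
                     | (m[p + 2] & 0xff) << 16 | (m[p + 3] & 0xff) << 24;
            }
            int a = a0, b = b0, c = c0, d = d0;
            for (int j = 0; j < 64; j++) {                      // 4 rounds x 16 operations
                int f, g;
                if (j < 16)      { f = F(b, c, d); g = j; }
                else if (j < 32) { f = G(b, c, d); g = (5 * j + 1) % 16; }
                else if (j < 48) { f = H(b, c, d); g = (3 * j + 5) % 16; }
                else             { f = I(b, c, d); g = (7 * j) % 16; }
                int rotated = Integer.rotateLeft(a + f + K[j] + w[g], S[j]);
                a = d;
                d = c;
                c = b;
                b = b + rotated;
            }
            a0 += a; b0 += b; c0 += c; d0 += d;                 // fold block result into state
        }
        // Digest = A, B, C, D written out little-endian.
        int[] state = { a0, b0, c0, d0 };
        byte[] out = new byte[16];
        for (int j = 0; j < 16; j++) out[j] = (byte) (state[j / 4] >>> (8 * (j % 4)));
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] input = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.US_ASCII);
        byte[] mine = md5(input);
        byte[] jdk = MessageDigest.getInstance("MD5").digest(input);
        StringBuilder hex = new StringBuilder();
        for (byte x : mine) hex.append(String.format("%02x", x & 0xff));
        System.out.println(hex + " (matches MessageDigest: " + Arrays.equals(mine, jdk) + ")");
    }
}
```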
Figure 2. Length of message after padding (in bits)
2.2 Applications
The two main applications of MD5 are verifying file integrity and hashing passwords for storage.
a) Verification of file integrity
MD5 digests have been widely used in the software world to provide some assurance that a transferred file has arrived intact. For example, file servers often provide a pre-computed MD5 checksum (known as an md5sum) for their files, so that a user can compare the checksum of the downloaded file with it. Unix-based operating systems include MD5 sum utilities in their distribution packages.
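The comparison a user makes against a published checksum can be sketched as follows. It assumes the server publishes the MD5 as a 32-character hex string (the form printed by md5sum); the class and method names are illustrative. The file is read in chunks so that large downloads do not have to fit in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileChecksum {
    // Streams the file through MD5 in 8 KiB chunks and returns the 32-char hex digest.
    static String md5Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // True if the file's MD5 equals the published checksum (hex, case-insensitive).
    static boolean verify(Path file, String publishedMd5) throws IOException {
        return md5Hex(file).equalsIgnoreCase(publishedMd5.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FileChecksum <file> <published-md5>");
            return;
        }
        boolean ok = verify(Paths.get(args[0]), args[1]);
        System.out.println(ok ? "OK: checksums match" : "MISMATCH: the file is corrupted or was altered");
    }
}
```

A match is good evidence against accidental corruption in transit; because MD5 collisions can be constructed deliberately (section 2.3), it is weaker evidence against intentional tampering.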
Figure 3. A file on a download page, provided with a pre-computed MD5
Figure 4. Verification of file integrity
b) Password hashing
It is a bad idea for computer systems to store passwords in cleartext (in their original form).
A more secure way is to store a hash of the password rather than the password itself. Since hash functions are designed to be one-way, there is no direct way to compute “what password produced this hash?”
The password hash is then stored in a safe place. During authentication, when the user enters his or her password, it is run through the same hash function and the result is compared with the hash saved in the password store. If they match, access is granted; if the hashes are not identical, access is denied.
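The store-then-compare flow described above can be sketched with an in-memory map standing in for the password store (a hypothetical stand-in; a real system would use persistent storage). Following the text, it stores an unsalted MD5 hash, which is exactly the kind of entry the attacks in section 3 target. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class PasswordStore {
    // Hypothetical password store: user name -> MD5 hash of that user's password.
    private final Map<String, byte[]> store = new HashMap<>();

    private static byte[] md5(String password) {
        try {
            return MessageDigest.getInstance("MD5").digest(password.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Registration: only the hash is kept, never the cleartext password.
    void register(String user, String password) {
        store.put(user, md5(password));
    }

    // Authentication: hash the proposed password and compare it with the stored hash.
    boolean authenticate(String user, String password) {
        byte[] stored = store.get(user);
        return stored != null && MessageDigest.isEqual(stored, md5(password));
    }

    public static void main(String[] args) {
        PasswordStore ps = new PasswordStore();
        ps.register("alice", "Password");
        System.out.println(ps.authenticate("alice", "Password"));   // true: access granted
        System.out.println(ps.authenticate("alice", "password"));   // false: access denied
    }
}
```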
Figure 5. Storing a hash instead of a password
Figure 6. Testing a proposed password against the stored hash
2.3 MD5 collision
A collision occurs when two different inputs produce the same hash. The first practical collisions for MD5 were found in 2004 by Wang et al. [1], and a detailed description of their collision attack on MD5 was published in 2005 [2].
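The idea of a collision can be demonstrated on a deliberately weakened hash. The sketch below keeps only the first 32 bits of the MD5 digest (an illustrative truncation, not real MD5) and runs a generic birthday search, which needs on the order of 2^16 attempts. The same generic search on the full 128-bit digest would need about 2^64 attempts; the attack of Wang et al. is a dedicated cryptanalytic attack, not this brute-force search. The class name and the "msg" input prefix are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class TruncatedCollision {
    // First 32 bits of the MD5 digest: a deliberately weakened, illustrative "hash".
    static int md5Prefix32(MessageDigest md, String s) {
        byte[] d = md.digest(s.getBytes(StandardCharsets.UTF_8));
        return (d[0] & 0xff) << 24 | (d[1] & 0xff) << 16 | (d[2] & 0xff) << 8 | (d[3] & 0xff);
    }

    // Generic birthday search: remember every prefix seen until one repeats.
    static String[] findCollision() throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        Map<Integer, String> seen = new HashMap<>();
        for (long n = 0; ; n++) {
            String candidate = "msg" + n;
            String previous = seen.putIfAbsent(md5Prefix32(md, candidate), candidate);
            if (previous != null) {
                return new String[] { previous, candidate };
            }
        }
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String[] pair = findCollision();
        MessageDigest md = MessageDigest.getInstance("MD5");
        System.out.printf("%s and %s share the 32-bit prefix %08x%n",
                pair[0], pair[1], md5Prefix32(md, pair[0]));
    }
}
```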
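Figure 7. What a hash collision might look like
2.4 Implementation of MD5 in Java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashPassword {

    // Returns the MD5 digest of pwd as a 32-character lowercase hex string.
    public static String hashPassword(String pwd) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // An explicit charset makes the hash independent of the platform default encoding.
        md.update(pwd.getBytes(StandardCharsets.UTF_8));
        byte[] digest = md.digest();                 // 16 bytes = 128 bits
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            // "%02x" keeps the leading zero of bytes below 0x10, which
            // Integer.toHexString would drop (giving a wrong, shorter hash string).
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String password = "Password";
        System.out.println(password);
        try {
            System.out.println(hashPassword(password));
        } catch (NoSuchAlgorithmException e) {
            // Do not silently swallow the error; MD5 is expected on standard JDKs.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}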
The result:
Figure 8. The result after running hashPassword
3 Attacks on MD5
3.1 Rainbow tables
Rainbow tables are a newer generation of cracking techniques that use advanced methods to recover passwords hashed with algorithms such as Message-Digest 5 (MD5). Rainbow tables have become popular and widely known for the speed with which passwords hashed with these algorithms can be cracked.
3.2 Definition
A rainbow table is a precomputed table for reversing cryptographic hash functions, usually used for cracking password hashes. Such tables are usually used to recover plaintext passwords up to a certain length, drawn from a limited set of characters. A rainbow table makes brute-forcing a password hash much easier by removing the most computationally expensive part of a brute-force attack: performing the hash function itself. Because most of the hash computations have already been done in advance, cracking is largely reduced to a search-and-compare operation on the table.
3.3 Time-Memory Trade-Off
The traditional way to crack passwords is brute force, which simply tries all possible plaintexts one by one. This was, and still is, a time-consuming method of cracking passwords. Rainbow tables implement Philippe Oechslin's time-memory trade-off method, which decreases the time needed for cryptanalysis by using precalculated data stored in memory. The idea of a time-memory trade-off is to perform as much of the cracking computation as possible in advance and store the results in files (rainbow tables).
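A minimal brute-force cracker, assuming for illustration only that the password consists of exactly n lowercase letters; the class and method names are hypothetical. Each guess costs one MD5 computation, and the number of guesses grows as 26^n, which is why this method becomes so time-consuming for longer passwords.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class BruteForce {
    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] md5(String s) {
        return newMd5().digest(s.getBytes(StandardCharsets.US_ASCII));
    }

    // Tries every lowercase password of exactly `length` letters ("aa...a" to "zz...z")
    // and returns the first one whose MD5 equals target, or null if none does.
    static String crack(byte[] target, int length) {
        MessageDigest md = newMd5();
        char[] guess = new char[length];
        Arrays.fill(guess, 'a');
        while (true) {
            String candidate = new String(guess);
            if (Arrays.equals(md.digest(candidate.getBytes(StandardCharsets.US_ASCII)), target)) {
                return candidate;
            }
            // Advance like an odometer: rightmost letter first, carrying past 'z'.
            int pos = length - 1;
            while (pos >= 0 && guess[pos] == 'z') {
                guess[pos] = 'a';
                pos--;
            }
            if (pos < 0) {
                return null;              // all 26^length guesses exhausted
            }
            guess[pos]++;
        }
    }

    public static void main(String[] args) {
        byte[] stolenHash = md5("dog");   // stands in for a hash taken from a password store
        System.out.println(crack(stolenHash, 3));   // prints dog
    }
}
```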
3.4 How do rainbow tables work?
Rainbow tables use hash functions and reduction functions. A reduction function converts a hash value into a plaintext. This plaintext is not the original plaintext from which the hash value was generated, but another one. By alternating the hash function with the reduction function, chains of alternating passwords and hash values are formed. Only the first plaintext (the chain's starting point) and the last plaintext (the chain's endpoint) are stored in the table. To recover a hashed password, we first apply the reduction function to it and then keep alternating the hash and reduction functions until the result matches one of the stored endpoints. We then take that chain's corresponding starting point, regenerate the hash chain, and find the original plaintext of the hashed password.
There are two simple methods to find the plaintext for a given hash:
- Hash each plaintext one by one until we find the hash.
- Hash each plaintext one by one, but store each generated hash in a sorted table so that we can easily look the hash up later without generating the hashes again.
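The second method, a full precomputed lookup table, can be sketched as follows. A HashMap stands in for the sorted table (both allow a fast lookup without rehashing), and the password space of three lowercase letters is an illustrative assumption. The table needs one entry per possible password, so its size grows as 26^n, which is the memory cost that rainbow tables reduce.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class LookupTable {
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    // Precomputes hex(MD5(p)) -> p for every lowercase password p of the given length.
    static Map<String, String> build(int length) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        Map<String, String> table = new HashMap<>();
        long total = (long) Math.pow(26, length);
        char[] c = new char[length];
        for (long n = 0; n < total; n++) {
            long v = n;
            for (int i = length - 1; i >= 0; i--) {     // write n in base 26 as letters
                c[i] = (char) ('a' + v % 26);
                v /= 26;
            }
            String p = new String(c);
            table.put(toHex(md.digest(p.getBytes(StandardCharsets.US_ASCII))), p);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = build(3);               // 26^3 = 17,576 entries
        System.out.println(table.size() + " precomputed hashes");
        // Cracking is now a single lookup instead of recomputing hashes.
        System.out.println(table.get("900150983cd24fb0d6963f7d28e17f72"));   // prints abc
    }
}
```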
Figure 9. Simplified rainbow table with 3 reduction functions
Figure 10. A hash function maps plaintexts to hashes; the reduction function maps hashes to plaintexts
3.5 An example of a reduction function
The only requirement for the reduction function is that it must be able to return a plaintext value of a specific size.
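One possible reduction function, given here as an assumption since the text does not define one: read the first 8 bytes of the MD5 digest as a non-negative number and write it in base 26 as lowercase letters. Any 16-byte hash then maps to a valid plaintext of the requested size, which is all the text requires. It is not an inverse of MD5, and it will not reproduce the example values "kiebgt" or "sgfnyd" used in this section. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Reduction {
    // Hypothetical reduction function R: first 8 bytes of the hash, read as a
    // non-negative number, written in base 26 as `length` lowercase letters.
    // 63 bits give at most 13 distinct base-26 digits, so this assumes length <= 13.
    static String reduce(byte[] hash, int length) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (hash[i] & 0xff);
        }
        v &= Long.MAX_VALUE;                 // drop the sign bit so v % 26 is non-negative
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            out[i] = (char) ('a' + v % 26);
            v /= 26;
        }
        return new String(out);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        String p = "aaaaaa";
        for (int step = 0; step < 3; step++) {    // p -> H(p) -> R(H(p)) -> ...
            byte[] h = md.digest(p.getBytes(StandardCharsets.US_ASCII));
            String next = reduce(h, 6);
            System.out.println(p + " -H-> R -> " + next);
            p = next;
        }
    }
}
```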
To generate the table, we choose a random set of initial passwords from P (the set of possible passwords), compute chains of some fixed length k for each one, and store only the first and last password of each chain. The first password is called the starting point and the last one is called the endpoint. In the example chain below, "aaaaaa" would be the starting point and "kiebgt" would be the endpoint, and none of the other passwords (or the hash values) would be stored.
Now, given a hash value h that we want to invert (find the
corresponding password for), compute a chain starting with h by
applying R, then H, then R, and so on. If at any point we observe a
value matching one of the endpoints in the table, we get the
corresponding starting point and use it to recreate the chain. There's
a good chance that this chain will contain the value h, and if so, the
immediately preceding value in the chain is the password p that we
seek.
For example, if we're given the hash 920ECF10, we would compute
its chain by first applying R:
Since "kiebgt" is one of the endpoints in our table, we then take the
corresponding starting password "aaaaaa" and follow its chain until
920ECF10 is reached:
Thus, the password is "sgfnyd".
Note, however, that this chain does not always contain the hash value h; it may happen that the chain starting at h merges with the chain starting at the starting point at some point after h. For example, we may be given a hash value FB107E70, and when we follow its chain, we get kiebgt:
But FB107E70 is not in the chain starting at "aaaaaa". This is called a false alarm. In this case, we ignore the match and continue to extend the chain of h looking for another match. If the chain of h is extended to length k with no good matches, then the password was never produced in any of the chains.
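The generation and lookup procedure above, including false-alarm handling, can be sketched end to end. Assumptions, all illustrative: four-letter lowercase passwords, a single reduction function R (the simple hash chains of this section, not the per-column reduction functions of a full rainbow table), chain length k = 50, and a hypothetical base-26 reduction. Because different chains can end in the same endpoint, each endpoint maps to a list of starting points.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashChainTable {
    static final int LEN = 4;     // illustrative: 4-letter lowercase passwords
    static final int K = 50;      // chain length (number of R steps per chain)
    final Map<String, List<String>> endToStarts = new HashMap<>();

    static byte[] hash(String p) {                         // H
        try {
            return MessageDigest.getInstance("MD5").digest(p.getBytes(StandardCharsets.US_ASCII));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String word(long v) {                           // non-negative number -> LEN letters
        char[] c = new char[LEN];
        for (int i = 0; i < LEN; i++) { c[i] = (char) ('a' + v % 26); v /= 26; }
        return new String(c);
    }

    static String reduce(byte[] h) {                       // hypothetical R
        long v = 0;
        for (int i = 0; i < 8; i++) v = (v << 8) | (h[i] & 0xff);
        return word(v & Long.MAX_VALUE);
    }

    // Generation: walk K steps from the start, store only (endpoint -> start).
    void addChain(String start) {
        String x = start;
        for (int i = 0; i < K; i++) x = reduce(hash(x));
        endToStarts.computeIfAbsent(x, e -> new ArrayList<>()).add(start);
    }

    // Regenerates a chain and returns the password whose hash is target, or null (false alarm).
    static String findInChain(String start, byte[] target) {
        String x = start;
        for (int i = 0; i < K; i++) {
            byte[] hx = hash(x);
            if (Arrays.equals(hx, target)) return x;
            x = reduce(hx);
        }
        return null;
    }

    // Lookup: apply R, then H and R alternately, checking for an endpoint after each R.
    String lookup(byte[] target) {
        String candidate = reduce(target);
        for (int step = 0; step < K; step++) {
            List<String> starts = endToStarts.get(candidate);
            if (starts != null) {
                for (String s : starts) {
                    String p = findInChain(s, target);
                    if (p != null) return p;
                }
                // every matching chain was a false alarm: keep extending
            }
            candidate = reduce(hash(candidate));
        }
        return null;                                       // not covered by the table
    }

    public static void main(String[] args) {
        HashChainTable t = new HashChainTable();
        for (int i = 0; i < 400; i++) t.addChain(word(i * 997L));   // distinct starts
        // A password from the middle of the first chain is known to be covered.
        String p = word(0);
        for (int i = 0; i < 5; i++) p = reduce(hash(p));
        System.out.println(p + " recovered as " + t.lookup(hash(p)));
    }
}
```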
The table content does not depend on the hash value to be inverted. It is created once and then used, unmodified, for repeated lookups. For a given number of covered passwords, increasing the length of the chains decreases the size of the table but increases the time required to perform lookups; this is the time-memory trade-off of the rainbow table. In the simple case of one-item chains, the lookup is very fast, but the table is very big. As chains get longer, the lookup slows down, but the table size goes down.
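A rough worked example of this trade-off, under loudly stated assumptions: six-letter lowercase passwords, chains of length k = 1000, no chain merges, 1 byte per character, and a raw 16-byte MD5 digest per entry in a full lookup table. The numbers are illustrative arithmetic, not measurements.

```latex
\begin{aligned}
N &= 26^{6} = 308\,915\,776 \text{ possible passwords}\\
m &\approx N / k = 308\,915\,776 / 1000 \approx 308\,916 \text{ chains}\\
\text{chain table (start + end)} &\approx m \times (6 + 6)\ \text{bytes} \approx 3.7\ \text{MB}\\
\text{full hash-to-password table} &= N \times (16 + 6)\ \text{bytes} \approx 6.8\ \text{GB}\\
\text{lookup cost} &\le k = 1000 \text{ reduce/hash steps, plus one chain regeneration per endpoint match}
\end{aligned}
```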
Simple hash chains have several flaws. The most serious is that if at any point two chains collide (produce the same value), they merge, and consequently the table does not cover as many passwords despite the same computational cost having been paid to generate it. Because previous chains are not stored in their entirety, this is impossible to detect efficiently. For example, if the third value in chain 3 matches the second value in chain 7, the two chains will cover almost the same sequence of values, but their final values will not be the same.
The hash function H is unlikely to produce collisions, since avoiding them is usually considered an important security feature, but the reduction function R, because it must correctly cover the likely plaintexts, cannot be collision resistant.
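The cost of merges can be measured on a toy password space. The sketch below, with illustrative assumptions (three-letter lowercase passwords, 100 chains of length 100, and a hypothetical base-26 reduction function), counts how many distinct passwords the chains actually visit compared with the work spent building them, and how many distinct endpoints remain.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

public class ChainMerges {
    static final int LEN = 3;   // illustrative password space: 26^3 = 17,576 words

    // Writes a non-negative number as LEN lowercase letters (base 26).
    static String word(long v) {
        char[] c = new char[LEN];
        for (int i = 0; i < LEN; i++) {
            c[i] = (char) ('a' + v % 26);
            v /= 26;
        }
        return new String(c);
    }

    // Hypothetical reduction function: first 8 digest bytes -> base-26 word.
    static String reduce(byte[] hash) {
        long v = 0;
        for (int i = 0; i < 8; i++) v = (v << 8) | (hash[i] & 0xff);
        return word(v & Long.MAX_VALUE);
    }

    // Builds `chains` chains of length k from distinct starting points and returns
    // {hash+reduce steps performed, distinct passwords covered, distinct endpoints}.
    static int[] simulate(int chains, int k) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        Set<String> covered = new HashSet<>();
        Set<String> endpoints = new HashSet<>();
        for (int c = 0; c < chains; c++) {
            String x = word(c * 173L);              // distinct starts for up to 102 chains
            for (int i = 0; i < k; i++) {
                covered.add(x);
                x = reduce(md.digest(x.getBytes(StandardCharsets.US_ASCII)));
            }
            endpoints.add(x);
        }
        return new int[] { chains * k, covered.size(), endpoints.size() };
    }

    public static void main(String[] args) {
        int[] r = simulate(100, 100);
        System.out.println("hash+reduce steps: " + r[0]);
        System.out.println("distinct passwords covered: " + r[1]);
        System.out.println("distinct endpoints: " + r[2] + " of 100 chains");
    }
}
```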
Other difficulties result from the importance of choosing the correct function for R. Picking R to be the identity is little better than a brute-force approach. Only when the attacker has a good idea of what the likely plaintexts will be can he or she choose a function R that makes sure time and space are used only for likely plaintexts, not for the entire space of possible passwords. In effect, R shepherds the results of prior hash calculations back to likely plaintexts, but this benefit comes with the drawback that R will likely not produce every possible plaintext in the class the attacker wishes to check, which denies the attacker certainty that no password came from the chosen class. It can also be difficult to design the function R to match the expected distribution of plaintexts.
4 Conclusion
Password storage security is one important aspect of data security, as most systems nowadays require password-based authentication. Hashing algorithms such as MD5 are commonly used to transform plaintext passwords into strings that, in theory, cannot be reversed by attackers, because hashing is a one-way operation. Over time, however, practical attacks became possible through the use of rainbow tables, which led people to distrust the strength of the MD5 algorithm. MD5-based password storage can be improved by adding a salt value, which makes passwords more resistant to rainbow tables: a salted password hash has higher information entropy and is therefore much less likely to appear in a precomputed rainbow table.
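A minimal sketch of salting, assuming the common convention of hashing salt || password with a fresh random 16-byte salt per password (the salt length and the ordering are assumptions, and the names are illustrative). Because each stored hash depends on its own random salt, one precomputed rainbow table no longer applies to all stored hashes. The salt does not make MD5 itself any slower to compute, so it does not stop guessing against a single stored hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

public class SaltedHash {
    private static final SecureRandom RANDOM = new SecureRandom();

    // A fresh random 16-byte salt, generated per password and stored next to its hash.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return salt;
    }

    // MD5(salt || password): identical passwords with different salts hash differently.
    static byte[] saltedMd5(byte[] salt, String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(salt);
            md.update(password.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Login check: recompute with the stored salt and compare with the stored hash.
    static boolean verify(byte[] salt, byte[] storedHash, String attempt) {
        return MessageDigest.isEqual(storedHash, saltedMd5(salt, attempt));
    }

    public static void main(String[] args) {
        byte[] saltAlice = newSalt();
        byte[] saltBob = newSalt();
        byte[] hashAlice = saltedMd5(saltAlice, "Password");
        byte[] hashBob = saltedMd5(saltBob, "Password");   // same password, different salt
        System.out.println("same stored hash? " + MessageDigest.isEqual(hashAlice, hashBob));
        System.out.println("alice verifies: " + verify(saltAlice, hashAlice, "Password"));
    }
}
```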
References
[1] Xiaoyun Wang, Dengguo Feng, Xuejia Lai, and Hongbo Yu. Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD. Cryptology ePrint Archive, Report 2004/199, 2004. http://eprint.iacr.org/.
[2] Xiaoyun Wang and Hongbo Yu. How to Break MD5 and Other Hash Functions. In Ronald Cramer, editor, Advances in Cryptology – EUROCRYPT 2005, volume 3494 of Lecture Notes in Computer Science, pages 19–35. Springer, 2005.
[3] Ronald Rivest. The MD4 Message Digest Algorithm. RFC 1320, MIT and RSA Data Security, Inc., April 1992.
[4] Mary Cindy Ah Kioon, Zhao Shun Wang, and Shubra Deb Das. Security Analysis of MD5 Algorithm in Password Storage, 2013, pages 4.
[5] Praveen Gauravaram, Adrian McCullagh, and Ed Dawson. Collision Attacks on MD5 and SHA-1: Is this the “Sword of Damocles” for Electronic Commerce?, 2006, pages 73–88.
[6] WarpBoy. Rainbow Tables Explained, 2006, pages 11.
Source URL: https://en.wikipedia.org/wiki/Rainbow_table, April 16, 2016, visited 18/04/2016.