A hash function H is a transformation that takes an input m and returns a fixed-size string, which is called the hash value h (that is, h = H(m)). Hash functions with just this property have a variety of general computational uses, but when employed in cryptography, the hash functions are usually chosen to have some additional properties. 7
The basic requirements for a cryptographic hash function are as follows. ◦ The input can be of any length. ◦ The output has a fixed length. ◦ H(x) is relatively easy to compute for any given x. ◦ H(x) is one-way. ◦ H(x) is collision-free. 8
A hash function H is said to be one-way if it is hard to invert, where ``hard to invert means that given a hash value h, it is computationally infeasible to find some input x such that H(x) = h. 9
The hash value represents concisely the longer message or document from which it was computed; this value is called the message digest. One can think of a message digest as a ``digital fingerprint of the larger document. Examples of well known hash functions are MD2 and MD5 and SHA 10
Damgard and Merkle greatly influenced cryptographic hash function design by defining a hash function in terms of what is called a compression function. A compression function takes a fixed-length input and returns a shorter, fixed-length output. Given a compression function, a hash function can be defined by repeated applications of the compression function until the entire message has been processed. 11
In this process, a message of arbitrary length is broken into blocks whose length depends on the compression function, and “padded” (for security reasons) so the size of the message is a multiple of the block size. The blocks are then processed sequentially, taking as input the result of the hash so far and the current message block, with the final output being the hash value for the message. 12
The following five steps are performed to compute the message digest of the message. Step 1. Append Padding Bits Step 2. Append Length Step 3. Initialize MD Buffer Step 4. Process Message in 16-Word Blocks Step 5. Output 13
The message is "padded" (extended) so that its length (in bits) is congruent to 448, modulo 512. That is, the message is extended so that it is just 64 bits shy of being a multiple of 512 bits long. Padding is always performed, even if the length of the message is already congruent to 448, modulo 512. Padding is performed as follows: a single "1" bit is appended to the message, and then "0" bits are appended so that the length in bits of the padded message becomes congruent to 448, modulo 512. In all, at least one bit and at most 512 bits are appended. 14
A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. In the unlikely event that b is greater than 2^64, then only the low-order 64 bits of b are used. (These bits are appended as two 32-bit words and appended low-order word first in accordance with the previous conventions.) 15
A four-word buffer (A,B,C,D) is used to compute the message digest. Here each of A, B, C, D is a 32-bit register. These registers are initialized to the following values in hexadecimal, low- order bytes first): 16
The functions G, H, and I are similar to the function F, in that they act in "bitwise parallel" to produce their output from the bits of X, Y, and Z, in such a manner that if the corresponding bits of X, Y, and Z are independent and unbiased, then each bit of G(X,Y,Z), H(X,Y,Z), and I(X,Y,Z) will be independent and unbiased. Note that the function H is the bit-wise "xor" or "parity" function of its inputs. 23
This step uses a 64-element table T[1 ... 64]constructed from the sine function. Let T[i] denotethe i-th element of the table, which is equal to theinteger part of 4294967296 times abs(sin(i)), wherei is in radians. The elements of the table are givenin the following slide. 24
2004: First MD5 collision attack ◦ Only difference between messages in random looking 128 collision bytes ◦ Currently < 1 second on PC MD5( ) = MD5( )
Attack scenarios ◦ Generate specific collision blocks ◦ Use document format IF…THEN…ELSE ◦ Both payloads present in both files ◦ Colliding PostScript files with different contents ◦ Similar examples with other formats: DOC, PDF ◦ Colliding executables with different execution flows
2007: Stronger collision attack ◦ Chosen-Prefix Collisions ◦ Messages can differ freely up to the random looking 716 collision bytes ◦ Currently approx. 1 day on PS3+PC MD5( ) = MD5( )
Second generation attack scenarios ◦ Using chosen-prefix collisions ◦ No IF…THEN…ELSE necessary Each file contains single payload instead of both Collision blocks not actively used in format ◦ Colliding executables Malicious payload cannot be scanned in harmless executable ◦ Colliding documents (PDF, DOC, …) Collision blocks put inside hidden raw image data
Certificates with colliding to-be-signed parts ◦ generate a pair of certificates ◦ sign the legitimate certificate ◦ copy the signature into the rogue certPrevious work ◦ Different RSA public keys in 2005 using 2004 collision attack ◦ Different identities in 2006 using chosen-prefix collisions the theory is well known since 2007
set by serial number serial numberthe CA validity period validity period chosen prefix (difference) real cert rogue cert domain name domain name real cert real cert RSA key RSA key collision bits (computed) X.509 extensions identical bytes X.509 extensions (copied from real cert) signature signature
◦ We collected 30,000 website certificates 9,000 of them were signed with MD5 97% of those were issued by RapidSSL◦ CAs still using MD5 in 2008: RapidSSL FreeSSL TrustCenter RSA Data Security Thawte verisign.co.jp
◦ RapidSSL uses a fully automated system◦ The certificate is issued exactly 6 seconds after we click the button and expires in one year.
◦ Remote counter increases only when people buy certs we can do a query-and-increment operation at a cost of buying one certificate◦ Cost $69 for a new certificate renewals are only $45 up to 20 free reissues of a certificate $2.25/query-and-increment operation
1. Get the serial number S on Friday2. Predict the value for time T on Sunday to be S+10003. Generate the collision bits4. Shortly before time T buy enough certs to increment the counter to S+9995. Send colliding request at time T and get serial number S+1000
Based on the 2007chosen-prefix collisionspaper with newimprovements1-2 days on a cluster of200 PlayStation 3’sEquivalent to 8000desktop CPU cores or$20,000 on Amazon EC2
serial number validity period rogue CA cert chosen prefix (difference) rogue CA RSA keyreal cert domain name rogue CA X.509 CA bit! extensions real cert collision bits Netscape Comment RSA key (computed) Extension (contents ignored by browsers)X.509 extensions identical bytes (copied from real cert) signature signature
◦ 3 failed attempts problems with timing other CA requests stealing our serial number◦ Finally success on the 4th attempt!◦ Total cost of certificates: USD $657
◦ We’re not releasing the private key◦ Our CA cert was backdated to Aug 2004 just for demo purposes, a real malicious attacker can get a cert that never expires◦ Browser vendors can blacklist our cert we notified them in advance◦ Users might be able to blacklist our cert
Our CA cert is not easily revocable! ◦ CRL and OCSP get the revocation URL from the cert itself ◦ Our cert contains no such URL ◦ Revocation checking is disabled in Firefox 2 and IE6 anywaysPossiblefixes: Large organizations can set uptheir own custom OCSP server and force OCSPrevocation checking.
Extended Validation (EV) certs: ◦ supported by all major browsers ◦ EV CAs are not allowed to use MD5 ◦ safe against this attackDo users really know how to tell the differencebetween EV and regular certs?
With optimizations the attack might be donefor $2000 on Amazon EC2 in 1 dayWe want to prevent malicious entities fromrepeating the attack: ◦ We are not releasing our collision finding implementation or improved methods until we feel it’s safe ◦ We’ve talked to the affected CAs: they will switch to SHA-1 very, very soon
No way to tell. ◦ The theory has been public since 2007 ◦ Our legitimate certificate is completely innocuous, the collision bits are hidden in the RSA key, but they look randomCan we still trust CA certs that have been usedto sign anything with MD5 in the last few years?
◦ We need defense in depth random serial numbers random delay when signing certs◦ Future challenges: second preimage against MD5 collisions in SHA-1◦ Dropping support for a broken crypto primitive is very hard in practice but crypto can be broken overnight what do we do if SHA-1 or RSA falls tomorrow?
◦ No need to panic, the Internet is not completely broken◦ The affected CAs are switching to SHA-1◦ Making the theoretical possible is sometimes the only way you can affect change and secure the Internet