Collision With Hashes

The Legal and Practical Implications of the
Recent Attacks on 128-bit Cryptographic Hash Functions
Praveen Gauravaram, Adrian McCullagh and Ed Dawson

Information Security Research Centre,
Queensland University of Technology,
GPO BOX 2434 (126 Margaret Street),
Brisbane, QLD, 4001, Australia
p.subramanya,a.mccullagh,e.dawson@qut.edu.au
Abstract. This paper discusses the legal and practical implications
of the attacks presented at Crypto’2004 against several 128-bit hash
functions, and in particular against MD5 because of its wide usage. It
also discusses the significance of these attacks for a number of
important applications in which MD5 is a primary function. Further,
it is argued that reliance on the MD-x style of hash function design
across many applications can be a single point of failure for the use
of hash functions in cryptography, and that new hash function design
schemes with strict security properties should be developed so that
future families of hash functions resist both the current attacks and
possible new ones.
1 Introduction
Cryptographic hash functions serve an essential role within a wide range
of information security applications. These applications provide
security services such as data integrity, authentication and
non-repudiation. They include, non-exhaustively, digital signature
generation and verification, session key establishment in key agreement
protocols, management of password schemes and commitment schemes
in cryptographic protocols such as electronic auctions.
Hash functions, also called message-digest algorithms, compress an
arbitrary finite m-bit input message x, without the use of any secret
parameter, into a fixed n-bit output y, called the hash value
or message digest. A hash function can be expressed as h(x) = y, where h
is a publicly known function and the computation of y from x must
be easy. Hash functions are discussed at length in the classic textbook
“Handbook of Applied Cryptography” [31] and in the doctoral dissertation
of Bart Preneel [36].
The principal security properties of hash functions are:
– Pre-image resistance: A hash function h is said to be pre-image
resistant if, for a given hash value y, it is “computationally infeasible”
or “hard” to find an input message x such that h(x) = y, given only y.
In other words, it should be hard to invert the hash function.
– 2nd pre-image resistance: A hash function h is said to be 2nd pre-image
resistant if, for a given message x and its corresponding hash value y,
it is hard to find another input message x′ such that h(x′) = h(x) = y.
– Collision resistance: A hash function h is said to be collision resistant
if it is hard to find any two distinct input messages x and x′ such that
h(x) = h(x′).
– Near-collision resistance: A hash function h is said to be near collision
resistant if it is hard to find any two inputs x and x′ such that the
difference between h(x) and h(x′), that is h(x) ⊕ h(x′), is small. This
property serves as a certificational property for hash functions [31].
A hash algorithm that satisfies the first two properties mentioned
above is said to be a “one-way” hash function (OWHF), whereas a hash
algorithm that satisfies the first three properties is said to be a
“collision resistant” hash function (CRHF) [36]. (Strictly, for technical
reasons, pre-image resistance is not a compulsory requirement for a hash
function to be classified as a CRHF [31].) The practical security of any
hash function depends on its output size, which must be large enough to
resist Yuval’s Birthday attack for finding collisions [42]. Collisions
are easier to find than pre-images or 2nd pre-images: the Birthday attack
finds a collision for an n-bit hash function in about 2^(n/2)
computations, whereas finding a pre-image or a 2nd pre-image by brute
force requires about 2^n computations [31].
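The 2^(n/2) cost of the Birthday attack can be observed experimentally
on a deliberately shortened hash. The following Python sketch is only an
illustration under stated assumptions: the toy n-bit hash is obtained by
truncating MD5, and the helper names truncated_md5 and birthday_collision
are hypothetical, not taken from [31] or [42]. It hashes distinct messages
until two of them share a digest and reports how many hashes were needed.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, n_bits: int) -> int:
    """Toy n-bit hash (assumption): the top n_bits bits of MD5(data)."""
    digest = int.from_bytes(hashlib.md5(data).digest(), "big")
    return digest >> (128 - n_bits)


def birthday_collision(n_bits: int):
    """Hash distinct messages until two share a digest.

    Returns the two colliding messages and the number of hashes computed.
    """
    seen = {}  # digest -> first message that produced it
    for i in count():
        msg = b"message-%d" % i
        h = truncated_md5(msg, n_bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


if __name__ == "__main__":
    for n in (16, 24, 32):
        a, b, work = birthday_collision(n)
        print(f"n={n}: collision after {work} hashes; "
              f"2^(n/2) = {2 ** (n // 2)}; {a!r} vs {b!r}")
```

For each n the reported work should be of the same order as 2^(n/2),
far below the 2^n hashes that a brute-force pre-image search would need.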
The attacks on the 128-bit hash functions MD4, MD5, RIPEMD and
HAVAL-128 presented at Crypto’2004 [41] have established that it is no
longer secure to use any of these four hash functions for various infor-
mation processing applications where the collision resistance property is
relevant. The attacks clearly show that these hash functions can no longer
be considered as CRHFs. The security properties expected from the hash
functions vary as per the application in which they are used as discussed
in Section 4.
The rest of the paper is organised as follows: Section 2 briefly
describes how hash functions operate, Section 3 explains the attacks on
these algorithms and their outcomes, Section 4 discusses the significance
of these attacks for some important applications, Section 5 emphasises
the necessity for new hash function designs and finally Section 6 provides
some concluding remarks.
2 Operation of Hash Functions
Hash functions that belong to the MD-x family are based on the
Merkle/Damgard style of iterative structure [13, 32] as shown in Fig 1.
An arbitrary finite size input message, which has to be hashed, is split
into blocks of equal size; the block size depends on the algorithm, as
detailed in Table 1. A predefined reversible padding procedure is applied
to the last block to make its size equal to that of the other blocks.
This is necessary for certain security reasons [36]. The blocks are
processed one by one by the repeated application of a compression
function, as shown in Fig 1, and the hash value y is obtained after
processing all of the blocks of the message, including the last padded
block. Hash functions use a fixed constant at the start of their
operation, called the Initial Value (IV). The bit-length of the IV equals
the bit-length of the hash output. The IV serves as the first value of a
chaining variable, which is updated by each iteration of the compression
function on a message block. The value of the chaining variable at the
end of the process is the message digest or hash value y. Table 1 lists
various hash functions based on the Merkle/Damgard iterative structure,
with their block sizes, number of compression function iterations and
output hash value sizes in bits.
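The iterative structure described above can be sketched in a few lines
of Python. This is a minimal illustration, not MD5 itself: the
compression function is a stand-in built from SHA-256, the IV is an
arbitrary constant, and the names pad, compress and md_hash are
hypothetical. Only the block size (512 bits) and chaining variable size
(128 bits) follow the MD5 row of Table 1.

```python
import hashlib

BLOCK_BYTES = 64   # 512-bit message blocks
STATE_BYTES = 16   # 128-bit chaining variable
IV = bytes(range(STATE_BYTES))  # arbitrary fixed constant, NOT the MD5 IV


def pad(message: bytes) -> bytes:
    """Reversible padding: a single 1 bit, zero bits, then the 64-bit length."""
    bit_len = (8 * len(message)) % 2 ** 64
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK_BYTES)
    return padded + bit_len.to_bytes(8, "big")


def compress(chaining: bytes, block: bytes) -> bytes:
    """Stand-in compression function (assumption): 128 + 512 bits -> 128 bits."""
    return hashlib.sha256(chaining + block).digest()[:STATE_BYTES]


def md_hash(message: bytes) -> bytes:
    """Iterate the compression function over the padded message blocks."""
    chaining = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_BYTES):
        chaining = compress(chaining, padded[i:i + BLOCK_BYTES])
    return chaining  # final chaining variable is the hash value y


if __name__ == "__main__":
    print(md_hash(b"abc").hex())
```

Because the chaining variable is the only state carried from one block
to the next, two messages that reach the same chaining value after some
block will also agree after any common continuation, which is what the
multi-block attacks in Section 3 exploit.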
[Figure 1: Message Block 1, Message Block 2, ..., Last Block with Padding
are each fed to a Compression Function; the chain starts from the Initial
Value and ends with the Hash.]
Fig. 1. Merkle/Damgard Iterative Structure for Hash Functions
Table 1. Various cryptographic hash functions
Hash Function  Block size (bits)  CF Iterations  Hash Value (bit-length)
MD4            512                48             128
MD5            512                64             128
RIPEMD         512                64             128
RIPEMD-128     512                64             128
RIPEMD-160     512                80             160
RIPEMD-256     512                64             256
RIPEMD-320     512                80             320
SHA-0          512                80             160
SHA-1          512                80             160
SHA-256        512                64             256
SHA-224        512                64             224
SHA-384        1024               80             384
SHA-512        1024               80             512
HAVAL          1024               96, 128, 160   128, 160, 192, 224, 256
3 Details of the Attacks
The attacks on MD4, MD5, RIPEMD and HAVAL-128 presented at
Crypto’2004 by Xiaoyun Wang et al. show that these hash algorithms can
no longer be classified as CRHFs, as collisions can be found for these
algorithms with less computational complexity than required by the Birthday
attack technique. The colliding messages found for MD4, MD5, RIPEMD
and HAVAL-128 are available in [41]. Because of its wide usage, MD5 is
the primary focus of the remainder of this paper. MD5 is widely used
in various applications, such as the generation and verification of
digital signatures, software integrity assurance for products such as the
open-source Apache Web server and Sun Microsystems’ Solaris Fingerprint
Database, and other cryptographic protocols.
The recently discovered weaknesses in MD5 make its future use in
various applications (especially those in which digital signatures
are created) questionable. In addition, the National Institute of
Standards and Technology (NIST) plans [34] to phase out SHA-1 by 2010
(a weakened variant of SHA-1 has also been broken [16]) and recommends
that SHA-1 and algorithms of similar strength (160-bit hash functions)
be replaced with more robust algorithms such as SHA-224, SHA-256,
SHA-384 and SHA-512, which are specified in Federal Information
Processing Standard (FIPS) 180-2 approved by NIST [8].
The attack on MD5 works on 1024-bit messages, each consisting of two
512-bit blocks. An overview of the technique employed in finding the
collisions on MD5 [41] is set out below, but this information may change,
as the complete analytical details of the attack have not yet been
released publicly by Xiaoyun Wang et al.
The input messages are always longer than just one block (512 bits).
Given the same 128-bit input chaining value, the idea is to find two
messages M and M′ whose outputs are very similar, such that the next
blocks “correct” the difference in the hash outputs of M and M′ and
result in a collision. Thus, there are always two (or more) blocks of
input. Hence, given first blocks related by M = M′ ⊕ ∆1 with a fixed
differential ∆1, the technique involves finding two second-block messages
Ni and Ni′ with another fixed differential ∆2, that is Ni ⊕ Ni′ = ∆2,
such that Ni and Ni′ “correct” the difference in the intermediate
chaining values established by M and M′. Here, the term “differential”
refers to the difference between two input messages and the term
“difference” refers to the difference between two chaining values (for
example the difference between h1 and h1′ shown in Fig 2). This attack
technique can find collisions for any given 128-bit IV, but a pair of
inputs that collides under one IV does not necessarily collide under
another. The differential ∆2 does not always produce a collision: a
collision depends on the starting value of the chaining variable and on
the contents of the message itself. In short, the attack uses some
relatively efficient method, not disclosed in [41], for finding messages
that yield a specific differential for a given chaining value [38].
Trying random messages does not yield collisions effectively. The attack
technique is depicted in Fig 2.
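The bookkeeping behind the terms “differential” and “collision” can be
made concrete with a short Python sketch. It is only an aid for
inspecting a candidate pair of messages, not an implementation of the
attack in [41]; the function names are hypothetical, and the two sample
messages in the usage lines are placeholders that differ in one bit and
do not collide.

```python
import hashlib


def xor_differential(m1: bytes, m2: bytes) -> bytes:
    """Bitwise XOR of two equal-length blocks: the differential between them."""
    if len(m1) != len(m2):
        raise ValueError("blocks must have equal length")
    return bytes(a ^ b for a, b in zip(m1, m2))


def split_blocks(message: bytes, block_bytes: int = 64) -> list:
    """Split a message into 512-bit (64-byte) blocks."""
    return [message[i:i + block_bytes]
            for i in range(0, len(message), block_bytes)]


def is_md5_collision(x1: bytes, x2: bytes) -> bool:
    """True only for distinct messages with identical MD5 digests."""
    return x1 != x2 and hashlib.md5(x1).digest() == hashlib.md5(x2).digest()


def check_pair(x1: bytes, x2: bytes):
    """Return the per-block differentials and whether the pair collides."""
    deltas = [xor_differential(a, b)
              for a, b in zip(split_blocks(x1), split_blocks(x2))]
    return deltas, is_md5_collision(x1, x2)


if __name__ == "__main__":
    # Placeholder two-block messages (NOT a real collision pair).
    m = bytes(128)
    m_prime = bytes([0x80]) + bytes(127)
    deltas, collides = check_pair(m, m_prime)
    print("delta1 =", deltas[0].hex())
    print("delta2 =", deltas[1].hex())
    print("collision:", collides)
```

For a genuine two-block collision pair, the first list entry would be
the differential ∆1 between M and M′, the second would be ∆2 between
the second blocks, and the collision flag would be true.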
The attack also demonstrates that there are some messages, belonging to
the class that maintains the differential ∆2, that are not 2nd pre-image
resistant. This concept was not completely analyzed in [41]. For a given
relation M = M′ ⊕ ∆1 and a given second-block message N1, there appears
to be an efficient technique (not disclosed in [41]), better than a
brute-force attack, that can be used to find another message N1′ such
that N1′ = N1 ⊕ ∆2 results in a collision. If this is the case, it
shows that MD5 is not 2nd pre-image resistant.
The attacks on MD4, RIPEMD and HAVAL-128 work on one-block
messages. The technique involves finding a differential ∆ which, when
applied to a certain message, causes a collision, though the technique is
not revealed in [41]. The relation is M = M′ ⊕ ∆.
The value of ∆ differs from one hash function to another but is always
the same for a given hash function. These attacks establish that these
hash functions do not satisfy the collision resistance and 2nd pre-image
resistance properties required of hash functions. Whether the attacks on
MD4, RIPEMD and HAVAL-128 can be extended to more than one block
is still under investigation.
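One of the possible reasons for the success of these attacks could be
that all of these hash functions are based on the design principles of
MD4. The compression functions of all the above hash functions operate
as unbalanced Feistel networks in a non-linear feedback shift register
mode [39, 19].
[Figure 2: Two parallel two-block computations. In the first, IV and M
enter CF to give h1, and h1 and N1 enter CF to give h. In the second, IV
and M′ enter CF to give h1′, and h1′ and N1′ enter CF to give the same h.]
Fig. 2. Cryptanalysis of MD5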
4 Impact of collisions in MD5 on various applications.
In this Section, we focus on the impact of collisions in hash functions,
particularly MD5 considering its wide usage, on some important E-commerce
applications. The Section first discusses the effect of the attack on
digital signature applications that require a CRHF and use MD5 for
signing purposes. Other applications, for which these recent attacks
pose no immediate threat, are then discussed.
4.1 Digital Signatures
Digital signatures are used to authenticate the signers of electronic
messages. A digital signature depends substantially on the document
that is being signed, as well as on the private key and the algorithms
used to affix the signature. If the document changes in any way,
the digital signature will also change substantially. This is because
a digital signature is the result of encrypting the hash of the document
being signed, using a private key held by the person causing the digital
signature to be created. In general, digital signatures
are constructed using hash functions along with Public Key Technology
(PKT). A historical background on PKT is given in [15]. Using PKT,
the signer creates the digital signature by encrypting the “message
digest” of the relevant electronic message/document using a public key
algorithm such as RSA and the private key to which only the signer has
access. Anyone with the public key that corresponds to the private key
used to affix the digital signature can verify the signature of
the signer. The basic premise behind PKT and its ability to provide
authentication and non-repudiation services is that the private key
remains private and has not been compromised by disclosure to third
parties.
Consider the following scenario between two parties Alice and Bob,
wherein Alice wishes to send a digitally signed message to Bob.
The following steps would be performed by Alice.
1. Alice hashes the message x that she wishes to send to Bob using MD5.
MD5(x) is the value of the message digest. Let MD5(x) be h.
2. Alice then encrypts h using her private key kprivA with some public
key technology such as RSA to compute the digital signature sigA.
This is expressed as follows.
sigA(x) = E_RSA(h, kprivA)
3. Alice sends the message x and the signature on x, sigA, to Bob.
The following steps would be performed by Bob after receiving x and
sigA to verify the digital signature of Alice.
1. Bob hashes the message x that he received from Alice using MD5.
MD5(x) is the value of the message digest. Let MD5(x) be h′. Note
this is the same initial step performed by Alice above.
2. Bob decrypts the signature sigA using the public key of Alice, kpubA,
to get the message digest h″.
This is expressed as follows.
h″ = D_RSA(sigA, kpubA)
3. Bob compares the digests obtained in steps 1 and 2. If they are
equal then the integrity of the message is established and, provided the
private key remains private, the authenticity of the person attributed
as signing the message is also established. This is sometimes known
as the non-repudiation property.
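The signing and verification steps above can be sketched in Python as
follows. This is an illustration under loud assumptions: it uses
unpadded “textbook” RSA, and the key is built from two well-known
Mersenne primes chosen only so that the modulus exceeds 2^128 and no
external library is needed. Such a key is not secure, and the function
names are hypothetical.

```python
import hashlib

# Toy key material for Alice (assumption, NOT secure).
P = 2 ** 127 - 1                     # Mersenne prime
Q = 2 ** 89 - 1                      # Mersenne prime
N = P * Q                            # public modulus, larger than 2^128
E = 65537                            # public exponent (part of kpubA)
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (kprivA), Python 3.8+


def md5_int(message: bytes) -> int:
    """MD5 digest of the message as an integer h < 2^128."""
    return int.from_bytes(hashlib.md5(message).digest(), "big")


def sign(message: bytes) -> int:
    """Alice: sigA(x) = E_RSA(h, kprivA), where h = MD5(x)."""
    return pow(md5_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Bob: compare h' = MD5(x) with h'' = D_RSA(sigA, kpubA)."""
    return pow(signature, E, N) == md5_int(message)


if __name__ == "__main__":
    x = b"I, Alice, am selling my property of 5 acres to Bob."
    sig = sign(x)
    print("genuine message verifies:", verify(x, sig))
    print("altered message verifies:", verify(x + b" (amended)", sig))
```

Note that verify never looks at the message itself, only at its digest.
The signature therefore authenticates any message that hashes to the
same value.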
Due to the attacks by Xiaoyun Wang et al., it is now possible for both
the signer and the verifier of a digitally signed message that relies
upon the MD5 hash algorithm to cheat each other and thus defeat the
non-repudiation property that has been continually argued by various
researchers to be an essential property of digital signature technology.
This will be shown in the following two scenarios.
Scenario 1: The collision attack on the MD5 algorithm described in
Section 3 can easily be used by Alice to cheat Bob. Let the innocuous
message x that Alice wants to sign be something like “I, Alice, am selling
my property of 5 acres to Bob for a price of $123,456.” Assume that Alice
can come up with an alternate message x′ that gives the same digest as
x. Let x′ be “I, Alice, am selling my property of 2 acres to Bob for a
price of $223,456.” That is, assume that MD5(x) = MD5(x′). Alice then
sends x and the signature sigA computed on MD5(x) (noting that it would
be the same on MD5(x′)) to Bob. Bob believes that the message x is the
legitimate message, but later Alice claims x′ as the legitimate message
by producing the same signature sigA on x′. Furthermore, Alice has
destroyed all evidence regarding the digital signing of x.
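The way a single signature comes to cover two different contracts can be
demonstrated with a deliberately weakened hash. The Python sketch below
is an illustration only and does not reproduce the MD5 attack of [41]:
it assumes a 20-bit hash (truncated MD5), a toy unpadded RSA key built
from Mersenne primes, and a hypothetical trailing reference number that
is varied until a variant of x and a variant of x′ collide.

```python
import hashlib
from itertools import count

WEAK_BITS = 20  # deliberately tiny digest (assumption) so collisions are cheap


def weak_hash(message: bytes) -> int:
    """Stand-in for a broken hash: the top WEAK_BITS bits of MD5."""
    digest = int.from_bytes(hashlib.md5(message).digest(), "big")
    return digest >> (128 - WEAK_BITS)


# Toy RSA key for Alice (Mersenne primes, illustration only, NOT secure).
P, Q, E = 2 ** 127 - 1, 2 ** 89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # requires Python 3.8+


def sign(message: bytes) -> int:
    return pow(weak_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == weak_hash(message)


X = ("I, Alice, am selling my property of 5 acres to Bob "
     "for a price of $123,456. Ref %d")
X_PRIME = ("I, Alice, am selling my property of 2 acres to Bob "
           "for a price of $223,456. Ref %d")


def find_colliding_pair(table_size: int = 4096):
    """Vary the reference number until a variant of X and of X_PRIME collide."""
    table = {weak_hash((X % i).encode()): i for i in range(table_size)}
    for j in count():
        h = weak_hash((X_PRIME % j).encode())
        if h in table:
            return (X % table[h]).encode(), (X_PRIME % j).encode()


if __name__ == "__main__":
    x, x_prime = find_colliding_pair()
    sig = sign(x)                      # Alice signs only x
    print(x.decode())
    print(x_prime.decode())
    print("sigA verifies on x :", verify(x, sig))
    print("sigA verifies on x':", verify(x_prime, sig))
```

Both verifications succeed, so the signature alone cannot tell a court
which of the two messages Alice actually signed.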
From an evidentiary perspective this raises some serious doubts as
to the probity of the two messages and in particular the fact that the
same digital signature can be verified for both messages. At common law
the onus of proof is generally placed upon the plaintiff. That is, it is the
plaintiff who has the onus of proving his/her case.
In this case, if Alice raises a dispute over the contents of the electronic
communications that have transpired between them, she will accordingly
inform Bob of the problem. If Bob commences proceedings to prove his
case, he will have the onus of proving the value of the contract; but in
doing so he will encounter the defense that Alice has evidence that the
contract was different to that which Bob claims. The difficulty for the
arbiter of fact, namely the judge in this case, is that the judge will
need to decide which message is the correct reflection of the contract.
Other evidence will need to be adduced by Bob, which could impose
substantial expense on Bob in proving his case. It must be remembered
that Bob does not have access to Alice’s private key, so Bob cannot
generate a new digital signature for x′. Alice would argue that she does
not know how this occurred. She can also show that her public key
verifies the digital signature attached to the x′ that is in her
possession.
Alice will argue that x′ is the original message and that Bob has
somehow developed x such that MD5(x) = MD5(x′). Further, she may claim
that Bob has stripped the original signature from x′ and attached it to
x. Bob will have other evidence, such as the time of receipt of x
together with its digital signature, but this kind of evidence can be
easily spoofed or altered without leaving a trace by the recipient of an
electronic communication. Such a case could be decided solely upon the
oral evidence of the parties and not upon the technology that underpins
the case. Of course a court in this case would rightly make some
substantial disparaging remarks about the technology and its lack of the
non-repudiation and authentication properties.
Scenario 2: The collision attack on MD5 described in Section 3 can
easily be used by Bob to cheat Alice. Let the message x that Alice signs
be something like “I, Alice, am selling my property of 5 acres to Bob for
a price of $223,456.” Bob, the verifier of the message x, upon receiving x
and the signature affixed to it by Alice with her private key, can come up
with a similar message x′, something like “I, Alice, am selling my
property of 10 acres to Bob for a price of $123,456”, that produces the
same message digest as x. Bob then strips the signature attached to
message x and attaches it to message x′. Bob later claims that the
message sent by Alice is x′, not x.
In this case, if Bob raises a dispute over the contents of the electronic
communications that have transpired between them, he will accordingly
inform Alice of the problem. If Alice commences proceedings to prove her
case, she will have the onus of proving the value of the contract; but in
doing so she will encounter the defense that Bob has evidence that the
contract was different to that which Alice claims. Further, Bob does not
have access to the private key of Alice, as it is known only to her, and
hence the signature on x′ could only have been created by Alice. That is,
since Bob is able to verify the digital signature affixed to x′ using
Alice’s public key and Bob has no access to Alice’s private key, Bob will
argue that Alice is trying to defraud him of the contract value.
Bob will force Alice to admit that her private key has not been
compromised, which means that theoretically she is the only person who
had access to the relevant private key and was the only person capable of
activating its use to digitally sign x′. Since Bob had no access to
Alice’s private key, and it is Alice’s public key that is used to verify
the digital signature affixed to the electronic communication that is
being relied upon by Bob, it will be very difficult for Alice to
discredit this evidence in the court proceedings, even though the
original message x has been replaced by Bob with x′.
The fact that x and x′ collide on the same digest y is a substantial
flaw in digital signature technology where MD5 is used as the hash
algorithm. Such a flaw will completely undermine the concept of
non-repudiation with respect to forged digital signatures. The concept of
non-repudiation has been a principal attribute promoted by a number of
digital signature technology providers.
In Scenario 2 a court would, at a minimum, come to the conclusion
that there is some uncertainty regarding the validity of the two messages
x and x′ and, at a maximum, conclude that the electronic communication in
the possession of Alice that has Alice’s digital signature affixed has
been altered by Alice, which could give rise to an allegation of giving
false evidence or tampering with evidence.
It can be seen that these attacks on MD5 greatly undermine the
evidential value of digital signature technology where MD5 or any of
the other mentioned hash algorithms is used for digital signature
purposes. The collision attack on MD5 discussed in Section 3 can be used
to construct ASCII message sequences Ni and Ni′ for the given relation
M = M′ ⊕ ∆1, which can result in the same hash value.
There are no legal cases in which digital signatures have been
specifically disputed, though it is generally accepted that a digital
signature is not strictly non-repudiable, because there are many legal
grounds on which a party may be able to successfully repudiate a digital
signature attributed to them, such as unconscionable conduct, undue
influence or duress [29]. What has been generally taken as the base
position is that digital signature technology will greatly reduce the
incidence of forgeries; but since the successful attacks by Xiaoyun Wang
et al., even forgery has now become an issue, which undermines the
concept of non-repudiation even further. When the underlying hash
function technology is weak, it could result in the compromise of the
non-repudiation security property.
A further effect of the undermining of the non-repudiation property
concerns the long term archiving of digitally signed documents. It is not
unusual for a substantial amount of time to elapse before a dispute
is heard by a judge. During this time both parties have to ensure
that the evidence they have in their possession does not become tainted
and maintains its integrity. In Australia, section 11(3) of the
Electronic Transactions Act 1999 (Cth) [ETA] provides that an electronic
document can be endorsed by a third party for the purposes of integrity.
If the PKT used to affix a digital signature is undermined due to some
technological advancement, it is not correct to get the parties to
re-sign the document, as this would alter the document in a substantial
manner. It may also be impractical, as one of the parties may not be
available or may even have died in the meantime. The better approach is
to get a trusted third party to endorse the document by affixing another
digital signature which uses a more robust PKT. This can be undertaken
multiple times. The original document with its original digital signature
or signatures is maintained and thus the integrity of the document is
preserved. The trusted third party will need to be noted as an endorser
so as not to be confused with an original signatory. When the case is
finally heard by a judge, each endorsement digital signature will be
verified in turn until the original signatures can be verified. For long
term archiving purposes this approach is recommended. The Uniform
Electronic Transactions Act (UETA), which has been adopted by a
substantial number of US states, does not specifically recognise the
endorsement mechanism provided in the Australian Electronic Transactions
Act, but UETA does make provision for security procedures which are to be
used to maintain the integrity of digitally signed documents, and
therefore the procedure would be similar to that noted above.
The defeat of the non-repudiation property has an even more
catastrophic effect when the documents being digitally signed are
digital certificates. One of the difficulties in using PKT is the
distribution of the public keys that are used to verify digitally signed
documents. Recall that a digital signature is verified with the public
key that corresponds to the private key used to create it. To distribute
public keys, digital certificates were created, which are currently based
upon the ISO standard X.509 v.3 [22].
Digital Certificates. An X.509 v.3 certificate is itself a digital
document that has embodied in it an entity’s public key as well as other
information, such as the hash algorithm and the public key algorithm that
were used to digitally sign the document. An X.509 v.3 certificate is
digitally signed by the issuing authority, which is generally known as a
certification authority (CA) or trusted third party (TTP).
An entity in need of a digital certificate submits to the CA
identification details that evidence its identity, its public key and
other required information. The CA issues the certificate once the
identity of the entity is sufficiently confirmed from the details
provided. The certificate, in general, contains the certificate version
number, a unique certificate serial number, the signature algorithm, such
as RSA with MD5 or ElGamal with SHA-1, used by the holder of the
corresponding private key, the issuer name (that is, the name of the CA
that issues the certificate), the certificate validity period, the
subject of the certificate (the entity for which the certificate is
issued), the public key of the subject, some extension fields
and the signature of the CA. Digital certificates are discussed at length
in the book [23].
Due to the attacks on MD5, it is now possible for a third party to alter
the contents of a certificate, in a restricted sense, without
invalidating the digital signature of the CA that is attached to the
certificate. The restricted sense is that the attacker needs to identify
an x′ that corresponds to x where h(x) = h(x′). It may be that such an
x′ cannot be identified for every certificate but can be for a subset of
all possible certificates. It will not be possible for the attacker to
re-sign the certificate, as this would require the attacker to have
access to the private key of the CA. A fraudulent entity might, however,
be able to come up with a duplicate certificate whose hash collides with
that of the legitimate certificate that the entity obtains from the CA,
and then transfer the CA’s signature from the true certificate on to the
fake certificate.
This would have catastrophic implications for identity management, as
this attack could result in an increased incidence of identity fraud in
non-face-to-face transactions where a merchant or customer is relying upon
the X.509 v.3 certificate as the basis for identifying the other party to
the transaction. From an evidential position this could result in an
innocent party being held liable for a transaction in which they did not
in reality participate.
An attack scenario could be as follows. When a customer’s web browser
connects to the server operated by a bank that has internet banking
operations, the bank’s server sends its digital certificate, signed by
the CA that issued it. The certificate contains the bank’s public
key, which the customer’s machine can use to establish secure sessions
with the bank’s server. A malicious attacker can generate two certificate
requests, “Digital certificate request for www.centralpark.com
and here are my personal details” and “Digital certificate request for
www.centralbank.com and here are my personal details”, that contain the
same public key and produce the same MD5 message digest. The attacker
sends the first request to the CA to obtain a signed digital certificate
and later transfers the signature on to the fake second message, thereby
making a perfect forgery if the signature scheme used is malleable. A
browser will readily trust the www.centralbank.com digital certificate as
genuine, so the attacker can masquerade as the bank and steal the
personal details of a customer. This attack is similar to the one
described earlier in Section 4.1, where the non-repudiation property is
violated. This type of attack could be used to great effect in the email
phishing scams that have become so prevalent in recent times.
The practical possibility of this attack is still in doubt. The reason is
that when the CA signs a certificate, the CA specifies a unique serial
number in the certificate, so the attack depends on how much control the
attacker has over the serial number field. Nevertheless, it is
recommended that CAs move away from using MD5 for signing purposes when
issuing digital certificates. The attacks on hash functions discussed in
Section 3 cannot be used to tamper with existing certificates, for
example Secure Socket Layer (SSL) webserver certificates, as that would
need an attack on the pre-image resistance property of MD5 [7]. So far
the best known attack on this property is the brute-force attack, which
requires about 2^128 computations of the MD5 algorithm and is hence
infeasible.
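The gap between the 2^64 work of a generic Birthday attack on a 128-bit
hash and the 2^128 work of a brute-force pre-image search can be made
tangible with a short calculation. The hashing rate of 10^9 MD5
computations per second used below is an assumption chosen purely for
illustration, not a measured figure, and the function name is
hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_of_work(log2_operations: int, hashes_per_second: float) -> float:
    """Years needed to perform 2^log2_operations hashes at the given rate."""
    return 2 ** log2_operations / hashes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    RATE = 1e9  # assumed hashes per second (illustrative only)
    for label, bits in (("Birthday collision search, 2^64", 64),
                        ("brute-force pre-image search, 2^128", 128)):
        print(f"{label}: about {years_of_work(bits, RATE):.3g} years")
```

At the assumed rate the Birthday search takes several hundred years on a
single machine, which parallel hardware can reduce considerably, while
the pre-image search takes on the order of 10^22 years.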
Internet security protocols such as SSL [18] and Internet Protocol
Security (IPSec) could be affected by the attack on MD5 described in
Section 3 if their digital certificates use the MD5 algorithm. But these
protocols are designed in such a way that they can be configured to use
SHA-1 in place of MD5, if required [6]. MD5 is also used within the
protocols themselves, such as SSL version 3, Transport Layer Security
(TLS) [14] and IPSec, along with HMAC for key establishment and other
purposes. The attack does not affect the security of these protocols in
this combination, for the reasons given in Section 4.4.
4.2 Password Verification Schemes

Users of computer systems have to prove that they are who they claim
to be. For example, a user accessing a database application supplies a
username and password for authentication. The MD5 algorithm is widely
used to hash the password, and the message digest of the password is
stored in a database, which achieves application level security as shown
in Fig 3.
An attacker who gains access to the password message digest database
cannot see the username and password combinations but can see each
username and the hash of its password. In some systems, a random string
(a salt) is attached to the password before it is hashed to make
dictionary attacks less effective [31], and the random string and the
hash of the password are stored in the database. The correct password of
the user can be recovered from the known hash value only if an attacker
is able to invert the hash function MD5. If an attacker is able to find
any other input password that hashes to the known or stored hash value
then he/she will be authenticated to the system. In this application, the
attacker is not concerned with finding an arbitrary collision, but with
finding an input that collides with the known message digest of the
password. It does not matter whether the input found is the exact
password used by the user for authentication. As long as the attacker
finds some input that hashes to the known digest, that is enough for
him/her to be authenticated to the system.
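The password scheme described above can be sketched as follows. This is
a minimal illustration under stated assumptions: the salt is 16 random
bytes prepended to the password, MD5 is used only because it is the hash
under discussion, and the names make_record and check_password are
hypothetical.

```python
import hashlib
import hmac
import os


def make_record(password: str) -> tuple:
    """Return (salt, digest) as it would be stored in the password database."""
    salt = os.urandom(16)  # random string attached to the password
    digest = hashlib.md5(salt + password.encode("utf-8")).digest()
    return salt, digest


def check_password(candidate: str, record: tuple) -> bool:
    """Authenticate any input whose salted hash equals the stored digest."""
    salt, stored = record
    digest = hashlib.md5(salt + candidate.encode("utf-8")).digest()
    return hmac.compare_digest(digest, stored)


if __name__ == "__main__":
    record = make_record("correct horse")
    print("right password accepted:", check_password("correct horse", record))
    print("wrong password accepted:", check_password("wrong guess", record))
```

The check compares digests only, so any input that hashes to the stored
value is accepted. Defeating the scheme therefore requires a pre-image
for a fixed digest rather than an arbitrary colliding pair.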
The feasibility of using the attacks described in Section 3 to find a
valid password for a known message digest is negligible. Historically,
the design and analysis of hash functions has been more an art than a
science, and to the best of the authors’ understanding, the attacks on
the four 128-bit hash functions presented in [41] are likewise more an
art than a science. The technique of crafting input message blocks
with particular differentials does not work against password schemes,
because it requires finding N1 and N1′ with a differential ∆2 for the
known relation M = M′ ⊕ ∆1. The attacker would first have to find
messages that produce a collision, which in this setting requires
inverting the MD5 algorithm for the known message digest obtained from
the password file. It is therefore the pre-image resistance property
of MD5 that has to be violated to compromise the password validation
scheme. This requires about 2^128 computations, and so far there are
no known practical attacks that can be used to invert MD5. Moreover,
once an input for the known or stored digest has been found, there is
no point in finding collisions for the MD5 algorithm. Hence the attacks
described in Section 3 do not apply to password schemes.
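The gap between finding some collision and matching a fixed digest can
be illustrated on a deliberately weakened hash. The sketch below
truncates MD5 to 16 bits, an assumption made purely so that both
searches finish quickly; truncated_md5, find_collision and
find_preimage are hypothetical helper names. It does not implement the
differential attack of [41]; it only contrasts a generic birthday
search for any colliding pair with a generic search for an input that
matches a given digest. For an n-bit digest these generic searches cost
roughly 2^(n/2) and 2^n hash computations respectively.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, nbytes: int = 2) -> bytes:
    """Toy hash: the first nbytes of the MD5 digest (16 bits by default)."""
    return hashlib.md5(data).digest()[:nbytes]


def find_collision(nbytes: int = 2):
    """Birthday search: return two distinct messages with equal toy digests."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        digest = truncated_md5(msg, nbytes)
        if digest in seen:
            return seen[digest], msg, i + 1  # pair and number of hashes used
        seen[digest] = msg


def find_preimage(target: bytes, nbytes: int = 2):
    """Exhaustive search: return a message whose toy digest equals target."""
    for i in count():
        msg = b"guess-%d" % i
        if truncated_md5(msg, nbytes) == target:
            return msg, i + 1  # message and number of hashes used


a, b, collision_tries = find_collision()
stored = truncated_md5(b"user password")
guess, preimage_tries = find_preimage(stored)
print("collision after", collision_tries, "hashes")
print("matching input after", preimage_tries, "hashes")
```

The password setting corresponds to find_preimage: the digest is fixed
by the password file, so the attacker cannot choose both inputs freely.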
An attacker is also prevented from eavesdropping on passwords in
transit when the communication between the client and the server is
encrypted using secure protocols such as SSL [35]. Protocols such as
SSL and TLS, which provide secure communications, use MD5 in the form
of HMAC. The attack on MD5 described in Section 3 is not applicable to
HMAC-MD5, for the reasons discussed in Section 4.4. Nevertheless, it is
recommended to use hash functions (and encryption schemes) that are
more robust than MD5, for example in Unix systems (modern Unix and
Linux systems use the MD5 algorithm with a 256-bit character
limitation), owing to the weaknesses in their password schemes and in
light of technological advances in computer systems [25].
4.3 Hierarchical PKI

Here we show that the collisions in MD5 do not affect the operation of
a hierarchical PKI. It will not be possible for an attacker to
impersonate a CA within a PKI in order to issue false certificates. The
hierarchical PKI model is shown in Fig. 4. In this model, the trust
relationship among the CAs is established using a single digital
certificate [23]. A superior CA issues certificates to the subordinate
CAs. Any compromise of a CA in a particular hierarchical path would
result in the loss of services for that section only and would not
affect any other part of the hierarchy that does not depend on the
affected section. In order to compromise the entire hierarchy, the
attacker would need to succeed against the root certificate.
Fig. 3. MD5 for password checking (the MD5 digest of the submitted
PASSWORD is compared with the stored MD5(PASSWORD); a match gives
ACCEPT, otherwise REJECT)
A malicious attacker can create a new certificate C2 by changing the
public key of an intermediate digital certificate C1 of a subordinate
CA (say CA1), keeping the other entries of the certificate fixed, while
still producing the same message digest as C1. This could be possible
if C2 is chosen in such a way that the specific differential value ∆2,
corresponding to the differences in the public key entries of the two
certificates C1 and C2, is maintained. The attacker can then splice the
signature of the superior CA on C1 onto C2 and so masquerade as an
intermediate CA. However, even when impersonating a true CA in this
way, the attacker cannot issue certificates, as he cannot derive the
private key corresponding to the public key used in his certificate.
Consequently, it appears that the collision attacks on the hash
functions have no adverse effect on the working of PKI.
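The splicing step can be illustrated with a toy model. Everything in
the sketch below is an assumption made for illustration: the hash is
MD5 truncated to 16 bits so that a matching digest can be found by
plain search instead of the differential technique, the certificate is
a short byte string rather than an X.509 structure, and the superior
CA's signature is imitated by an HMAC under a hypothetical key CA_KEY
(a symmetric stand-in, not a real public-key signature). The only
point shown is that a signature computed over a digest is equally
valid for any other input with the same digest.

```python
import hashlib
import hmac
from itertools import count

CA_KEY = b"superior-ca-signing-key"  # hypothetical stand-in for the CA's private key


def toy_hash(data: bytes) -> bytes:
    """MD5 truncated to 16 bits, so that a matching digest is easy to find."""
    return hashlib.md5(data).digest()[:2]


def sign(certificate: bytes) -> bytes:
    """Hash-then-sign: the 'signature' covers only the digest of the certificate."""
    return hmac.new(CA_KEY, toy_hash(certificate), hashlib.sha256).digest()


def verify(certificate: bytes, signature: bytes) -> bool:
    """Recompute the signature over the certificate digest and compare."""
    return hmac.compare_digest(sign(certificate), signature)


def forge(c1: bytes) -> bytes:
    """Search for a C2 with a different public key field and the same digest as C1."""
    for i in count():
        c2 = b"subject=CA1;pubkey=%d" % i
        if c2 != c1 and toy_hash(c2) == toy_hash(c1):
            return c2


c1 = b"subject=CA1;pubkey=genuine"
signature_on_c1 = sign(c1)
c2 = forge(c1)
print(verify(c2, signature_on_c1))  # prints True: the spliced signature is accepted
```

As the text notes, acceptance of C2 does not give the attacker the
private key matching the public key that C2 carries.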
4.4 HMAC Applications

Message Authentication Codes (MACs), also called keyed hash functions,
are primitives used to protect the integrity of a message that is
created, transmitted or stored by an authorized source over an
unreliable medium [31]. MACs are also called cryptographic checksums or
integrity check values. The sender computes the MAC value of the
message using a MAC algorithm which takes a secret key and the message
as input parameters. This MAC value is appended to the message and is
sent to the receiver as shown in Fig. 5. The receiver, upon receiving
the message with the appended MAC value, separates the message from the
MAC value, computes the MAC on the message using the same shared MAC
key, and compares the computed MAC value with the received MAC value.
A match between these two values assures the authenticity and the
integrity of the message.
Fig. 4. Hierarchical PKI (root CA R with subordinate CAs CA1, CA2, CA3
and CA4)
Fig. 5. Message Authentication Code (the MAC algorithm takes the secret
key and the message; the message and its MAC are sent over an unsecured
channel)
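The sender and receiver steps can be sketched with Python's standard
hmac module. HMAC-MD5 is used here as the MAC algorithm only because it
is the construction examined later in this section; the function names
send and receive and the message-then-tag packet layout are assumptions
made for the example.

```python
import hashlib
import hmac

TAG_LENGTH = 16  # an MD5-based MAC value is 128 bits long


def send(key: bytes, message: bytes) -> bytes:
    """Compute the MAC over the message and append it to the message."""
    tag = hmac.new(key, message, hashlib.md5).digest()
    return message + tag


def receive(key: bytes, packet: bytes) -> bytes:
    """Split message and MAC, recompute the MAC, and compare the two values."""
    message, received_tag = packet[:-TAG_LENGTH], packet[-TAG_LENGTH:]
    computed_tag = hmac.new(key, message, hashlib.md5).digest()
    if not hmac.compare_digest(computed_tag, received_tag):
        raise ValueError("MAC mismatch: message rejected")
    return message


shared_key = b"shared secret key"
packet = send(shared_key, b"transfer 10 units")
print(receive(shared_key, packet))
```

A receiver holding a different key, or a packet altered in transit,
causes receive to raise ValueError instead of returning the message.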
Traditionally, block ciphers are used in Cipher Block Chaining (CBC)
mode to construct MACs for authentication purposes. Such MAC
algorithms are called CBC-MAC algorithms. The security analysis of
CBC-MACs was presented in [12, 37]. MACs can also be constructed from
unkeyed cryptographic hash functions such as MD5 and SHA-1. Bellare
et al. presented such a design scheme, called HMAC, together with a
detailed security analysis in [10, 11]. The generalization of HMAC is
specified in [26, 9].
The collisions in the MD5 algorithm described in Section 3 do not
apply to its usage in HMAC, as the properties required from MD5 are
different in the HMAC context [27]. The attack on MD5 described in
Section 3 works for a specific value of the differential which, when
applied to certain messages, causes a collision. It does not always
cause a collision; whether it does or not depends on the initialisation
vector (IV) and the contents of the message itself. Collisions of this
kind, based on known IVs in the underlying hash function such as MD5,
are not relevant to the security of HMAC. Collisions on HMACs based on
MD5 would be relevant only for a variable and secret IV [27]. So the
analysis required for the hash functions used in HMACs is completely
different from the work presented in [41].
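The role of the secret key can be seen in the HMAC construction of
[26], which computes H((K xor opad) || H((K xor ipad) || m)). The
sketch below writes this out for MD5 and checks it against Python's
standard hmac module; the function name hmac_md5 is chosen for the
example. Because the first block processed by each MD5 call depends on
the key, the chaining value that reaches the message is unknown to an
attacker, which is the "variable and secret IV" situation referred to
above.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # MD5 processes the input in 512-bit (64-byte) blocks


def hmac_md5(key: bytes, message: bytes) -> bytes:
    """HMAC over MD5, written out step by step following RFC 2104."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.md5(key).digest()      # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # then padded to one full block
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.md5(inner_pad + message).digest()
    return hashlib.md5(outer_pad + inner).digest()


key, message = b"secret key", b"message to authenticate"
print(hmac_md5(key, message) == hmac.new(key, message, hashlib.md5).digest())
```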
The Transport Layer Security (TLS) protocol, which is defined in [14]
as a proposed Internet standard version of SSL version 3, uses the
HMAC algorithm defined in [26] as a MAC function and also as a Pseudo
Random Function (PRF). The HMAC used in the TLS protocol employs either
MD5 or SHA-1 as the underlying hash function. The MAC function defined
in the SSL version 3 protocol is similar to the HMAC algorithm, with a
minor difference [40]. None of these applications of HMAC with MD5 is
affected by the recent attacks on MD5.
4.5 Software Protection

Software vendors want to protect the integrity of the software they
sell to users [36]. Hash functions are used to check the integrity of
the software that the user receives from the vendor, and to make sure
that the user does not receive viruses or malicious programs that
infect his/her PC. A user can obtain software from the vendor on
CDs/DVDs, as downloads from the vendor’s website or one of its mirror
websites, or through some other means. Users may also download
software or files for free from the Internet and expect them to be
free from viruses or malicious content.
Many free tools available on the Internet today [5] use the MD5
algorithm for data integrity control and for the verification of
network file transfers, e-mail messages, and files or software
downloaded from the Internet. The MD5 algorithm is used to check the
integrity of a Cisco Internetworking Operating System (IOS) software
image [2]. The MD5 file validation feature of the Cisco IOS uses MD5
to create a 128-bit checksum of the Cisco IOS software image on some
of the Cisco released products and compares it with the MD5 checksum
of the image for that release posted on the Cisco website. The Apache
HTTP Server Project [3] develops and maintains an open-source Hyper
Text Transfer Protocol (HTTP) server, the most popular web server on
the Internet, for the Unix and Windows NT operating systems. It offers
MD5 hashes as one option, the other being Pretty Good Privacy (PGP)
signatures, for verifying the downloads from its home page and other
mirror sites [4]. It uses appropriate MD5 programs on the Unix and
Windows distributions to achieve this. The Solaris Fingerprint
Database (sfpDB), a free Sun Microsystems security tool, uses MD5 to
verify the integrity of the files distributed with the Solaris
Operating Environment [1]. The sfpDB ensures that official binary
distributions contain authentic files rather than altered ones that
compromise system security. The sfpDB uses MD5 to compare the digests
of the binary distributions with the trusted hashes stored on its
homepage and hence identifies any mismatches.
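The kind of check these tools perform can be sketched as follows. This
is a generic illustration, not the code of any tool named above: the
function names file_digest and matches_published, the chunk size and
the example file name are assumptions. The algorithm is a parameter,
so the same check can be run with a stronger hash such as SHA-256 where
the vendor publishes one.

```python
import hashlib
import hmac


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Hash a file in chunks and return the digest as a lowercase hex string."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "md5") -> bool:
    """Compare the local digest with the checksum published by the vendor."""
    local = file_digest(path, algorithm)
    return hmac.compare_digest(local, published_hex.strip().lower())


# Example use (hypothetical file name and published checksum):
# ok = matches_published("download.tar.gz", "5d41402abc4b2a76b9719d911017c592")
```

The check only shows that the downloaded file has the published
digest; it says nothing about which of two colliding files was
downloaded.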
In the wake of the attack on MD5 described in Section 3, there may be
a more immediate threat to the above-mentioned applications, where MD5
is used as a CRHF to achieve data integrity [30]. Since the data input
to these applications is under the control of the attacker, it is
possible for the attacker to use the attack technique to create
genuine software content and malicious content (for example, a virus)
with identical MD5 checksums, and then replace the genuine code with
the malicious code. Hence, when the hash function used for integrity
checking is not robust, the end user cannot distinguish the
virus-infected code from the true code.
5 Need for new Hash Function Designs
The cryptographic hash functions listed in Table 1 have their
compression functions operating like Unbalanced Feistel Networks
(UFNs) in the non-linear feedback shift register (NLFSR) mode of
operation [39, 19]. These designs, based on the MD-x family, operate in
the style of a block cipher in feed-forward mode. SHA-0, the main
alternative to the MD5 algorithm, was issued as a Federal Information
Processing Standard (FIPS-180) by NIST in 1993 [17]. It was superseded
by its revised version SHA-1 [28] (FIPS-180-1), which differs from
SHA-0 only in a circular shift operation in the linear message
expansion of the input block, introduced to address unpublished
weaknesses found in SHA-0 by the National Security Agency (NSA).
Recently, [16] presented two near collisions for the full compression
function of SHA-0, many full collisions for the 65-round version of
SHA-0, and collisions for the 34-round version of SHA-1. Near
collisions for the 45-round version of SHA-1 were also shown in [16].
Antoine Joux has shown that finding multiple collisions (more than
just two messages hashing to the same digest) in iterated hash
functions is not much harder than finding ordinary collisions [24].
The security analysis of hash functions like SHA-256 and SHA-512 has
already started [20, 21].
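The single difference between SHA-0 and SHA-1 mentioned above can be
written out directly. The sketch below implements only the message
expansion, which extends the 16 input words of a block to 80 words; it
is not a full hash implementation, and the function names rotl32 and
expand are chosen for the example. With rotate=False it follows SHA-0,
and with rotate=True it adds the one-bit circular left shift of SHA-1.

```python
def rotl32(value: int, amount: int) -> int:
    """Circular left shift of a 32-bit word."""
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def expand(block_words, rotate: bool):
    """Expand 16 message words to 80 (SHA-0 if rotate is False, SHA-1 if True)."""
    words = list(block_words)
    for t in range(16, 80):
        mixed = words[t - 3] ^ words[t - 8] ^ words[t - 14] ^ words[t - 16]
        words.append(rotl32(mixed, 1) if rotate else mixed)
    return words


block = list(range(16))            # arbitrary example block of 16 words
sha0_words = expand(block, rotate=False)
sha1_words = expand(block, rotate=True)
print(sha0_words[16], sha1_words[16])  # prints 7 14
```

Without the rotation each bit position of the expanded words depends
only on the same bit position of the input words; the shift in SHA-1
mixes neighbouring bit positions.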
These advances in the cryptanalysis of hash functions of the MD-x
family show that relying on a single design approach might be a single
point of failure for cryptographic hash functions. It was recently
recommended in [19] to seek alternative design paradigms for secure and
efficient cryptographic hashing, given the increasing interest in all
forms of cryptanalysis of hash functions.
6 Conclusion
The cryptanalysis of MD5 and other hash functions presented in [41]
significantly affects security in applications where the attacker has
control over the input messages that are being hashed. Considering the
attacks on various 128-bit hash functions and NIST’s recent
announcement [34] on the phase-out of the stronger hash function SHA-1
(due to an attack on its reduced version) by 2010, it is highly
recommended not to use the current 128-bit or 160-bit cryptographic
hash functions of the MD-x family as CRHFs in future applications. It
is further recommended that new approaches to the design of hash
functions be developed by imposing stringent requirements on the
fundamental security properties of these designs, to thwart present
and new kinds of cryptanalysis. Given this context, it would be
appropriate to hold a worldwide competition for a new hash function
standard, like NIST’s Advanced Encryption Standard (AES) process [33]
for the new block cipher standard.
As can be seen from the above, the current commercial use of the
MD-x family of hash algorithms has been substantially undermined as an
effective trusted mechanism for digital signature implementations. Even
though digital signature technology has not become as pervasive as was
once expected, its use is still substantial in certificate distribution
mechanisms, and as such these attacks do call into question the trust
value that has been attributed to digital signature technology when MD5
has been used. The authors have made a preliminary investigation of a
number of websites and have identified that a substantial number of the
websites using an SSL session were also relying upon a digital
certificate that used the MD5 hash algorithm for signature purposes,
which must cause some concern.
References
1. The Solaris Fingerprint Database- A Security Tool for Solaris Operating Environ-
ment Files. This pdf document is published at the Sun Microsystem’s BluePrints
section, May 2001. http://www.sun.com/blueprints/0501/Fingerprint.pdf Last ac-
cess date:17 September,2004.
2. MD5 File Validation. A document on MD5 File Validation available on the Cisco
Website, 2002. http://www.cisco.com/univercd/cc/td/doc/product/software/ios122/
Last access date:17 September,2004.
3. Apache HTTP Server Project. This is the homepage of Apache HTTP server,
2004. http://httpd.apache.org/ Last access date:17 September,2004.
4. Apache HTTP Server Source Code Distributions. This download page of source
code is on the Apache website, 2004. http://www.apache.org/dist/httpd/ Last
access date:17 September,2004.
5. Data Invariability and Integrity Control. Internet article on a tool which uses MD5
for integrity purposes, 2004. http://www.fastsum.com/ Last access date:16
September, 2004.
6. FAQ: MD5, SHA-0 and hash collisions. Certicom’s Frequently asked Questions on
Recent Attacks, 2004. http://www.certicom.com/index.php Last access date:13
September, 2004.
7. Hash collision question and answer. Crypto News on Attacks on Hash Func-
tions, 2004. http://www.cryptography.com/cnews/hash.html/ Last access date:13
September, 2004.
8. National Institute of Standards and Technology (NIST) , Computer Systems Lab-
oratory. Secure Hash Standard. Federal Information Processing Standards Publi-
cation (FIPS PUB) 180-2, August 2002.
9. American Bankers Association. Keyed Hash Message Authentication Code,ANSI
X9.71, Washington, D.C., 2000.
10. M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message
authentication. Lecture Notes in Computer Science, 1109:1–15, 1996.
11. Mihir Bellare, Ran Canetti, and Hugo Krawczyk. Message authentication using
hash functions: the HMAC construction. CryptoBytes, 2(1):12–15, Spring 1996.
12. Mihir Bellare, Joe Kilian, and Phillip Rogaway. The security of cipher block chain-
ing. Lecture Notes in Computer Science, 839:341–358, 1994.
13. I.B. Damgard. A design principle for hash functions. Lecture Notes in Computer
Science, 435:416–427, 1990.
14. Tim Dierks and Christopher Allen. The TLS protocol version 1.0. Internet Request
for Comment RFC 2246, Internet Engineering Task Force, January 1999. Proposed
Standard.
15. Whitfield Diffie. The first ten years of public-key cryptography. Proceedings of the
IEEE, 76:560–576, 1988.
16. Eli Biham and Rafi Chen. Near-collisions of SHA-0. Cryptology ePrint Archive,
Report 2004/146, 2004. http://eprint.iacr.org/.
17. FIPS (Federal Information Processing Standards Publication). Secure Hash Stan-
dard: FIPS PUB 180. United States Government Printing Office, Washington,
DC, USA, May 11 1993.
18. Alan O. Freier, Philip Karlton, and Paul C. Kocher. The SSL protocol: Version
3.0. Internet draft, Netscape Communications, March 1996.
19. Praveen Gauravaram, William Millan, and Lauren May. CRUSH: A New Cryp-
tographic Hash Function using Iterated Halving Technique. In Proceedings of the
workshop on Cryptographic Algorithms and their uses, pages 28–39, Goldcoast,
Australia, July 4–5 2004.
20. Henri Gilbert and Helena Handschuh. Security Analysis of SHA-256 and Sisters.
Lecture Notes in Computer Science, 3006:175–193, 2004.
21. Philip Hawkes, Michael Paddon, and Gregory G. Rose. On corrective patterns
for the SHA-2 family. Cryptology ePrint Archive, Report 2004/207, 2004. http:
//eprint.iacr.org/.
22. R. Housley, W. Ford, W. Polk, and D. Solo. RFC 2459: Internet X.509 public
key infrastructure certificate and CRL profile, January 1999. Status: PROPOSED
STANDARD.
23. Russ Housley and Tim Polk. Planning for PKI. John Wiley and Sons, Inc., 2001.
24. Antoine Joux. Multicollisions in iterated hash functions. application to cascaded
constructions. In Matt Franklin, editor, Advances in Cryptology-CRYPTO 2004,
pages 306–316, Santa Barbara, California, USA, August 15–19 2004. Springer.
25. Gershon Kedem and Yuriko Ishihara. Brute force attack on UNIX passwords
with SIMD computer. In Proceedings of the 8th USENIX Security Symposium
(SECURITY-99), pages 93–98, Berkeley, CA, August 23–26 1999. Usenix
Association.
26. H. Krawczyk, M. Bellare, and R. Canetti. RFC 2104: HMAC: Keyed-hashing for
message authentication, 1997. Status: INFORMATIONAL.
27. Hugo Krawczyk. HMAC with MD5 and SHA-1, 22 August 2004. This Online
Posting of Crypto Forum Research Group is available at http://www1.ietf.org/
mail-archive/web/cfrg/current/msg00527.html.
28. National Institute of Standards and Technology (NIST) Computer Systems Labo-
ratory. Secure hash standard. Federal Information Processing Standards Publica-
tion (FIPS PUB) 180-1, April 1995.
29. Adrian McCullagh and William Caelli. Non-Repudiation in the Digital En-
vironment. First Monday–Peer-Reviewed Journal On the Internet, 5, August
2000. This article is available at http://www.firstmonday.dk/issues/issue5_
8/mccullagh/.
30. Declan McCullagh. Crypto Researchers Abuzz over Flaws. This report was pub-
lished at the CNET News, 2004. http://news.com.com/Crypto5313655.html/ Last
access date:17 September,2004.
31. Alfred J. Menezes, Paul C. Van Oorschot, and Scott A. Vanstone. Handbook of
Applied Cryptography, chapter Hash Functions and Data Integrity, pages 321–383.
The CRC Press series on discrete mathematics and its applications. CRC Press,
1997.
32. Ralph C. Merkle. One way hash functions and DES. Lecture Notes in Computer
Science, 435:428–446, 1990.
33. National Institute of Standards and Technology (NIST) . Advanced Encryption
Standard. The details of AES process can be found at http://csrc.nist.gov/
CryptoToolkit/aes/.
34. National Institute of Standards and Technology. NIST Brief Comments on Recent
Cryptanalytic Attacks on Secure Hashing Functions and the Continued Security
Provided by SHA-1, August 2004. This short notice by NIST is available at
http://csrc.nist.gov/CryptoToolkit/tkhash.html.
35. Benny Pinkas and Tomas Sander. Securing passwords against dictionary attacks.
In Vijay Atluri, editor, Proceedings of the 9th ACM Conference on Computer and
Communication Security (CCS-02), pages 161–170, New York, November 18–22
2002. ACM Press.
36. Bart Preneel. Analysis and design of Cryptographic Hash Functions. PhD thesis,
Katholieke Universiteit Leuven, 1993.
37. Bart Preneel and Paul C. van Oorschot. MDx-MAC and building fast MACs from
hash functions. Lecture Notes in Computer Science, 963:1–14, 1995.
38. Greg Rose. Personal Communication, August 2004.
39. B. Schneier and J. Kelsey. Unbalanced Feistel networks and block cipher design.
Lecture Notes in Computer Science, 1039:121–144, 1996.
40. William Stallings. Cryptography and Network Security Principles and Practices,
chapter Web Security, pages 527–562. Prentice-Hall, Inc., Third edition, 2003.
41. Xiaoyun Wang, Dengguo Feng, Xuejia Lai, and Hongbo Yu. Collisions for Hash
Functions MD4, MD5, HAVAL-128 and RIPEMD. Cryptology ePrint Archive,
Report 2004/199, 2004. http://eprint.iacr.org/.
42. Gideon Yuval. How to swindle Rabin. Cryptologia, 3(3):187–189, July 1979.