What is a Collision Attack?

Collision attacks are a significant concern in the realm of cryptography. In certain circumstances, an attacker can use them to undermine the security granted by digital signatures, allowing them to fraudulently make data appear as if it has been verified for its integrity and authenticity. This means that collision attacks can circumvent the security mechanisms we rely on to keep our online world safe.

But before you can truly understand what a collision attack is, you need to first wrap your head around what a collision actually is. To do that, we need to take one step further back, and bring you up to speed on hashing.

What is hashing?

A cryptographic hash function is an algorithm that has a number of specific properties which turn out to be incredibly useful in the world of cryptography. They take an input, often called the message, which is then run through the hash function, resulting in the output, a hash, which is also sometimes called a message digest.

There are also hash functions that aren’t cryptographic, which are simpler algorithms that are often used in data storage and retrieval. However, these are tangential to collision attacks, so we won’t bother discussing them in depth.

The properties of cryptographic hash functions include:

Hash functions are deterministic — When the same input is run through a given hash function, it always results in the exact same output, every single time.
Hash functions take data inputs of arbitrary size and always output a fixed-length string — A hash function can take on data of any size, from a single character to an encyclopedia full of words. No matter how long the input, the output of a given hash function is always the same length.
Hash functions are one way — It’s easy to take an input, run it through the hash function, and then find out what the matching hash is for that input. However, it’s unfeasible to take a hash and then figure out what the original input was from the hash. By ‘one-way’, we mean that it’s only practical to compute a hash function in one direction, not the other.
Hash functions can be computed rapidly — For hash functions to be practical, we need them to be able to quickly turn a message input into a hash. They would be far less useful if the process took too long and required a substantial amount of computational resources.
Minor changes to the input result in significant changes to the output — Even if two inputs are identical except for a single character, a cryptographic hash function will deliver hashes that appear to have almost nothing in common.
It’s not feasible to find two different inputs that result in the same output — In order to ensure security in the applications of hash functions, it must not be feasible for an attacker to find two different inputs that produce the same hash. When two inputs with the same hash are found, it’s called a collision.

So, cryptographic hash functions have all of these magical properties, but what do they actually look like?

We’ll give you a quick demonstration with an online hash calculator. The webtool we have linked takes inputs, runs them through SHA-256 which is the gold standard hash function, and then delivers you a 256-bit hash.

You can try it with any input you want, but we will run through a few just to demonstrate some of the key properties. When we run “What is hashing?” through the calculator, it gives us a hash of:

2da0ed1070f7e7306a23785422576c864bdec63a6032482badc26fc5102f9a9c

The hash is in hexadecimal, which is simply a different way to count. Instead of using the system we are used to, which has 10 different numbers (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), hexadecimal has 16 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f). In a base-16 system, 0-9 is used for the values 0 to 9, and A-F (or a-f) for values 10 to 15.

The hash is 64 characters long, and in hexadecimal notation, each character is four bits. For example, the first “2” in the hash is 0010, the “d” is 1101 etc. This means that the hash is 256 bits long.

Let’s try a different input, just the single character, “a”. When we calculate the hash, we get:

ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb

Notice that even though the inputs had significant size differences, the outputs are both the very same length, 64 characters or 256 bits. As we said, one of the properties of a cryptographic hash function is that they take inputs of any length, and always deliver a fixed length output. You could try inputting a whole book into SHA-256, and you would still end up with a 256-bit hash. Note that this won’t actually work in the calculator we have linked, because this particular web tool has a limit on the amount of characters that it will accept as its input. However, this is due to how SHA-256 has been implemented on the website, and not a limitation within SHA-256 itself.

Now let’s demonstrate how even a slight change in the input results in a massively different hash. Let’s take our earlier input, “What is hashing?”, but change the question mark to an exclamation mark, “What is hashing!” It’s a relatively insignificant change, right? Well, let’s see what we get when we throw it through the hash function:

02a3603765eb85b8aa96aa8c18df75675a0670b33eb2c306e54d13a2baabffa5

If you compare this hash with the first one, you will see that the two have basically nothing in common. This is an important part of cryptographic hash functions, and this property is known as the avalanche effect.

If you’ve tried the calculator yourself, you will have seen that these hashes are calculated rapidly (unless you have slow internet). You can also try the same input over and over again, and you will see that the hash function is also deterministic—you will always get the same result for a given input.

It’s a little harder to demonstrate to you that SHA-256 is a one-way function and that it’s almost impossible to figure out the original input if all you have is the hash. It’s also challenging to show you that it’s unfeasible to find two different inputs that result in the same hash. You’ll just have to trust us on these properties, because they are central to the design process of secure cryptographic hash functions.

The uses of hashes

By now, you probably have some idea of what hash functions look like, and what they can do, but what’s the point of turning an input into a jumble of numbers and letters?

Well, it turns out that all of those properties are actually pretty useful, and hashes are implemented in a range of larger systems. In order to adequately understand the dangers of collision attacks, it’s important to understand where cryptographic hashing is actually used.

Digital signatures

One common application is in digital signatures. If the sender of a message wants to prove that it really came from them and not an impostor, and also that the message has not been changed after it was sent, they can do so with a digital signature. These involve certificates and public-key algorithms like RSA in addition to hashing.

The short version of digital signatures is that authenticity and integrity can be verified if the sender runs their message through a cryptographic hash function, and then runs it through a computation alongside their private key. The sender then sends this result as the digital signature alongside their message to the recipient.

When the recipient gets the message, they can verify its integrity and authenticity by running the message through a hash function to get a hash of the message. They then take the digital signature they received from the sender and perform some calculations on it with the sender’s public key.

The result can then be compared to the message hash they just calculated. If the message the recipient received has retained its integrity and authenticity, then these two values should be the same.

Password hashing

Password keyword codeword by Geralt licensed under Pixabay license.

Another example is in password hashing. If a company stores your password as plaintext, you are kind of screwed if a hacker breaks in and steals their password database. The hacker can just look at the database, and then use the plaintext passwords to log in to any account they want, including yours.

In 2023, researchers found a publicly available data base with 50,000 plaintext passwords for the French web-based games “Play Glory” and “Play Astra.” In 2019, Facebook admitted that millions of its customers passwords were stored in plaintext on internal servers – and thus readable by the company’s employees.

There are plenty of other examples – some fairly inexcusable — such as the WordPress security plugin that stored plaintext passwords on a database accessible to website admins.

It should go without saying that storing passwords as plaintext is terrible for security, because we can’t guarantee that we can keep password databases out of the hands of hackers.

Thankfully, cryptographic hash functions give us a much better solution. They allow developers to verify passwords without ever having to store the password as plaintext in a database. As we mentioned, it’s basically impossible to find two different inputs that result in the same hash, and it’s also unfeasible to take a hash and then figure out what the initial input was.

This means that when you set up an account and type in your user name and password, the developers can hash your password straight away, and only store the hash, never the plaintext of your password. Whenever you go to log in, they simply hash your password as soon as you enter it, and then compare it against the hash that they have stored in the database.

If the two match, the developers assume that you entered the same password both times, and that you are therefore the same user. So they grant you access to your account, all without ever knowing what your password actually is.

If a hacker steals a database of password hashes, it’s much harder for them to abuse them than if they had stolen the passwords in plaintext. Due to the nature of secure cryptographic hash functions and the fact that it’s so unlikely for two different inputs to result in the same hash, we don’t have to worry too much about an attacker abusing the system and entering an incorrect password that still results in the matching hash.

Other uses of cryptographic hashes

Cryptographic hashes have a range of other uses. These include:

Message authentication codes (MACs)
Blockchain proof-of-work systems
Data fingerprinting
Checksums that detect data corruption

What is a collision?

By now, we know that hashes are actually useful. We also briefly mentioned in the intro that collisions are a threat in certain scenarios where cryptographic hash functions are implemented. But what actually is a collision?

Remember how we said that it should be unfeasible for two different inputs in a cryptographic hash function to result in the same hash?

Well… a collision is when things don’t pan out as planned, and two separate inputs actually do result in the same hash value. As we have said, this isn’t supposed to happen, and it can be incredibly problematic if it does. When collisions are practical, it allows attackers to completely undermine the integrity and authentication aspects of digital signatures. This can lead to all kinds of problems, including fraud and allowing attackers to worm their way past existing security systems.

We’ll let you in on a little secret: It’s actually impossible for a hash function to be designed to completely avoid collisions.

This is due to the pigeonhole principle. In the context of hashing, it basically states that if there are more inputs than there are possible hashes, then some inputs must result in the same hashes.

As we have stated, the hashes in SHA-256 are 256-bits long. This means that there is a maximum size that an SHA-256 hash can be, in the same way that if the odometer on your car only has six digits, the maximum value it can display is 999,999. If there is a maximum size of a hash, it also means that there is a limited number of possible hashes.

For many of us, it’s hard to picture just how big 256 bits is. If you were to count up all of the possible hexadecimal SHA-256 hashes, you would arrive at the following number in decimal:

115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936

It’s a really big number, but it’s still a number we can write out and display in a line or two.

Now, let’s think about all of the possible different inputs we could run through SHA-256. Anything from “0” to “aardvark” to an entire dictionary or the complete works of Shakespeare. Really, the possibilities are infinite. This brings us to a problematic conclusion: Infinity is a whole lot bigger than the 256-bit number we wrote out above. According to the pigeonhole principle, the fact that we have infinite inputs but a limited number of possible hashes means that every single hash must actually have a lot of different inputs that can produce it.

Does this mean that everything we’ve talked about is built on a weak foundation, and that it’s all set to crumble at any moment?

Not necessarily.

You will have noticed that we never stated that a cryptographic hash function can never have two separate inputs that result in the same output. Instead, we stated that it’s not feasible (or basically impossible) to find two different inputs that result in the same output.

It doesn’t really matter if there are a bunch of different hash collisions that are theoretically possible. For the sake of security, the really important thing is that it just isn’t practical for anyone to find these hash collisions. Possible collisions can float around in the ether all they want, but they can’t cause us any problems if people can’t actually find them.

So the aim of a secure cryptographic hash function is to not be completely free from collisions. This would be a fool’s errand because the pigeonhole principle proves that it’s mathematically impossible. Instead, we just want cryptographic hash functions like SHA-256 to be collision-resistant, making it incredibly hard for anyone to find these collisions.

The birthday paradox

A birthday cake with candles by Annie Spratt licensed under CC0 1.0.

In order to determine how secure a hashing algorithm is, we need an understanding of just how likely collisions are. One of the major complications is that they are actually far more likely than your intuition would tell you. The best demonstration of this is the birthday paradox.

Let’s say you choose a random sample of people, and you are interested in finding people within that sample who share the same birthday. You would probably need more than a hundred to have a fighting chance of a match, right? After all, there are 365 days in a year.

Nope, with just 23 people, you already have a greater than 50% probability that two people within the group will share a birthday.

But how can this be?

You are probably thinking about it the same way most of us do, in that you are thinking of finding a match for a specific date, or a specific individual within the sample, rather than looking at it to see if any two people within the sample share a birthday.

Let’s demonstrate the difference between these two different ways of looking at it. If Jason was born on January 1, and you wanted to form a sample sufficiently large enough so that there would be a greater than 50 percent chance of someone sharing his birthday of January 1, then your intuition is right. You would want about 183 people in that sample for there to be a 50 percent chance of a match.

But we aren’t looking for a specific match in the birthday paradox, we are looking for any match. With 23 individuals, we actually have a lot of different possible match-ups. Not only are we just checking to see if Jason has the same birthday as any of the other 22 people, but also if each of them share the same birthday as any other person in the group. When you actually crunch the numbers, you arrive at the following:

(23 x 22)/2 = 253

So that is 253 match-ups from a group of 23 people. Given that there are only 365 days in a year, this number of match-ups brings us up to a chance that is far greater than 50 percent.

Sure, the birthday paradox doesn’t guarantee that anyone will have matching birthdays, just that the chances of matching birthdays occurring are much higher than we might have intuitively thought.

No, we didn’t just go off on a random tangent because we are birthday party enthusiasts. The birthday paradox relates directly to the likelihood of hash collisions.

If you shake up the metaphor a little and think about the probabilities of matching hashes occurring rather than matching birthdays, we realize that the chance of finding a collision in SHA-256 is not 1/115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 (our 256-bit number represented in decimal) as we might have expected.

The chance is actually much higher, because we are looking for any possible collision, not just a specific collision for a certain hash.

So what is the chance?

Without the birthday paradox coming into play, we would have expected to come across a matching hash after cycling through 50 percent of the total combinations. In shorthand, this is after attempting 2²⁵⁵ hashes.

With the birthday attack, we can use the following formula to figure out roughly how many hashes we would have to cycle through before arriving at the same probability of finding a match:

2^(m/2)

Where m = the bit size of the hash.

Therefore, when we run the numbers for SHA-256’s 256-bit hash length, we get:

2^(256/2)

2⁽¹²⁸⁾

So in SHA-256, the birthday attack has brought down the number of hashes to cycle through down from 2²⁵⁵ to 2¹²⁸. If you aren’t too familiar with this type of math, it’s easy to think that the birthday attack only cuts it down by half. In reality, the difference is much greater.

The following number is 2²⁵⁵ written in decimal notation:

57,896,044,618,658,097,711,785,492,504,343,953,926,634,992,332,820,282,019,728,792,003,956,564,819,968

And the following is 2¹²⁸

340,282,366,920,938,463,463,374,607,431,768,211,456

As you can see, there are magnitudes of difference between the two numbers, so the birthday attack makes it significantly easier for an attacker to find a matching hash than you may have expected.

With this in mind, the rule of thumb is that a 256-bit hash like SHA-256 is reduced down to 128 bits of security when considering birthday attacks rather than brute force. In the same way, MD5’s 128 bits are reduced down to 64 bits against birthday attacks.

What is a collision attack?

A collision attack is simply when an attacker finds one of these collisions, and uses it to undermine the security that the hash was assumed to provide.

Collision attack examples

Here are some common examples of collision attacks:

Freestart collision attacks

Freestart collision attacks are possible in hash functions that are based on Merkle-Damgard construction. This means that MD5, SHA-1 and SHA-2 are vulnerable. SHA-3, however, is not vulnerable due to its sponge construction.

Under normal circumstances, in the first round of a Merkle-Damgard hash function, the inputs are a set of predefined initialization vectors, as well as the first block of message data. You may want to refer to this article on the MD5 algorithm if you need some more background information on how hash functions work.

A freestart collision is different, because it involves allowing an attacker to choose their own initialization vectors. This allows them to reduce the security of the hash function. Why would anyone do that? Researchers do it in the lab, because it allows them to toy with hash functions and gives them greater insight into their weaknesses.

These aren’t practical attacks against the implementation of a secure hash function, because in reality, an attacker can’t choose the initialization vectors — which are ideally random or pseudorandom non-repeating numbers. However, freestart collision attacks are important for being able to understand the shortcomings of a given hash function, and for developing better cryptographic hash functions in the future.

Classical collision attack

In a classical collision attack, the aim is to find two different messages that both result in the same hash. If we want to get fancy and write it mathematically:

Message 1: x
Message 2: y
H(): The hash of ()

Therefore a classical collision attack involves an attacker finding two messages where message x does not equal message y, but the hash of x does equal the hash of y. In other words:

x ≠ y

But:

H(x) = H(y)

The chances of a successful collision attack are not only dependent on the size of the hash, but also on any possible weaknesses in the cryptographic hash function. As we discussed, even if a hash function has no known weaknesses, we still have to worry about birthday attacks, rather than just brute force attacks, which are much more difficult because they require an attacker to systematically run through every single combination until they find a collision.

The ramifications of classical collision attacks

When we look at the equation H(x) = H(y), it can be hard to see what this actually means for security. Let’s make it more concrete by giving you an example.

Let’s say Jane works at a company, and she wants to get rich quick. Let’s say that she gets payslips for $1,000 as PDFs each week. Each week, her boss digitally signs the paychecks so that payroll knows that they are legitimate.

The boss’ digital signature will have been based on a hash of the payslip pdf. So Jane takes the document for her $1,000 payslip and starts fiddling with it. First, she changes the $1,000 to $1,000,000. But when she hashes this document, the hash will no longer match the digital signature because the document has changed. If she tries to get payroll to pay it out, when they go to verify the signature, they will see that it doesn’t match and that the payslip is illegitimate.

But Jane’s too smart to have her ruse busted like that, so she gets back to fiddling with the document. She starts experimenting with different fonts, different sizes, changing the margins, changing the color, switching up the wording and a thousand more minor changes. She repeatedly hashes the document to see if the changes bring her a match. Eventually, after countless changes and a whole lot of coffee, a miracle happens, and the hash is exactly the same as the hash of the original $1,000 payslip.

All of these changes have been relatively minor. Even things that only marginally alter the appearance of the document will result in completely different hashes, so she has been able to construct a document that seems legitimate, but actually isn’t.

All Jane has to do is take this version of her $1,000,000 payslip to payroll alongside the original digital signature. Payroll may raise an eyebrow, but when they go to check the digital signature it all works out. It looks legitimate to them, and they aren’t paid enough to bother asking questions, so they approve Jane’s million dollar paycheck. By the time the accounting department figures out Jane’s cunning deception, she’s already on the beach sipping mojitos in a non-extradition country.

It’s a bit of an exaggerated example, but it demonstrates the underlying threat from classical collision attacks. When they are possible, they allow an attacker to undermine the integrity and authenticity of digital signatures, allowing them to cause all kinds of trouble like committing fraud. This is why our cryptographic hashing algorithms need to be resistant to collisions.

Chosen-prefix collision attacks

Chosen-prefix collision attacks are another type of collision which Merkle-Damgard hash functions are vulnerable to. They are significantly more challenging, but in circumstances where they are possible they pose a much greater threat than classical collision attacks do.

In contrast with the classical collision attacks that we described above, a chosen-prefix collision involves additional constraints. Rather than just being able to find any possible collisions, the attacker must be able to find collisions that involve two pre-specified prefixes.

The adversary is given two message prefixes, which basically means that they are given two separate strings of data. The adversary is then challenged to find a separate message for each prefix, where both prefix and message combinations share the same hash. We aren’t gonna lie, this can be a little hard to wrap your head around if you aren’t into cryptography.

In more formal notation, the prefixes are P and P’, while the messages are M and M’. So, given P and P’, the hacker must find messages M and M’ where:

H(P || M) = H(P’ || M’)

In the above equation, || represents concatenation, which basically means that the prefix and the message are joined on to each other.

The ramifications of chosen-prefix collision attacks

The result of a chosen-prefix collision attack is very similar to those of classical collision attacks. Essentially, an attacker can undermine authentication and integrity measures, allowing them to commit fraud and deception, just like how we demonstrated in the The ramifications of classical collision attacks section.

The main difference is that chosen-prefix collisions are much harder to achieve because of the additional constraints. However, when they are possible, it’s much easier for an attacker to construct documents that produce matching hashes.

Attackers don’t have to go to the effort of making endless changes in the hope of eventually coming up with a different document that produces the same hash. Instead, they can just create two different documents, pad them so they are the same length, and then add a specifically calculated string of data that results in the same hashes for both documents. This makes it significantly simpler to produce different documents that still have matching hashes.

Preimage attacks

Preimage attacks are related to collision attacks, but they involve trying to find messages that result in specific hashes. We will dive into these in more depth in a future article, but for now you should just be aware that:

A preimage attack — Involves taking a given hash, and then figuring out the initial input that produced it.
A second preimage attack — Involves taking a given input and finding another input where both inputs result in the same hash.

Collisions in MD5

MD5 is an old hash function that is no longer considered secure for many applications. Nevertheless, it’s still being used. In 2022, the French electricity provider Électricité de France (EDF) was fined €600,000 for violating the European Union General Data Protection Regulation (GDPR) requirements by storing passwords hashed with the MD5 algorithm.

MD5 — which has been around since 1991 — creates 128-bit hashes. However, when birthday attacks are considered, really means that it only has 64 bits of security. This proved to not be enough.

In the nineties, both a pseudo-collision and a semi free-start collision were discovered by separate researchers. While these were worrying indictments on the future security of MD5, neither were true collisions.

It wasn’t until 2004 that full collisions of MD5 were published by a team of academics. They were able to find MD5 collisions in under an hour on an IBM P690. This showed that MD5 was truly broken, and the world needed to rapidly move away from MD5 to prevent collision attacks from becoming widespread in the real world.

A chosen-prefix collision attack was first shown against MD5 in 2007. The researchers accomplished it with an expected cost of 2⁵⁰. This works out to somewhere in the realm of a thousand trillion attempts before a collision is expected. However, at the time, the practical applications of the attack were limited.

The researchers also examined classical collisions in MD5, and they found that the weaknesses in the hash function allowed them to construct two X.509 certificates with the same signature. This signified important real-world consequences, because it meant that we could no longer trust the authenticity of X.509 certificates that used MD5 hashes.

There have been a range of other collisions against MD5, both theoretical and practical. However, since the algorithm is already well and truly broken for many purposes, it’s not worth going into them.

Collisions in SHA-1

SHA-1 produces hashes that are 160 bits in length, which in practice gives it 80 bits of security against birthday attacks. In 2005, one team of academics found collisions in a 53 round reduced version of SHA-1. Full SHA-1 actually has 80 rounds, which is basically the number of times that the inputs get shuffled through the algorithm. While a 53 round collision is significant, it’s still a long way from a full 80 round collision.

Also that year, a separate team of researchers proposed an attack that found collisions in less than 2⁶⁹ operations (that’s 590,295,810,35 8,705,651,712 attempts), which was a significant improvement on brute force attacks. Their approach was built on a differential attack, alongside message modification and multi-block collision techniques.

By August of 2005, the attack had been improved to the point where collisions in SHA-1 could be found in 2⁶³ operations.

2006 saw a significant improvement on collisions for reduced round SHA-1, with a researcher publishing findings of a collision for 64 rounds. There was another jump in 2010, with another researcher bringing it up to 73 rounds. This was just another sign that SHA-1’s days were numbered.

In 2011, NIST deprecated SHA-1, indicating that it shouldn’t be implemented any more. In 2012, Marc Stevens estimated that a differential path attack could break a single SHA-1 hash at a cost of $2.77 million.

The first freestart collision against SHA-1 was found in 2015. The attack, dubbed with the creative name the SHAppening, only took 10 days on a 64-GPU cluster.

The SHAppening was followed by SHAttered, continuing the trend of cryptography puns. SHAttered occurred in 2017, and it was the first classical collision discovered for SHA-1. With a full collision so clearly demonstrated, it was abundantly clear that SHA-1 could no longer guarantee integrity and authentication for files or digital signatures.

In 2020, academics published a chosen-prefix collision attack against SHA-1. The paper’s title upheld the newly established tradition, and SHA-1 is a Shambles showed that it was now even easier for attackers to subvert integrity and authentication measures. Although SHA-1 had long been considered insecure, this was the first chosen-prefix collision revealed publicly.

Although SHA-1 is insecure for use in things like digital signatures, it’s still okay for it to be implemented in a hash-based message authentication code (HMAC).

Collisions in SHA-2

SHA-2 is currently considered the gold standard in cryptographic algorithms, and it’s implemented widely across our digital world.

The fact that it’s still considered secure should give you a hint that collisions haven’t made too much of a dent against it at this stage. If classical collisions or chosen-prefix collisions had been found, we would all need to be extremely worried about our online security.

While it’s still considered safe, cryptographers have been hard at work trying to break it. SHA-256 is a 256-bit algorithm, which gives it 128 bits of security against birthday attacks. SHA-512 has twice the bit length, giving it 256 bits of security against birthday attacks.

Back in 2008, the best reduced round attack could find collisions in 24 of SHA-256’s 80 rounds, and 24 of SHA-512’s 80 rounds. In 2011, a higher-order differential attack was capable of causing pseudo-collisions on 33 of SHA-256’s 64 rounds.

A paper in 2013 demonstrated collisions against 31 rounds of SHA-256, and semi-freestart collisions against 38 rounds. In 2014, a paper probing SHA-512 used branching heuristics in differential collision search, which resulted in a pseudo collision in 38 out of 80 rounds.

In 2016, researchers analyzed both SHA-256 and SHA-512. They found practical collisions against 28 out SHA-256’s 64 rounds, and collisions against 27 of SHA-512’s 80 rounds. They also found a pseudo-collision against 39 rounds of SHA-512.

Each of these attacks are still far away from posing meaningful threats to the security of SHA-2. At this stage, we don’t have to worry too much about collisions against SHA-2. However, it’s still crucial for researchers to continue probing SHA-2 and looking for weaknesses. If they don’t, nation states and organized criminals still will, and they won’t tell us when they manage to find feasible collisions.

Collisions in SHA-3

SHA-3 is the latest in the line of Secure Hash Algorithms. We don’t see SHA-3 implemented too often, because SHA-2 is still considered secure, and there isn’t much point in spending the effort to move over at this stage.

Although it is newer, it isn’t necessarily more secure than SHA-2. The two feature dramatically different internal designs, and SHA-3 hasn’t been studied as much as SHA-2 because of its relative recency. Rather than thinking of it as an upgrade to SHA-2, it’s better to think of it as a spare, because we really don’t know which algorithm will stand the test of time.

However, the benefit of having two completely different cryptographic hash algorithms is that if an attack is found against either one of them, it’s assumed that the other will still be resistant. This means that we could quickly switch over to the other one if need be.

Now that you understand the role of SHA-3, we can take a look at some of the best attacks against it. In 2012, a paper was published which showed collisions in up to five rounds of the algorithm that would go on to become SHA-3. The researchers used generalized internal differentials to target several versions of the algorithm, known as Keccak.

The researchers found collisions for three-round versions of both Keccak-384 and Keccak-512. They also found an attack that was 2⁴⁵ times faster than a birthday attack against 4-round Keccak-384. Against Keccak-256, they managed to find collisions for five rounds. However, these attempts were far away from posing a threat to the algorithms, which each have 24 rounds.

By 2019, the SHA-3 algorithm had been finalized, and a group of academics published another paper on attacks against it. The differences between Keccak and the finalized version of SHA-3 are so marginal that these two security analyses were essentially probing the same thing.

This second paper was built on the work of the earlier researchers, and this time the academics were able to achieve collisions against six rounds of SHA-3. While this jump showed some progress, it’s still a long way from posing a threat to the full 24 rounds of SHA-3.

Securing your systems against collision attacks

The average Joe can’t do too much to secure themselves against collisions, apart from basic stuff like updating to the latest versions to make sure that they aren’t using vulnerable software from 2002.

Developers need to ensure that they are only using cryptographic hash algorithms that are secure for their purpose. In the case of things like digital signatures, they must be using algorithms like SHA-2 or SHA-3 to ensure that signatures can be trusted for integrity and authentication.

Apart from that, developers need to check in every few years to keep up with the security status of whichever algorithms they use. If attacks against the hashing algorithm they implement are getting more severe, then they need to begin the process of switching over. If they don’t, their systems could be vulnerable to collision attacks and the fraud that surrounds them.

What is a collision attack? Threats to digital signature security