Back to blog

Understanding Hashing Algorithms

Blog

Once upon a time there was an IT Industry who collectively developed the practice of storing passwords in plain text in database servers locked away in server rooms and data centres. Every day these usernames and passwords were synchronised between databases to provide a single-sign on capability and an assurance that IT Failure wouldn’t be the result of lost data. Every day users reused these passwords again and again over many different services, which was assumed to be largely acceptable as long as the most critical services had a different password. At the time this was a perfectly acceptable practice as, in reality, those databases were only used internally within networks or buried so deeply in an organisations architecture that there was minimal risk of these passwords being accidentally exposed.

One day one of these databases was leaked and the passwords became public knowledge. These plain text usernames and passwords could be used to log into a variety of services from simply taking over an individual’s social media to stealing their entire savings account. It was argued that these users should have been more disciplined about reusing passwords, the courts however fundamentally disagreed and ordered those responsible for the password breaches to pay compensation to the affected users for breach of trust.

Thus, the IT Industry developed the practice of applying a one-way hash function to passwords and storing the resulting output, rather than the actual password itself. It is very difficult to reverse engineer passwords from the hash, making this practice an adequate way of storing enough information to validate a password but not enough information to use the hash to authenticate:

Due to password hashing, hackers developed Rainbow Tables and Preimage Attacks which enabled quick lookup of hashes, effectively rendering 80% of passwords vulnerable once more. As a result, we started adding a salt to every password, storing it alongside each one, to make it significantly harder to reverse engineer a password:

However, with the advent of clusters of inexpensive powerful computers, hackers developed ways of generating rainbow tables on the fly and cracking each password in turn based upon the hash stored alongside the password, in effect making those passwords vulnerable once again. In some instances, it was possible to “Pepper” the input with random data stored in another location. Nevertheless, this typically involved using a Hardware Security Module to store this Pepper and perform the hash using it – which proved incredibly expensive:

Until finally someone realised a way in which we could make a hackers’ work very difficult. For example, on a half decent laptop we can generate 1 million MD5 hashes in 1 second, meaning it would only take 8 minutes and 20 seconds to hash the 300 million known weak passwords. If, however, we recursively hash the output of the hash, say half a million times, it would take half a second to generate each hash, meaning 300 million known passwords would take nearly 5 years to complete:

What Hashing Function Should be Used

Most hashing functions invented have been subsequently found to be weak against specific vectors:

  • MD1-4 – Considered obsolete in 1995
  • MD5, SHA-0 – Considered obsolete in 2004
  • SHA-1 – Considered obsolete in 2017
  • SHA-2 – Minor weaknesses discovered in 2013
  • SHA-3 – CPU bound, thus considered weak to ASICs
  • Bcrypt – CPU bound, thus considered weak to ASICs
  • PBKDF2 – CPU bound, thus considered weak to ASICs
  • Scrypt – Memory bound, thus not weak to ASICs
  • Argon2 – Memory bound, thus not weak to ASICs

Obsolete Functions

These functions have been found to be weak to specific attacks such as collision or preimage attacks and should never be used to store passwords. If passwords are currently hashed using these functions the best course of action is to clear the password and ask the user to reset their password before using the service again.

Weak Functions

Functions in the SHA-2 family may be susceptible to preimage or collision attacks, however this is only theoretical at the moment. The SHA-2 family should not be used for new passwords and passwords should be upgraded to better hashing functions when users next login, this can be a transparent operation.

CPU Bound Functions

Functions that implement a derivation mechanism that is based upon using CPU cycles are susceptible to both Moore’s Law and Application Specific Integrated Circuits, which means it is becoming increasingly inexpensive to generate billions of hashes every second. The development of ASICs has largely been driven by Crypto Currency which has made it very profitable to build powerful ASICs to mine BitCoin, Etherium, Ripple, etc. Practically most Identity Platforms only implement hashing functions that are CPU bound, not Memory Bound, thus the majority of solutions in 2017 should implement high-Bcrypt or PBKDF2.

Bcrypt

BCrypt is considered a CPU bound strong algorithm that is resistant to ASICs, however given its use with a high entropy salt and a cost ≥ 12 this is a very respectable algorithm to implement within an Identity Solution. The following benchmarks were completed on a Processor Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz:

PBKDF2

PBKDF2 is considered a CPU bound strong algorithm that is resistant to ASICs, however given a high entropy salt and Iteration Count ≥ 100,000 it is considered a reasonable algorithm to implement within an identity solution. The following benchmarks were completed on a Processor Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz:

Memory Bound Functions

Functions such as Scrypt and Argon2 have been designed to consume a great deal of memory and as such they are not good candidates for building ASICs as memory is still relatively expensive in terms of up-front and running costs, making it commercially unsustainable to build them for the purposes of CryptoCoin mining let alone for cracking passwords.

Conclusion

It’s no longer viable to store passwords in Plain Text. The ways of encoding highly restricted data must keep up with the times and increase the protection provided for individuals. Many of the mechanisms that were once considered strong have been broken or are considered weak, and there is significant threat from highly-accessible, inexpensive compute in the form of the public cloud and ASICs. It is, however, possible to achieve a good level of security by using strong hashing (or Key Derivation Functions) such as BCrypt and PBKDF2 with a high CPU cost to protect user information in the face of data leakage.

Leave a Reply

Your email address will not be published. Required fields are marked *

The browser you're using is out of date. Please update for better security, speed and experience on this site.