As a backend developer, you will more than likely have to handle user logins and sign-ups at some point, if you haven't already, which means you will need to store your users' passwords.
The passwords users give you are generally not just for your website; they are often the same passwords they use for just about everything. Understanding this will hopefully give you a better idea of just how important it is to keep your users' passwords safe.
People don't give out their passwords to just anyone, especially not strangers, and yet they are expected to trust you with theirs.
Your website is not special
According to a 2018 survey of people's password habits by LastPass/LogMeIn and Lab42, nearly 60% of people reuse their passwords across services and websites. That is roughly 1,200 of the 2,000 people surveyed!
Now this is where it gets even scarier: 62% reuse the same passwords for both their personal and their work accounts. The survey also points out that 79% have anywhere between 1 and 20 online accounts for work and personal use. So with a single breach, a hacker could potentially gain access to up to 20 other accounts belonging to each of the roughly 60% of your users who reuse passwords.
So according to the survey, 60% of your users are reusing the same password on other websites - your website is not special.
If a hacker manages to steal all of your account information and you are not hashing your users' passwords properly, your negligence could cause them financial, emotional, and possibly even personal harm.
A notable example of a "reused password attack", where user credentials from other breaches were used to gain access to another service with the same login details, is the GitHub incident in 2016. Another is the 2012 LinkedIn hack, which also led to Dropbox being compromised.
Cue the "Buts"
Sadly, I have all too often heard the following excuses from developers and companies trying to justify their poor, or outright missing (plaintext), implementations:
- My website is small/unknown, it won't attract any hackers.
- We store nothing which is of interest to hackers.
- It's just a temporary solution until x.
- We are a non-profit/charity, they would never...
- Yes I know, I'll sort it later™
If you store any information at all (and having a login means you at least store login details), you could easily become a target. Heck, you don't even need to store anything; simply existing can be enough, as it's not always about data or even profit.
Even if you are a charity or similar, don't think that will deter hackers from attacking your site. Hackers do have morals, but you will rarely see them get in the way.
As someone who has been on that side, trust me: hackers do not care, and they do not discriminate.
Why SHA* is not enough
SHA (Secure Hash Algorithm) functions are not meant to be used for passwords. They are fast hashing algorithms, better suited to generating data "signatures".
For example, you can generate a signature of a file and if any part of the file is changed the signature would no longer match and you would know the file has been tampered with or altered in some way.
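To make the "signature" idea concrete, here is a minimal sketch using Node's built-in crypto module (JavaScript, matching the Argon2 example later on). The file contents are made up; in practice you would read the bytes with something like fs.readFileSync, and the recorded digest has to be kept somewhere an attacker can't alter it.

```javascript
// Minimal sketch: SHA-256 as a data "signature" (digest) using Node's crypto.
// The "file contents" below are made up for the example.
const crypto = require("crypto");

// Compute a hex-encoded SHA-256 digest of a Buffer or string.
function signature(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// True only if the data still produces the digest recorded earlier.
function isUnchanged(data, expectedSignature) {
  return signature(data) === expectedSignature;
}

const original = Buffer.from("quarterly-report v1\n");
const recorded = signature(original); // store this alongside (or apart from) the file

const tampered = Buffer.from("quarterly-report v2\n"); // one character changed

console.log(isUnchanged(original, recorded)); // true
console.log(isUnchanged(tampered, recorded)); // false
```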
Don't think adding a salt to a password hashed with SHA* will do much good in the grand scheme of things. A salt does defeat precomputed (rainbow table) attacks and means two users with the same password get different hashes, but the salt is stored right alongside the hash, so it won't slow down the rate at which each password can be guessed (which can easily be in the billions or even trillions of hashes per second).
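The sketch below (JavaScript, Node's built-in crypto module) illustrates the point: the salt sits right next to the hash, so the attacker's loop simply includes it, and every guess still costs one fast SHA-512 call. The stored password and the tiny wordlist are hypothetical; real wordlists run to billions of entries.

```javascript
// Minimal sketch: a salt alone does not slow down guessing against a fast hash.
// The password and wordlist are hypothetical.
const crypto = require("crypto");

// How a site might (insufficiently) store a password: SHA-512(salt + password).
function saltedSha512(salt, password) {
  return crypto.createHash("sha512").update(salt + password).digest("hex");
}

// What a leaked database row contains: the salt and the hash, side by side.
const salt = crypto.randomBytes(16).toString("hex");
const stored = { salt, hash: saltedSha512(salt, "sunshine1") };

// The attacker's loop: one fast SHA-512 call per guess, salt included.
function crack(record, wordlist) {
  for (const guess of wordlist) {
    if (saltedSha512(record.salt, guess) === record.hash) {
      return guess;
    }
  }
  return null; // no candidate matched
}

const wordlist = ["123456", "password", "qwerty", "sunshine1", "letmein"];
console.log(crack(stored, wordlist)); // prints sunshine1
```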
What you need is an algorithm with a "cost" associated with it, and by cost I mean CPU time and memory.
Not all hashes are made equal
This is not to say some hashes are better than others in every way, but some are definitely better than others at specific tasks.
For example, if you wanted to verify data, SHA* would be much preferred over something like bcrypt because of its performance and low resource requirements. Funnily enough, the very reason you should use SHA* for verifying data is the same reason you shouldn't use it for passwords.
Instead of SHA*, something like scrypt or bcrypt would be better (but still not recommended), because you can increase the resource cost to generate a hash which would ultimately increase the time it would take to crack the password.
To give you an example of the difference in cracking speed between bcrypt and SHA512, here are the results from a blog post on netmux.com, where they built a password cracking rig and ran some benchmarks:
SHA512: 3235.1 MH/s
SHA-3(Keccak): 2500.4 MH/s
BCrypt, Blowfish(OpenBSD): 43551 H/s
(Notice the SHA* results are in megahashes per second, i.e. 10⁶ or 1,000,000 hashes per second!)
If we go by these results, it would take about 5 minutes to test a password hashed with SHA512 against 1 trillion (1,000,000,000,000) different values, but with bcrypt it would take a little over 265 days!
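The arithmetic behind those figures is easy to check yourself: divide the number of candidates by the hash rate. A quick sketch using the benchmark rates listed above:

```javascript
// Back-of-the-envelope check of the cracking times, using the netmux.com rates.
const rates = {
  sha512: 3235.1e6, // 3235.1 MH/s
  bcrypt: 43551,    // 43551 H/s
};

const candidates = 1e12; // one trillion candidate passwords

// Seconds needed to try every candidate at a given hash rate.
function secondsToExhaust(count, hashesPerSecond) {
  return count / hashesPerSecond;
}

const shaMinutes = secondsToExhaust(candidates, rates.sha512) / 60;
const bcryptDays = secondsToExhaust(candidates, rates.bcrypt) / 86400;

console.log(shaMinutes.toFixed(1)); // 5.2
console.log(bcryptDays.toFixed(1)); // 265.8
```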
As you can see, being able to slow down the cracking speed is hugely important and can very quickly make it unfeasible to even try to crack the passwords.
But don't just use bcrypt and call it a day, because there is one main reason bcrypt won't hold up in today's environment: its lack of memory-hardness.
Bcrypt's lack of memory-hardness is a serious problem when you consider how much power "off the shelf" hardware has today, and how quickly new and more powerful hardware is released.
A paper describing how to crack bcrypt with low-cost, energy-efficient parallel hardware was published as far back as 2014!
Although bcrypt is still considered pretty decent, its future does not look good, and it is recommended that you look into a newer, more future-proof algorithm.
The answer: Argon2
Argon2 won the Password Hashing Competition (PHC) in July 2015, and with good reason. Argon2 supports not only memory-hardness and CPU-hardness options but also a parallelism factor, which can make it highly resistant to offline cracking. Since winning the PHC, Argon2 has withstood scrutiny very well.
But what is all this memory-hardness and CPU-hardness about and how does it help?
What CPU-hardness, memory-hardness and the parallelism factor mean is that you can tweak the cost of computing the hash.
For example, let's say you hashed your password with a cost of "100" (a made-up value, to simplify the example), and the hash is then stolen by a hacker who wants to crack it. The hacker would need to use the exact same cost value(s) to be able to generate the same hash. This forces the hacker to dedicate a lot more time and resources (money) to cracking the password, potentially making it unfeasible (still assuming you used a strong, unique password).
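Here is a rough illustration of that idea. To avoid extra packages, this sketch uses Node's built-in crypto.scryptSync as a stand-in for Argon2; its N option plays the role of the cost. The fixed salt and the passwords are purely for the demo (a real salt should be random per user).

```javascript
// Minimal sketch: the cost is an input to the hash, so the attacker must match it.
// scrypt stands in for Argon2 here; the fixed salt is for demonstration only.
const crypto = require("crypto");

const salt = Buffer.from("per-user-random-salt");
const password = "correct horse battery staple";

// N is scrypt's CPU/memory cost (a power of two); r is the block size.
// maxmem is raised so larger N values don't hit Node's default 32 MiB limit.
function derive(pw, cost) {
  return crypto
    .scryptSync(pw, salt, 32, { N: cost, r: 8, p: 1, maxmem: 256 * cost * 8 })
    .toString("hex");
}

const stored = derive(password, 2 ** 14);    // what the site saved
const wrongCost = derive(password, 2 ** 13); // right password, wrong cost
const rightCost = derive(password, 2 ** 14); // right password, same cost

console.log(wrongCost === stored); // false
console.log(rightCost === stored); // true
```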
Now imagine a database with thousands or even millions of users, all hashed using Argon2 with a reasonable cost: it would significantly decrease the number of guesses a hacker could test per second.
In the article "Password Hashing Status" by George Hatzivasilis (section 5.1), he compares the GPU calculation speeds of several different algorithms (here, lower is better for the defender). SHA1, SHA2 and SHA3 could achieve 794-113 MH/s (megahashes per second) and bcrypt 2868-2.71 H/s, but Argon2 managed just 2.64-0.04 H/s!
Where to get started with Argon2
The Argon2 GitHub repository lists bindings for many popular languages, and I wouldn't be surprised if there are others that simply aren't mentioned there.
The implementation should be fairly easy, as all you will really need are the .hash("password", {...options}) and .verify(<hash>, "password") functions (JavaScript in this example).
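To show the shape of such a hash()/verify() pair without pulling in a third-party package, here is a minimal sketch that again uses Node's built-in scrypt as a stand-in. The key idea is that the returned string carries the algorithm, cost parameters and salt, so verify() can recompute with exactly the same settings. Argon2 implementations typically produce a similar self-describing string ($argon2id$v=19$m=...,t=...,p=...$salt$hash); the $scrypt$... format below is made up for this example.

```javascript
// Minimal sketch of a hash()/verify() pair, with scrypt standing in for Argon2.
// The "$scrypt$N=..,r=..,p=..$salt$hash" string format is hypothetical.
const crypto = require("crypto");

const KEY_LEN = 32;

function hash(password, { N = 2 ** 14, r = 8, p = 1 } = {}) {
  const salt = crypto.randomBytes(16); // fresh random salt per password
  const key = crypto.scryptSync(password, salt, KEY_LEN, { N, r, p, maxmem: 256 * N * r });
  // Keep algorithm, cost parameters and salt together with the hash.
  return `$scrypt$N=${N},r=${r},p=${p}$${salt.toString("base64")}$${key.toString("base64")}`;
}

function verify(encoded, password) {
  const [, alg, params, saltB64, keyB64] = encoded.split("$");
  if (alg !== "scrypt") return false;
  // Parse "N=16384,r=8,p=1" back into numbers.
  const { N, r, p } = Object.fromEntries(
    params.split(",").map((kv) => {
      const [k, v] = kv.split("=");
      return [k, Number(v)];
    })
  );
  const expected = Buffer.from(keyB64, "base64");
  const actual = crypto.scryptSync(password, Buffer.from(saltB64, "base64"), expected.length, {
    N, r, p, maxmem: 256 * N * r,
  });
  // Constant-time comparison avoids leaking how many bytes matched.
  return crypto.timingSafeEqual(actual, expected);
}

const stored = hash("hunter2");
console.log(verify(stored, "hunter2")); // true
console.log(verify(stored, "hunter3")); // false
```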
Just make sure you tweak the options to suit your needs. A decent rule of thumb is to aim for a hashing time of about 200 ms per hash on the target hardware, with parallelism, memory and CPU costs as high as you can comfortably afford. Don't go too crazy, though, as it could very quickly turn around and bite you as your website/app gets more popular.
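One way to apply that rule of thumb is to measure rather than guess. The sketch below keeps doubling the cost until a single hash takes at least the target time on the machine it runs on. Once again Node's built-in scrypt stands in for Argon2 (with Argon2 you would raise its memory, time and parallelism options instead), and the starting value and memory cap are arbitrary choices for the example.

```javascript
// Rough sketch: find a cost that takes at least targetMs per hash on this machine.
// scrypt stands in for Argon2; start value and cap are arbitrary.
const crypto = require("crypto");

// Time a single scrypt hash at cost N, in milliseconds.
function timeOneHash(N, r = 8) {
  const start = process.hrtime.bigint();
  crypto.scryptSync("benchmark-password", "benchmark-salt", 32, {
    N, r, p: 1, maxmem: 256 * N * r,
  });
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// maxN caps memory use (about 128 * N * r bytes, i.e. 128 MiB at 2^17 with r = 8).
function calibrate(targetMs = 200, maxN = 2 ** 17) {
  let N = 2 ** 10;
  let ms = timeOneHash(N);
  while (ms < targetMs && N < maxN) {
    N *= 2; // double the cost and measure again
    ms = timeOneHash(N);
  }
  return { N, ms };
}

const { N, ms } = calibrate(200);
console.log(`N=${N} took ${ms.toFixed(1)} ms`);
```

Run it on the hardware that will actually serve logins, and keep in mind that every login attempt pays this cost, so the right number also depends on how much traffic you expect.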
Closing word
It is important to note that I wanted to focus on just the hashing part, as I see it as one of the easier things to fix quickly, with good results. But password security involves far more than just which hashing algorithm to use. There are many more layers to it, and I can recommend reading the excellent article "Passwords Evolved: Authentication Guidance for the Modern Era" by Troy Hunt.
Also make sure you read about the three different variants of Argon2: Argon2i, Argon2d, and Argon2id. You should understand the differences between them to make sure you choose the right one for your needs. If you are not sure which one to use, I would recommend Argon2id, which combines the other two variants: it provides some of the "i" variant's resistance to side-channel cache-timing attacks and much of the "d" variant's resistance to GPU cracking attacks.
I tried to keep things as simple as I could, avoiding too much detail about the inner workings of password cracking and hashing, as I don't want anyone new to the area or without much experience to give up halfway through.
If there are parts you think I jumped over a bit too quickly, please let me know, I would be happy to elaborate where needed.
A big thanks to BinaryEvolved and Polynomial for their posts about Argon2 on the Information Security Stack Exchange website. Their posts helped me better understand some of the security aspects of Argon2 and pointed me to other resources with more in-depth details.