How websites store data
When you create an account on a website, the website stores your registration details in its SQL databases. Very few people, even within the company that runs the website, have direct access to those databases.
Those who don’t know about hashing may wonder how the website checks that you typed the correct password during login if the site itself doesn’t know your password. Well, to understand that, you must understand what hashing is. You can read up on it on Wikipedia for the technical details, but I’ll (grossly over-)simplify it for you.
Let’s say your password is “pass”, and there’s a hashing function f(x). Then,
f(“pass”) = d@A2qAawqq21109 (say).
Going the forward way is quite simple. On the other hand, figuring out the plain-text password from the hash (d@A2qAawqq21109) is almost impossible.
So, when you create an account and type the password “pass”, d@A2qAawqq21109 is what gets stored in the database. When you log in and type “pass” again, the server hashes it, gets d@A2qAawqq21109, and matches that against the value in the SQL database. If you typed some other password, say “ssap”, the generated hash would be different, and you wouldn’t be able to log in. Note that while the hashing function gives different outputs for most strings, every once in a while there may be a collision (two different strings with the same hash). For a good hash function this is extremely rare, and shouldn’t be of any concern to us.
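The register-then-login flow described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: SHA-256 from the standard library stands in for the made-up f(x), so the hashes will not look like d@A2qAawqq21109, and the `users` dictionary is a hypothetical substitute for the SQL table.

```python
import hashlib
import hmac


def f(password: str) -> str:
    """Stand-in for the hashing function f(x): SHA-256, hex-encoded."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the SQL table: username -> stored hash.
users = {}


def register(username: str, password: str) -> None:
    # Only the hash is stored; the plain-text password is never kept.
    users[username] = f(password)


def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # Hash what was typed and compare it with the stored hash.
    return hmac.compare_digest(stored, f(password))
```

Calling `register("alice", "pass")` and then `login("alice", "pass")` succeeds, while `login("alice", "ssap")` fails because the two hashes differ.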
Forgot Your Password – Ever wondered why almost all websites give you a new password (or ask you to set one) when you forget your old one, instead of just telling you what it was? Well, now you know: they themselves don’t know your password, and hence can’t tell you. When they offer you a chance to change your password, they just replace the corresponding hash in their tables, and from then on your new password works.
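As a rough sketch of what a password reset does behind the scenes, assuming SHA-256 as a stand-in for f(x) and a hypothetical `users` dictionary in place of the real table:

```python
import hashlib


def f(password: str) -> str:
    """Stand-in for the hashing function f(x): SHA-256, hex-encoded."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical table with one existing account.
users = {"alice": f("pass")}


def reset_password(username: str, new_password: str) -> None:
    if username not in users:
        raise KeyError(username)
    # The old password is never looked up (it can't be); its hash is
    # simply overwritten with the hash of the new password.
    users[username] = f(new_password)
```

Nothing in `reset_password` ever needs the old password, which is why the site can change it but cannot tell you what it was.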
How hashes are cracked – I wrote earlier that hash functions are easy to go one way, but almost impossible to go the other. In practice, going the other way is done by brute force. Suppose someone has the password “pass”. A hacker who only has access to the hashes can hash candidate passwords in alphabetical order and check which hash matches (assume the hacker knows the password is four characters long and uses only lowercase letters).
He tries ‘aaaa’, ‘aaab’, ‘aaac’, …, ‘aaba’, ‘aabb’, ‘aabc’, …, ‘aazz’, ‘abaa’, …, ‘paaa’, ‘paab’, …, ‘pass’. When he tries ‘aaaa’, the hash is not d@A2qAawqq21109, it is something else. For every guess before ‘pass’, he gets a hash which doesn’t match d@A2qAawqq21109. But for ‘pass’, the hash matches. So, the hacker now knows your password. With four lowercase letters that is at most 26^4 = 456,976 guesses, which is trivial for a computer.
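The search described above is easy to write down. A minimal sketch, again assuming SHA-256 as a stand-in for f(x); the function name `brute_force` is just for illustration:

```python
import hashlib
import itertools
import string


def f(password: str) -> str:
    """Stand-in for the hashing function f(x): SHA-256, hex-encoded."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def brute_force(target_hash, length=4, alphabet=string.ascii_lowercase):
    """Try every string of the given length, in alphabetical order."""
    for letters in itertools.product(alphabet, repeat=length):
        candidate = "".join(letters)  # 'aaaa', 'aaab', 'aaac', ...
        if f(candidate) == target_hash:
            return candidate
    return None  # the password was not in the search space


leaked_hash = f("pass")            # all the hacker has is this hash
recovered = brute_force(leaked_hash)
```

The loop stops as soon as a candidate hashes to the leaked value. Each extra character multiplies the work by the alphabet size, which is why long passwords drawn from a larger character set hold up much better against this attack.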
This isn’t even the worst part though. Some websites don’t hash your passwords at all, and store them in plain text instead. If their database is leaked, the hacker has immediate access to millions of accounts on that website, plus possibly tens of millions of accounts on other websites where people reuse the same email/username and password combination. For example, the 000webhost database had plain-text passwords, and it was leaked. I personally hosted a site there once, and my account was compromised as well.
Problem 1 : Suppose there’s a hashing scheme X. Under that scheme, “pass” becomes d@A2qAawqq21109. Say this is a very secure scheme and every website uses it. Now, there’s a guy who has a lot of computational power, and he computes the hashes of all possible letter combinations under scheme X. Given a hashed value, he can simply look it up in his table and see which password it corresponds to. He makes this password-to-hash table available online. Now it’s quite easy to get the passwords from a database dump.
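Here is a small sketch of such a pre-computed table. To keep it quick it only covers three-letter lowercase passwords, and SHA-256 stands in for scheme X; both choices are assumptions made for the example.

```python
import hashlib
import itertools
import string


def f(password: str) -> str:
    """Stand-in for scheme X: SHA-256, hex-encoded."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def build_table(length=3, alphabet=string.ascii_lowercase):
    """Hash every possible password once and index the results by hash."""
    table = {}
    for letters in itertools.product(alphabet, repeat=length):
        password = "".join(letters)
        table[f(password)] = password
    return table


# The expensive part is done once, ahead of time...
table = build_table()


def lookup(leaked_hash):
    # ...after which cracking a hash is a single dictionary lookup.
    return table.get(leaked_hash)
```

The cost of building the table is paid once, and after that every hash in any dump that uses the same scheme can be reversed with a lookup.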
Problem 2 : Alternatively, even if the scheme isn’t common, an attacker can take a common password, say “password”, hash it, and then search the password dump of 100 million users to see if any hash matches. Every user whose hash matches has the password “password”. By trying 1 million common passwords this way, he’ll probably recover the passwords of 10% of those 100 million users.
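A sketch of this second attack, with a tiny made-up dump and a made-up list of common passwords (SHA-256 again stands in for the site's hashing scheme):

```python
import hashlib


def f(password: str) -> str:
    """Stand-in for the site's hashing scheme: SHA-256, hex-encoded."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def crack_common(dump, common_passwords):
    """dump maps username -> hash. Returns username -> recovered password."""
    # Group users by hash so each common password is hashed only once,
    # no matter how many users share it.
    users_by_hash = {}
    for user, hashed in dump.items():
        users_by_hash.setdefault(hashed, []).append(user)

    cracked = {}
    for password in common_passwords:
        for user in users_by_hash.get(f(password), []):
            cracked[user] = password
    return cracked


# Made-up example data.
dump = {
    "alice": f("password"),
    "bob": f("hello"),
    "carol": f("x9!kQ2#vLp"),
    "dave": f("password"),
}
cracked = crack_common(dump, ["password", "hello", "123456"])
```

In this example alice, bob and dave are cracked, while carol's uncommon password survives. One hash computation per common password is enough to catch every user who picked it.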
Both problems are addressed by salting: the website generates a random string (the salt) for each account, mixes it into the password before hashing, and stores the salt next to the hash.
- The first problem, where someone else pre-computed the password-hash table, is solved, since that person now has to make a password-salt-hash table (the hash for every combination of password and salt), which is far too many combinations. If there are 10 million possible passwords and 10 million possible salts, there would be 100 million million (that is, 100 trillion) combinations. The protection weakens if salts are predictable: if there are 10 common salts which are used very often, the person can make a table with all 10 million passwords hashed for those 10 salts. Alternatively, the person can hash the 10 most common passwords with all 10 million possible salts. Thus, it’s important to have both strong passwords and random salts.
- The second problem is also kind of solved, since the person would have to compute the hash of each common password separately for every salt in the dump (note that he doesn’t have to do it for all 10 million possible salts, only the ones actually present in the dump). Again, not using easy generic passwords like “password”, “hello”, etc. would solve this issue.
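To make the two points above concrete, here is a minimal sketch of salted hashing. It assumes a random per-user salt that is prepended to the password and SHA-256 as the hash; the `users` dictionary and function names are hypothetical, and a real site would typically use a dedicated password-hashing function rather than a single fast hash.

```python
import hashlib
import hmac
import secrets


def salted_hash(password: str, salt: str) -> str:
    """Hash the salt together with the password (SHA-256 as a stand-in)."""
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


# Hypothetical table: username -> (salt, hash). The salt is not secret.
users = {}


def register(username: str, password: str) -> None:
    salt = secrets.token_hex(16)  # fresh random salt for every account
    users[username] = (salt, salted_hash(password, salt))


def login(username: str, password: str) -> bool:
    record = users.get(username)
    if record is None:
        return False
    salt, stored = record
    return hmac.compare_digest(stored, salted_hash(password, salt))


register("alice", "password")
register("bob", "password")
```

Although alice and bob chose the same password, their stored hashes differ because their salts differ. A table pre-computed without salts no longer matches anything, and an attacker working from the dump has to hash each guess once per salt instead of once for everybody.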
Out of all the leaks so far, I had accounts in four: the Myspace leak, the LinkedIn leak, the Dropbox leak, and the 000webhost leak. I had to change my password on multiple sites on multiple occasions.
I am compromised
If you find out that your account is indeed compromised, I suggest you quickly change your password on every service where you used the same password. Better yet, change all your passwords. It’s good practice to keep changing your passwords regularly anyway. Also, if a website offers two-step authentication, you should turn it on.