SHA1 hashes are fairly long, a string of 40 hex characters (160 bits), so Mercurial and Git allow using a prefix of that, as long as the prefix is unambiguous. Mercurial also typically only shows the first 12 characters (let’s call them short hashes), for instance:
And those are the hashes most Mercurial users use, for instance they are posted in Bugzilla whenever we land a patch etc.
Collisions with short hashes are much more likely than full SHA1 collisions, because the short hashes are only 48 bits long. As the Mercurial FAQ states, such collisions don’t really matter, because Mercurial will check if the hash is unambiguous and if it’s not it will require more than 12 characters.
So, short hash collisions are not the end of the world, but they are inconvenient because the standard 12-chars hg commit ids will become ambiguous and unusable. Fortunately, the mozilla-central repository at this point does not contain any short hash collisions (it has about 242,000 commits).
Finding short-hash collisions
I’ve wondered for a while, can we create a commit that has the same short hash as another commit in the repository?
A brute force attack that works by committing and then reverting changes to the repository should work, but it’d be super slow. I haven’t tried it, but it’d probably take years to find a collision. Fortunately, there’s a much faster way to brute force this. Mercurial computes the commit id/hash like this:
Here p1 and p2 are the hashes of the parent commits, or a null hash (all zeroes) if there’s only one parent. To see what contents is, we can use the hg debugdata command:
Perfect! This contains the commit message, so all we have to do is append some random data to the commit message, compute the (short) hash, check if there’s a collision and repeat until we find a match.
After about 2 minutes it’s done and tells us we have to append “1631965792_24262171” to our commit message to get a collision! Let’s try it (we have to be careful to preserve the original date/time, or we’ll get a different hash):
Voilà! We successfully created a Mercurial short hash collision!
And no, I didn’t use this on any patches I pushed to mozilla-central..
The Rust source code is available here. It was my first, quick-and-dirty Rust program but writing it was a nice way to get more familiar with the language. I used the rust-crypto crate to calculate SHA1 hashes, installing and using it was much easier than I expected. Pretty nice experience.
The program can check about 100 million hashes in one minute on my laptop. It usually takes about 1-5 minutes to find a collision, this also depends on the size of the repository (mozilla-central has about 242,000 commits). It’d be easy to use multiple threads (you can also just use X processes though) and there are probably a lot of other ways to improve it. For this experiment it was good and fast enough to get the job done :)