FS#12772 - makepkg: change INTEGRITY_CHECK default to SHA-256 instead of MD5
Attached to Project:
Pacman
Opened by Aaron Bull Schaefer (elasticdog) - Tuesday, 13 January 2009, 02:57 GMT
Last edited by Allan McRae (Allan) - Friday, 07 August 2009, 06:35 GMT
Opened by Aaron Bull Schaefer (elasticdog) - Tuesday, 13 January 2009, 02:57 GMT
Last edited by Allan McRae (Allan) - Friday, 07 August 2009, 06:35 GMT
|
Details
As discussed on the arch-general mailing list here:
http://is.gd/fCWN, I would suggest updating the default configuration for
makepkg to use SHA-256 checksums instead of MD5. Because of
the insecurity of MD5, it is possible for a malicious user,
or upstream provider to create a package that could
potentially harm a user's systems. Switching to a stronger
hash solves this particular issue. For more details, see the
mailing list thread posted above.
I have attached a patch that will fix this issue, and it should coincide with the pacman v3.0 release. |
This task depends upon
I'm not sure what this is trying to solve. I can see two cases:
1) The source file is tampered with before we make a PKGBUILD for it. In that case we are screwed no matter the checksum.
2) The source file is tampered with after we make a PKGBUILD for it. In that case, for it to be an issue a md5sum collision must be found, it must be a valid archive to be extracted by bsdtar, and the PKGBUILD must still build a binary with the actual exploit in it.
So we are guarding against the second case which is likely to be rare.
And when md5sums are different from the downloaded source, the likelihood that it is a packaging error is far higher than it being a hacked source package. And it does happen:
FS#416,FS#2628,FS#3195from the first page of a google search...As far as the two cases go, that is somewhat correct, however it is a possibility that a developer could plan on releasing malicious software from the get go, and we'd still have problems even in your first use case. If they created some new software that a user packages up, it could work fine for some time, and then the developer could simply replace the source with the malicious source that has a MD5 checksum collision with the original working code. And the malicious code wouldn't have to be an obvious "rm -rf /" type of blow-your-computer-up issue, but could provide a quiet back door to the developer, establish a bot-net, etc.
It has shown to be done...not with one of our PKGBUILDs in particular, but out in the wild. I'm not saying it is trivially easy, but it is absolutely possible. Not to mention the HUGE Certificate Authority spoofing that happened recently due to this very issue (http://www.crunchgear.com/2008/12/30/md5-collision-creates-rogue-certificate-authority/). With its understood weaknesses, MD5 allows two completely ARBITRARY files to have the same MD5 hash by appending a few thousand bytes at the end of each file (see links posted in the ML).
Of course it does happen that developers forget to update the checksums, but that IS the responsibility of the developers to stay on top of that. It's great that we've had bug reports going back to 2004 regarding that issue, but that just means users are paying attention to the verification process failing (as they should).
This recommendation simply eliminates the risk for either of the above scenarios, which ARE realistic, no matter how rare they may be. I'd rather be sure that when I see a checksum failure, it IS because a developer forgot to update the hash, and not because of a compromised machine upstream.
The patch will cause no problems for current or future users. Even if you think the patch is protecting against a situation that would never come up, where's the harm in using a more secure hash anyway, just to be sure?
If we suppose all users are not going to update their makepkg.conf and keep the old one :
INTEGRITY_CHECK=(md5)
Then it should be fine. Pkgbuilds will either have md5 or md5 + sha256 so no missing checksums so no warning.
However, if the majority of users upgrade their config file (which is usually a good thing to do), they will have :
INTEGRITY_CHECK=(sha256)
And thus get a warning for all the old pkgbuilds which don't have a sha256sum.
On the other hand, I am not sure redoing the entire pkgbuild repo is worth it.
So the only way is probably to teach as good as possible the users that this warning is expected.
Aaron thought that if we have INTEGRITY_CHECK=(md5 sha256), then the PKGBUILD only needs one of them for not causing warnings.
But if I read the code correctly, the PKGBUILD needs both md5 and sha256 for not displaying any warnings.
So users would still get warnings with all the old PKGBUILDs who only have md5.
If we don't want to confuse users with warnings, I see two ways out :
1) implement Aaron's way : don't display any warnings as soon as we have at least one required checksum. And switch to INTEGRITY_CHECK=(md5 sha256) as default
2) getting rid of that warning completely.
What that means is that users won't get a warning on old PKGBUILD files, and you won't have to spend the time generating multiple checksums or updating everything at the same time. Once pacman v3.0 is released, makepkg will verify any checksums that are in the PKGBUILD regardless of what INTEGRITY_CHECK is set to. It will only fail if there are no checksums, or one of the provided checksums fail. There will be no "expected warnings".
http://projects.archlinux.org/?p=pacman.git;a=commitdiff;h=496b687c3d4a56641051700e7c4e5e21b9c6607c;hp=baf58525553db8f1e680de16793b147c88df59e2
Xavier is 100% correct here. If you apply the patch from the first comment, you will get a warning for every single PKGBUILD not containing sha256sums.
Here is a patch that updates makepkg to only use INTEGRITY_CHECK on generation of checksums and no longer uses it on verification. It will error out if there are no checksums present (or if a checksum fails). It will also error out if the given checksum list is incomplete. If there are multiple checksum lists given, one incomplete and the other complete, it will give a warning about the incomplete list and continue building the package regardless (assuming the completed list verifies all files). If there are multiple complete lists, it will check them all, as expected.
1) Not sure if you munged the blank line between the patch title and the "Signed-off-by", but that messes up git-am.
2) If an array is wrong in any way, it makes more sense to fail. Incomplete arrays are just as incorrect as incorrect checksums- I'm not sure why we would want to ever permit this. I've changed it to do this.
Looking at the modified patch, getting rid of my elif means that it will error out when no checksums for a particular type are provided. So even if there are valid md5sums, it will go to the next type and give the error: "Integrity checks (sha1) differ in size from the source array." because there are none provided. That's why I tested to make sure [ ${#integrity_sums[@]} -gt 0 ].
Aside from that though, I'm fine with erroring out in the case of one incomplete array and one complete array. It does make more sense to ensure that all checksums are present and up to date.
2) As Aaron stated, there are two cases. In my mind, case 1 is the *much* easier avenue of attack, as we are trusting that we got a valid source tarball. Going along with this line of thinking, I've yet to see upstream developers giving out the sha256sums for their software. You are much more likely to see sha1sums, which we could directly verify with makepkg. To me, it seems like this is the better precedent to be following- use upstream provided sha1sums that are hopefully stored in a different place than the source itself, and actually do source verification instead of just a makepkg -g >> PKGBUILD.
I agree that using upstream-provided checksums is the best practice (no matter what hash function they provide), but keep in mind that SHA-1 is broken too, just not as severely as MD5...yet. I'd personally use a provided MD5 or SHA-1 for initial verification, but after testing the package, generate a more secure hash to help protect our users. It wouldn't necessarily prevent the first malicious scenario, but it would prevent the second; and hopefully they're both rare enough (and our testing thorough enough) that it won't ever be an issue. My whole point is covering our own ass with the code we actually control by picking sane defaults.
Did it happen?
BTW i made small benchmark of common hashing alghoritms that i found in my Arch Linux:
for i in md5sum sha1sum sha224sum sha256sum sha384sum sha512sum shasum; do echo $i:; time $i 698MB_big_file.avi; echo; done;
md5sum:
real 0m4.609s
user 0m2.530s
sys 0m0.593s
sha1sum:
real 0m14.013s
user 0m8.626s
sys 0m0.633s
sha224sum:
real 0m13.595s
user 0m8.936s
sys 0m0.727s
sha256sum:
real 0m13.609s
user 0m8.949s
sys 0m0.683s
sha384sum:
real 0m49.359s
user 0m30.675s
sys 0m1.017s
sha512sum:
real 0m47.823s
user 0m30.978s
sys 0m1.007s
shasum:
real 0m9.819s
user 0m6.276s
sys 0m0.617s