FS#12772 - makepkg: change INTEGRITY_CHECK default to SHA-256 instead of MD5

Attached to Project: Pacman
Opened by Aaron Bull Schaefer (elasticdog) - Tuesday, 13 January 2009, 02:57 GMT
Last edited by Allan McRae (Allan) - Friday, 07 August 2009, 06:35 GMT
Task Type Feature Request
Category makepkg
Status Closed
Assigned To No-one
Architecture All
Severity Medium
Priority Normal
Reported Version 3.2.2
Due in Version 3.3.0
Due Date Undecided
Percent Complete 100%
Votes 4
Private No

Details

As discussed on the arch-general mailing list here: http://is.gd/fCWN, I would suggest updating the default configuration for makepkg to use SHA-256 checksums instead of MD5. Because of the insecurity of MD5, it is possible for a malicious user, or upstream provider to create a package that could potentially harm a user's systems. Switching to a stronger hash solves this particular issue. For more details, see the mailing list thread posted above.

I have attached a patch that will fix this issue, and it should coincide with the pacman v3.0 release.
This task depends upon

Closed by  Allan McRae (Allan)
Friday, 07 August 2009, 06:35 GMT
Reason for closing:  Won't implement
Comment by Aaron Bull Schaefer (elasticdog) - Tuesday, 13 January 2009, 03:02 GMT
I don't think the patch was attached with the original report, so here it is...
Comment by Allan McRae (Allan) - Tuesday, 13 January 2009, 11:32 GMT
Being ever so slightly facetious, why not sha512sums?

I'm not sure what this is trying to solve. I can see two cases:

1) The source file is tampered with before we make a PKGBUILD for it. In that case we are screwed no matter the checksum.

2) The source file is tampered with after we make a PKGBUILD for it. In that case, for it to be an issue a md5sum collision must be found, it must be a valid archive to be extracted by bsdtar, and the PKGBUILD must still build a binary with the actual exploit in it.

So we are guarding against the second case which is likely to be rare.

And when md5sums are different from the downloaded source, the likelihood that it is a packaging error is far higher than it being a hacked source package. And it does happen:  FS#416 ,  FS#2628 ,  FS#3195  from the first page of a google search...
Comment by Aaron Bull Schaefer (elasticdog) - Tuesday, 13 January 2009, 16:19 GMT
Why not jump to SHA-512? Well, SHA-256 is the fastest NIST-standard hash that has no known vulnerabilities and is the current recommendation for secure transmission today (see FIPS 180-2 and FIPS 180-3). It uses 512 bit blocks with 32 bit words rather than SHA-512's (and SHA-384's) 1024 bit blocks with 64 bit words (SHA-384 is the same hash as SHA-512, just truncated after calculation), so in most cases it will run faster, and it will be a shorter hash for the use in PKGBUILD files.

As far as the two cases go, that is somewhat correct, however it is a possibility that a developer could plan on releasing malicious software from the get go, and we'd still have problems even in your first use case. If they created some new software that a user packages up, it could work fine for some time, and then the developer could simply replace the source with the malicious source that has a MD5 checksum collision with the original working code. And the malicious code wouldn't have to be an obvious "rm -rf /" type of blow-your-computer-up issue, but could provide a quiet back door to the developer, establish a bot-net, etc.

It has shown to be done...not with one of our PKGBUILDs in particular, but out in the wild. I'm not saying it is trivially easy, but it is absolutely possible. Not to mention the HUGE Certificate Authority spoofing that happened recently due to this very issue (http://www.crunchgear.com/2008/12/30/md5-collision-creates-rogue-certificate-authority/). With its understood weaknesses, MD5 allows two completely ARBITRARY files to have the same MD5 hash by appending a few thousand bytes at the end of each file (see links posted in the ML).

Of course it does happen that developers forget to update the checksums, but that IS the responsibility of the developers to stay on top of that. It's great that we've had bug reports going back to 2004 regarding that issue, but that just means users are paying attention to the verification process failing (as they should).

This recommendation simply eliminates the risk for either of the above scenarios, which ARE realistic, no matter how rare they may be. I'd rather be sure that when I see a checksum failure, it IS because a developer forgot to update the hash, and not because of a compromised machine upstream.

The patch will cause no problems for current or future users. Even if you think the patch is protecting against a situation that would never come up, where's the harm in using a more secure hash anyway, just to be sure?
Comment by Aaron Griffin (phrakture) - Tuesday, 13 January 2009, 17:31 GMT
I think it might be a good idea to keep md5 in the array and just append another checksum. This should work better with existing md5sums too so we won't have to completely redo the entire package repo (abs users are going to complain about missing checksums, I'm sure)
Comment by Xavier (shining) - Tuesday, 13 January 2009, 18:02 GMT
After thinking a bit and hoping I understood everything I correctly, I am afraid this won't solve anything.

If we suppose all users are not going to update their makepkg.conf and keep the old one :
INTEGRITY_CHECK=(md5)
Then it should be fine. Pkgbuilds will either have md5 or md5 + sha256 so no missing checksums so no warning.

However, if the majority of users upgrade their config file (which is usually a good thing to do), they will have :
INTEGRITY_CHECK=(sha256)
And thus get a warning for all the old pkgbuilds which don't have a sha256sum.

On the other hand, I am not sure redoing the entire pkgbuild repo is worth it.
So the only way is probably to teach as good as possible the users that this warning is expected.
Comment by Aaron Griffin (phrakture) - Tuesday, 13 January 2009, 18:16 GMT
But then we run into the realm of "expected warnings" which people start to ignore... which is the opposite of what we want, isn't it?
Comment by Xavier (shining) - Tuesday, 13 January 2009, 18:44 GMT
I didn't understand Aaron's comment, but he clarified that to me.
Aaron thought that if we have INTEGRITY_CHECK=(md5 sha256), then the PKGBUILD only needs one of them for not causing warnings.
But if I read the code correctly, the PKGBUILD needs both md5 and sha256 for not displaying any warnings.
So users would still get warnings with all the old PKGBUILDs who only have md5.

If we don't want to confuse users with warnings, I see two ways out :
1) implement Aaron's way : don't display any warnings as soon as we have at least one required checksum. And switch to INTEGRITY_CHECK=(md5 sha256) as default
2) getting rid of that warning completely.
Comment by Aaron Bull Schaefer (elasticdog) - Tuesday, 13 January 2009, 18:47 GMT
There is NO need to update the entire repo...if you make the change in the configuration file, then you can just wait until people update their PKGBUILDs naturally. As Dan mentioned in the ML, a patch has already been applied that makes it so INTEGRITY_CHECK is used for generation ONLY, and will no longer effect the verification stage (http://snurl.com/9ykk0). That issue is what spawned this entire conversation.

What that means is that users won't get a warning on old PKGBUILD files, and you won't have to spend the time generating multiple checksums or updating everything at the same time. Once pacman v3.0 is released, makepkg will verify any checksums that are in the PKGBUILD regardless of what INTEGRITY_CHECK is set to. It will only fail if there are no checksums, or one of the provided checksums fail. There will be no "expected warnings".
Comment by Aaron Griffin (phrakture) - Tuesday, 13 January 2009, 18:49 GMT
Well then, that covers my only concern. Carry on
Comment by Dan McGee (toofishes) - Tuesday, 13 January 2009, 19:35 GMT
Please read the patch first, Aaron S.- you have stated the wrong conclusions:
http://projects.archlinux.org/?p=pacman.git;a=commitdiff;h=496b687c3d4a56641051700e7c4e5e21b9c6607c;hp=baf58525553db8f1e680de16793b147c88df59e2

Xavier is 100% correct here. If you apply the patch from the first comment, you will get a warning for every single PKGBUILD not containing sha256sums.
Comment by Aaron Bull Schaefer (elasticdog) - Tuesday, 13 January 2009, 19:55 GMT
Indeed, I was wrong about that patch...although that is a separate issue than the whole SHA-256 versus MD5 debate. That gets back to my original complaint about the functionality of makepkg and how INTEGRITY_CHECK is used. There is no point in warning on verification based on what is set in the makepkg.conf. There is no added benefit, if you assume that the hashes we are using are secure.
Comment by Aaron Bull Schaefer (elasticdog) - Tuesday, 13 January 2009, 23:15 GMT
After my first email describing the problem, I assumed that Dan saying he had "addressed this upstream in an earlier patch, so this will be fixed in pacman 3.3.0" meant that it was the same issue...I hadn't looked closely enough at the fix to see what it did. My apologies.

Here is a patch that updates makepkg to only use INTEGRITY_CHECK on generation of checksums and no longer uses it on verification. It will error out if there are no checksums present (or if a checksum fails). It will also error out if the given checksum list is incomplete. If there are multiple checksum lists given, one incomplete and the other complete, it will give a warning about the incomplete list and continue building the package regardless (assuming the completed list verifies all files). If there are multiple complete lists, it will check them all, as expected.
Comment by Dan McGee (toofishes) - Wednesday, 14 January 2009, 01:51 GMT
The patch is fine and I'll apply it, with a few updates:

1) Not sure if you munged the blank line between the patch title and the "Signed-off-by", but that messes up git-am.
2) If an array is wrong in any way, it makes more sense to fail. Incomplete arrays are just as incorrect as incorrect checksums- I'm not sure why we would want to ever permit this. I've changed it to do this.
Comment by Aaron Bull Schaefer (elasticdog) - Wednesday, 14 January 2009, 02:11 GMT
Not sure what happened with the blank line, I just added a commit message as usual with the -s flag. Was the problem with the blank line after the "Signed off by..." and before the "---"?

Looking at the modified patch, getting rid of my elif means that it will error out when no checksums for a particular type are provided. So even if there are valid md5sums, it will go to the next type and give the error: "Integrity checks (sha1) differ in size from the source array." because there are none provided. That's why I tested to make sure [ ${#integrity_sums[@]} -gt 0 ].

Aside from that though, I'm fine with erroring out in the case of one incomplete array and one complete array. It does make more sense to ensure that all checksums are present and up to date.
Comment by Aaron Bull Schaefer (elasticdog) - Wednesday, 14 January 2009, 02:41 GMT
Saw that you pushed the fix...thanks! So will the other patch be applied as well? As long as the makepkg patch is in place, the INTEGRITY_CHECK config change won't cause any issues for users...for sure this time :-)
Comment by Dan McGee (toofishes) - Wednesday, 14 January 2009, 03:02 GMT
1) Pushing this patch will only make it the default for the makepkg.conf shipped with pacman. I'm sure this will come up on the Arch developers mailing list as well. This is not to say I have any sort of problem with it, however.
2) As Aaron stated, there are two cases. In my mind, case 1 is the *much* easier avenue of attack, as we are trusting that we got a valid source tarball. Going along with this line of thinking, I've yet to see upstream developers giving out the sha256sums for their software. You are much more likely to see sha1sums, which we could directly verify with makepkg. To me, it seems like this is the better precedent to be following- use upstream provided sha1sums that are hopefully stored in a different place than the source itself, and actually do source verification instead of just a makepkg -g >> PKGBUILD.
Comment by Aaron Bull Schaefer (elasticdog) - Wednesday, 14 January 2009, 03:35 GMT
Yes, ideally it would be nice to get checksums provided directly by the upstream developers, and even better, sufficiently secure checksums that are provided through a different channel than the source itself, but I think that's wishful thinking. We can't control upstream developers do, nor what type of checksums (if any) they provide with their software, but we can control what checksums we use when the inevitable "makepkg -g >>PKGBUILD" does occur.

I agree that using upstream-provided checksums is the best practice (no matter what hash function they provide), but keep in mind that SHA-1 is broken too, just not as severely as MD5...yet. I'd personally use a provided MD5 or SHA-1 for initial verification, but after testing the package, generate a more secure hash to help protect our users. It wouldn't necessarily prevent the first malicious scenario, but it would prevent the second; and hopefully they're both rare enough (and our testing thorough enough) that it won't ever be an issue. My whole point is covering our own ass with the code we actually control by picking sane defaults.
Comment by Xavier (shining) - Monday, 25 May 2009, 11:40 GMT
Dan said : "I'm sure this will come up on the Arch developers mailing list"

Did it happen?
Comment by Allan McRae (Allan) - Monday, 25 May 2009, 11:52 GMT
Nope. But I am going to bring up the possibility of distributing our own makepkg.conf in the Arch package so we can easily change the from the default values provided in the pacman source (e.g. adding LDFLAGS). In fact, any distribution using pacman should probably do this... That will provide an opportunity for this discussion.
Comment by Dan McGee (toofishes) - Monday, 25 May 2009, 21:24 GMT
Yeah, we should definitely distribute our own makepkg.conf just as we do with pacman.conf.
Comment by Tomas Mudrunka (harvie) - Wednesday, 10 June 2009, 22:59 GMT
We can also sign all packages in repository by GPG certificate (public certificate will be on installation discs and on official archlinux website).

BTW i made small benchmark of common hashing alghoritms that i found in my Arch Linux:
for i in md5sum sha1sum sha224sum sha256sum sha384sum sha512sum shasum; do echo $i:; time $i 698MB_big_file.avi; echo; done;


md5sum:
real 0m4.609s
user 0m2.530s
sys 0m0.593s

sha1sum:
real 0m14.013s
user 0m8.626s
sys 0m0.633s

sha224sum:
real 0m13.595s
user 0m8.936s
sys 0m0.727s

sha256sum:
real 0m13.609s
user 0m8.949s
sys 0m0.683s

sha384sum:
real 0m49.359s
user 0m30.675s
sys 0m1.017s

sha512sum:
real 0m47.823s
user 0m30.978s
sys 0m1.007s

shasum:
real 0m9.819s
user 0m6.276s
sys 0m0.617s
Comment by Allan McRae (Allan) - Friday, 07 August 2009, 06:35 GMT
After discussion of this on the arch-dev-public list, this is a "Won't Implement" for Arch.

Loading...