FS#11091 - Add package verification capability to pacman

Attached to Project: Pacman
Opened by none given (hoban) - Friday, 01 August 2008, 20:49 GMT
Last edited by Allan McRae (Allan) - Friday, 14 December 2012, 03:58 GMT
Task Type Feature Request
Category General
Status Closed
Assigned To Dan McGee (toofishes), Allan McRae (Allan)
Architecture All
Severity Medium
Priority Normal
Reported Version None
Due in Version 4.1.0
Due Date Undecided
Percent Complete 100%
Votes 17
Private No

Details

Description:
Please add the capability to verify packages that have been installed by pacman.

Coming from Red Hat Linux, I've been very impressed with Arch Linux. One of the few things that I've found missing from pacman is the ability to verify if and how files have changed since they were installed by the package management software.

From rpm(8):
"Verify Options

The general form of an rpm verify command is

rpm {-V|--verify} [select-options] [verify-options]

Verifying a package compares information about the installed files in the package with information about the files taken from the package metadata stored in the rpm database. Among other things, verifying compares the size, MD5 sum, permissions, type, owner and group of each file. Any discrepancies are displayed.
--snip--"

As an example, let's say I hop on an Arch machine that has been fiddled with and my task is to fix it. Let's assume that my LVM volumes were previously being mounted at boot time, and that is no longer the case. As a troubleshooting step, I run "pacman -V initscripts" and I'm given information that looks something like this:
S.5....T /etc/rc.sysinit
(note that the above output is pulled directly from an rpm -V)
From this output, I would know that the size, MD5 sum, and timestamp of the file /etc/rc.sysinit do not match what was stored in the pacman database at install time. After further investigation, I realize that I (or someone else) had commented out the portion of rc.sysinit responsible for detecting and mounting LVM volumes.
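To make the flag string in that example concrete, here is a small Python sketch that decodes rpm-style verify output. It assumes the older 8-column layout shown above (S size, M mode, 5 MD5, D device, L symlink, U user, G group, T mtime) and also accepts the ninth P (capabilities) column that newer rpm releases print. It ignores the optional file-attribute marker (such as "c" for config files) that rpm can print before the path; `decode_verify_flags` is a hypothetical helper, not part of rpm or pacman.

```python
# Column meanings for rpm -V output, in order; older rpm prints 8 columns
# (as in "S.5....T"), newer rpm appends a ninth "P" column.
COLUMNS = [
    ("S", "size"),
    ("M", "mode"),
    ("5", "md5 digest"),
    ("D", "device"),
    ("L", "symlink target"),
    ("U", "user"),
    ("G", "group"),
    ("T", "mtime"),
    ("P", "capabilities"),
]


def decode_verify_flags(flags: str) -> list[str]:
    """Return the attributes reported as changed in an rpm -V flag string.

    '.' means the test passed and '?' means it could not be performed;
    both are skipped. Any other character marks a changed attribute.
    """
    if len(flags) > len(COLUMNS):
        raise ValueError(f"unexpected flag string: {flags!r}")
    changed = []
    for char, (_, name) in zip(flags, COLUMNS):
        if char not in ".?":
            changed.append(name)
    return changed


line = "S.5....T /etc/rc.sysinit"
flags, path = line.split(maxsplit=1)
print(path, decode_verify_flags(flags))
```

For the example line this reports size, md5 digest and mtime as changed, matching the interpretation in the paragraph above.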

This is just one example of the usefulness of this type of functionality.
Note that using this sort of verification as a security mechanism would not be recommended, as the database is read-write and could be edited by an intruder who got in. If that kind of functionality is desired, a package like "tripwire" would be better suited to the task.

Thanks in advance for your work!

Closed by  Allan McRae (Allan)
Friday, 14 December 2012, 03:58 GMT
Reason for closing:  Fixed
Additional comments about closing:  Lots of commits around 01e093d0. Adding checksum checking requires libarchive support.
Comment by Nagy Gabor (combo) - Friday, 15 August 2008, 11:58 GMT
If I understood correctly, this rpm command compares the database and the filesystem (not the rpm file and the filesystem). The problem is that we don't store any information about a package's files in the database (we store md5sums of "config" files only).

At the moment I don't think we will implement this soon, sorry (Dan is the pacman leader, so my opinion is not relevant here ;-).
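Since only "config" (backup) files have recorded md5sums, those are the only files that could be checked with the data already in the local database. A minimal Python sketch of that check follows. It assumes the local database's `files` entry lists backup files under a `%BACKUP%` header as `relative/path<TAB>md5` lines, ended by a blank line or the next `%SECTION%` header; the function names are hypothetical.

```python
import hashlib
import os


def parse_backup_section(files_text: str) -> dict[str, str]:
    """Map each backup path to its recorded md5sum.

    Assumes a "%BACKUP%" header followed by "relative/path<TAB>md5" lines,
    terminated by a blank line or the next "%SECTION%" header.
    """
    backups = {}
    in_backup = False
    for line in files_text.splitlines():
        if line.startswith("%") and line.endswith("%"):
            in_backup = line == "%BACKUP%"
            continue
        if not line.strip():
            in_backup = False
            continue
        if in_backup:
            path, _, md5 = line.partition("\t")
            backups[path] = md5
    return backups


def md5_of(path: str) -> str:
    """MD5 of a file on disk, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_backup_files(files_text: str, root: str = "/") -> list[str]:
    """Return backup files that are missing or whose md5sum differs."""
    changed = []
    for rel_path, recorded in parse_backup_section(files_text).items():
        full_path = os.path.join(root, rel_path)
        if not os.path.exists(full_path) or md5_of(full_path) != recorded:
            changed.append(rel_path)
    return sorted(changed)
```

This covers only files listed as backups; every other file in the package has no recorded checksum to compare against, which is the gap the comment describes.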
Comment by Gavin Bisesi (Daenyth) - Friday, 15 August 2008, 13:40 GMT
You're correct in that this is how it currently works. I don't really see the downside to taking the file information and storing it, though. I suppose it would make the local DB larger, but is that relevant at all? I think it would be simple to hack up some shell scripts that could work with a .pkg.tar.gz file to gather the file info, but that would be expensive for checking every file, and would also become problematic when the installed version is older than the repo version and the package is not kept in the cache.

Anyone else have an opinion on this? I'm in favor as long as it doesn't degrade DB performance.
Comment by none given (hoban) - Friday, 15 August 2008, 13:41 GMT
Your understanding is correct; some sort of database would need to exist, cataloging md5sums, mode, ownership, etc. Should I assume that this type of functionality will not make it into pacman, and instead focus my energy on developing a similar system outside of the package manager?
Comment by Gavin Bisesi (Daenyth) - Friday, 15 August 2008, 13:45 GMT
No one from the pacman dev team has spoken up yet so I wouldn't say that it's not gonna get added. The best way to get it added would be to write the patch yourself and submit it for review to the pacman-dev mailing list. You could also try #archlinux-pacman on irc.

I think having separate scripts with a separate DB could work, but in this case you're probably better off using something like tripwire. I also think it would be very easy for the scripts to become out of sync with pacman.
Comment by none given (hoban) - Friday, 15 August 2008, 14:03 GMT
I had considered that I'd likely end up turning to tripwire. I created this ticket because I personally know people, and have read many more claim, that advanced features such as package verification (this ticket) and package querying (already a feature of pacman) are the only reasons they stick with rpm-based distributions. My personal feeling is that if Arch ever wants a chance at being used widely on servers (I have no idea if this is a goal of the Arch community), features such as the verification covered by this ticket, as well as gpg/pgp package signing (covered by other tickets going back to 2004), must be implemented and automated a bit more. It's one thing to create a wiki page covering tripwire and point people there; it's quite another to have the functionality built into the OS from the get-go. Trivial or out-of-the-box support for these types of features (sure, turn it off by default or make it a plugin if you don't want the bloat) is, again in _my_ opinion, essential for enterprise use.
I realize I've opened a can of worms here: if we really wanted to push for enterprise use, we'd also have to start a security team (if one doesn't already exist... I'm sorry, I'm new to the community) whose responsibility would be to audit the distro, as well as provide security patches for packages rather than just pulling down the latest version from the third-party releaser.
With these things in mind, I'm beginning to think that I ought to be thinking in terms of troubleshooting, as I was when I created the ticket, rather than in terms of enterprise use.
Thanks!
Comment by Dan McGee (toofishes) - Saturday, 06 June 2009, 17:53 GMT
Are FS#11091 and FS#13877 dupes?
Comment by none given (hoban) - Sunday, 07 June 2009, 03:00 GMT
Well, this one _is_ 11091 and the creator of 13877 already commented -- this is not a dupe.
Comment by none given (hoban) - Sunday, 07 June 2009, 03:01 GMT
thanks. ;)
Comment by Allan McRae (Allan) - Saturday, 10 October 2009, 08:41 GMT
Could this be implemented by comparing files to those in that package from the cache/repo? That would mean that md5sums do not need to be recorded in the database.
Comment by solsTiCe (zebul666) - Sunday, 18 October 2009, 09:46 GMT
here is a basic python script that just does that
warning: pacman db location is hardcoded in script
Comment by solsTiCe (zebul666) - Wednesday, 04 November 2009, 19:56 GMT
a new version that can take a package name as argument. it will look for the archive tarball in the cache (default location)
if no argument given, it will look for argument on stdin. so it can be used with pipe like

tail -n 20 /var/log/pacman.log|grep upgraded |cut -f 4 -d ' '|python verifypkg.py
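The approach Allan asks about and solsTiCe's script implements, comparing installed files with the same package's archive in the cache, can be sketched with Python's standard `tarfile` module. This is not solsTiCe's script (which is not reproduced here) but a separate, hypothetical illustration. It assumes an archive format `tarfile` can read (such as the .pkg.tar.gz files of the time), treats top-level entries starting with "." (e.g. .PKGINFO) as package metadata, and checks only regular files for existence, mode, size and MD5.

```python
import hashlib
import os
import stat
import tarfile


def _md5_stream(fh) -> str:
    """MD5 of a binary file-like object, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fh.read(65536), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_against_package(pkg_path: str, root: str = "/") -> list[tuple[str, str]]:
    """Compare regular files in a package archive with the installed copies.

    Returns (path, problem) pairs; an empty list means no difference was
    found in the attributes checked here (existence, mode, size, MD5).
    """
    problems = []
    with tarfile.open(pkg_path) as archive:  # default mode detects gz/bz2/xz
        for member in archive:
            name = member.name[2:] if member.name.startswith("./") else member.name
            if "/" not in name and name.startswith("."):
                continue  # package metadata such as .PKGINFO, not installed
            if not member.isfile():
                continue  # directories, symlinks etc. are not checked here
            installed = os.path.join(root, name)
            if not os.path.isfile(installed):
                problems.append((name, "missing"))
                continue
            st = os.stat(installed)
            if stat.S_IMODE(st.st_mode) != member.mode:
                problems.append((name, "mode differs"))
            if st.st_size != member.size:
                problems.append((name, "size differs"))
                continue  # a different size already implies different content
            with archive.extractfile(member) as packaged, open(installed, "rb") as local:
                if _md5_stream(packaged) != _md5_stream(local):
                    problems.append((name, "md5 differs"))
    return problems
```

As the thread goes on to point out, this only works when the exact installed version is still in the cache.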
Comment by Dieter Plaetinck (Dieter_be) - Thursday, 03 March 2011, 16:12 GMT
any updates on this?
@Allan: looking at the checksums from a package in the repo doesn't seem like a good idea. The package could have been updated on the mirrors after you installed it.
And a cache should be just that, a cache. Storing checksums in the database seems wisest to me.

PS: I could have sworn Xyne wrote an integrity checker, but I can't find it, so maybe I'm wrong.
Btw solsTiCe, your script seems like a nice start, though it doesn't work for me as I don't have the package I'm interested in in my cache.
Comment by Dan McGee (toofishes) - Thursday, 03 March 2011, 16:21 GMT
If packages change, your distro is doing it wrong. Packages should never ever change once built, version numbers should always get bumped.
Comment by Dieter Plaetinck (Dieter_be) - Thursday, 03 March 2011, 16:34 GMT
Yes Dan, so when there's a package update the repository only has the new package (with bumped version number) which contains only the checksums for the files in the new version, which are not necessarily the same as the checksums for the files included in the older package the user installed.
Comment by Dan McGee (toofishes) - Thursday, 03 March 2011, 16:56 GMT
Well, of course we wouldn't count that as a matching package; I think that was my confusion. Checking installed contents against cached packages seems just fine to me; if you don't have that package in your cache, then we just won't be able to verify and test it. To me this seems like a pretty reasonable tradeoff, and one that can be accomplished without all that much work. Otherwise we need to capture a ton more info than we already have, and I'm not sure even that would be enough: permissions, owner, checksum, file sizes, etc.
Comment by none given (hoban) - Thursday, 03 March 2011, 17:08 GMT
Am I the only one who thinks that relying on pulling the information from the network is silly? It makes a lot more sense in my mind to have a local database containing the metadata. Not only does doing so save the Arch mirrors traffic, but frankly a local database will be faster and will contain more accurate data; a mirror won't contain metadata for packages built from AUR or customized packages built using ABS, for example.
Comment by Dieter Plaetinck (Dieter_be) - Thursday, 03 March 2011, 17:41 GMT
@hoban: no you're not. But you are the only one who doesn't bother reading a few comments up.

@Dan: it might be interesting to calculate how much the local database would grow to support this feature.
OTOH: since the cache is currently just a cache, I feel safe removing it. If users need to keep the cached package files of all installed packages because at some point they might want to do an integrity check, that's an overhead which bothers me a lot more.
Comment by Dan McGee (toofishes) - Thursday, 03 March 2011, 17:42 GMT
Bagh, it isn't straightforward to go from a local package to a file in the cache, since we don't persist the filename it was installed from.
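Because the filename is not recorded, a script can only guess it from the package's name and version. A hedged Python sketch of that guess follows, assuming the conventional `name-version-arch.pkg.tar.*` naming and pacman's default cache directory `/var/cache/pacman/pkg`; `find_cached_package` is a hypothetical helper, and a renamed or relocated file would simply not be found.

```python
import glob
import os


def find_cached_package(name: str, version: str,
                        cache_dir: str = "/var/cache/pacman/pkg") -> list[str]:
    """Return cached archives that look like name-version-arch.pkg.tar.*.

    This relies purely on the naming convention, since the installed
    filename is not persisted. Signature files (*.sig) are excluded, and
    more than one match is possible (e.g. builds for different arches).
    """
    pattern = os.path.join(
        glob.escape(cache_dir),
        f"{glob.escape(name)}-{glob.escape(version)}-*.pkg.tar.*",
    )
    return sorted(p for p in glob.glob(pattern) if not p.endswith(".sig"))
```

A missing match is exactly the case Dieter describes: the package was never cached, or the cache was cleaned.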

If we implement this, our local database is going to get significantly bigger, and its format will need to change significantly to accommodate more metadata. This would be a nice step, but we'd want to make sure we think it through and design it right, so this info is easily accessible and possibly extensible in the future.

@none: I have no idea where you got the idea we'd be going to the network...
Comment by Allan McRae (Allan) - Thursday, 03 March 2011, 22:44 GMT
Here is another script that does the comparison against things in the cache:
http://mailman.archlinux.org/pipermail/pacman-dev/2010-November/011940.html
Comment by Dan McGee (toofishes) - Saturday, 16 July 2011, 01:59 GMT
* libalpm in git now has filelists with more attributes than before, including mode and size
* libarchive supports the mtree(5) format, which is an established way of tracking this stuff by the BSDs
* If we are going to make changes to the filelist format in the local DB, I'd love to use an existing format
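To show what the mtree(5) format looks like in practice, here is a minimal Python reader for its full-path form. Python's standard library has no mtree parser, so this is a hand-written, hypothetical sketch. It handles comments (including the `#mtree` header), `/set` defaults and `/unset`, but not line continuations, the relative-path format, or vis(3) escapes in file names.

```python
def _pairs(words):
    """Yield (keyword, value) pairs; bare keywords get an empty value."""
    for word in words:
        key, _, value = word.partition("=")
        yield key, value


def parse_mtree(text: str) -> dict[str, dict[str, str]]:
    """Parse full-path mtree(5) lines into {path: {keyword: value}}."""
    defaults: dict[str, str] = {}
    entries: dict[str, dict[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (including the "#mtree" header)
        words = line.split()
        if words[0] == "/set":
            defaults.update(_pairs(words[1:]))
        elif words[0] == "/unset":
            for key in words[1:]:
                defaults.pop(key, None)
        else:
            attrs = dict(defaults)  # start from the current /set defaults
            attrs.update(_pairs(words[1:]))
            entries[words[0]] = attrs
    return entries


sample = """#mtree
/set type=file uid=0 gid=0 mode=644
./usr time=1329739200.0 mode=755 type=dir
./usr/bin/foo time=1329739200.0 mode=755 size=1234
"""
print(parse_mtree(sample)["./usr/bin/foo"])
```

The `/set` lines are what keep such files compact: common attributes are stated once and each entry lists only what differs.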
Comment by Allan McRae (Allan) - Monday, 20 February 2012, 13:00 GMT
I had a look at this today. It is very simple to create a (compressed) mtree file with all the package file information on package installation. I'd suggest that this stays as a separate file to the "files" file.

Then all that needs to be decided is what to check. I propose:

Directory: uid, gid, mode
File: uid, gid, mode, size, md5, time
Symbolic Link: uid, gid, mode, link, time

See "man mtree" for what those fields are.
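The proposed per-type checks could look roughly like the following Python sketch, which compares one recorded entry against the filesystem. It assumes the recorded values arrive as mtree keyword strings (`type`, `uid`, `gid`, `mode` in octal, `size`, `md5digest`, `time` as seconds.nanoseconds, `link`); `check_entry` is a hypothetical helper, `time` is compared to whole seconds only, and vis(3) escapes in link targets are not decoded.

```python
import hashlib
import os
import stat

# Keywords to compare for each mtree entry type, per the proposal above.
CHECKS = {
    "dir": ("uid", "gid", "mode"),
    "file": ("uid", "gid", "mode", "size", "md5digest", "time"),
    "link": ("uid", "gid", "mode", "link", "time"),
}


def _md5(path: str) -> str:
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _kind(mode: int) -> str:
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISLNK(mode):
        return "link"
    if stat.S_ISREG(mode):
        return "file"
    return "other"


def check_entry(path: str, expected: dict[str, str]) -> list[str]:
    """Return the mtree keywords whose recorded values no longer match path."""
    try:
        st = os.lstat(path)  # lstat: inspect symlinks themselves
    except FileNotFoundError:
        return ["missing"]
    kind = expected.get("type", "file")
    if _kind(st.st_mode) != kind:
        return ["type"]
    mismatched = []
    for key in CHECKS.get(kind, ()):
        if key not in expected:
            continue  # nothing recorded for this keyword
        want = expected[key]
        if key in ("uid", "gid", "size"):
            ok = int(want) == getattr(st, "st_" + key)
        elif key == "mode":
            ok = int(want, 8) == stat.S_IMODE(st.st_mode)
        elif key == "time":
            # mtree writes seconds.nanoseconds; compare whole seconds only
            ok = int(want.split(".")[0]) == st.st_mtime_ns // 1_000_000_000
        elif key == "link":
            ok = os.readlink(path) == want
        else:  # md5digest
            ok = _md5(path) == want
        if not ok:
            mismatched.append(key)
    return mismatched
```

Skipping keywords that were never recorded keeps the check usable while checksums are unavailable, which matches the situation described at closing time.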
Comment by solsTiCe (zebul666) - Tuesday, 21 February 2012, 11:02 GMT
What about a post-install/post-update hook that could run any script, such as one that runs a tripwire update, or similar software like aide, rkhunter --propupd or whatever?

I think it would be the weakest link in terms of security, but why not?

It was an idea to keep pacman KISS, but Allan's idea doesn't need a lot of code, I guess, so...
Comment by Allan McRae (Allan) - Tuesday, 21 February 2012, 11:10 GMT
That seems entirely unrelated to this bug... There are other proposals about adding hooks into pacman.
Comment by Leonid Isaev (lisaev) - Thursday, 05 July 2012, 18:15 GMT
Hmm, this is an educational discussion -- apparently I have been reinventing the wheel...

Anyway, I don't quite agree with the statement that this "pacman -V" feature can't be a security measure. With pacman 4 in mind, I currently see two fundamental barriers to this becoming a valid security layer (already mentioned in the original report):
1. the pacman keyring /etc/pacman.d/gnupg can be freely tampered with;
2. pacman cache /var/cache/pacman can be tampered with (so there is no way to determine if cache is genuine).

Problem (1) can be solved by deploying archlinux-keyring in a clean environment (e.g. a LiveCD) and is outside of pacman's control. However, problem (2) can be dealt with quite easily: all that's required is storing the *.sig files in /var/cache/pacman/pkg alongside the packages. This would add an overhead of ~1 KiB per package. Is there an option for this already?
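As a rough illustration of that proposal, the Python sketch below lists cached package archives that have no `.sig` file next to them. It assumes signatures would be stored as `<archive>.sig` in the same directory and that archives contain `.pkg.tar` in their names; `packages_without_signature` is hypothetical. The presence of a `.sig` file proves nothing by itself, and actually verifying each signature is a separate step not shown here.

```python
import os


def packages_without_signature(cache_dir: str = "/var/cache/pacman/pkg") -> list[str]:
    """List package archives in cache_dir that have no adjacent .sig file."""
    names = set(os.listdir(cache_dir))
    return sorted(
        n for n in names
        if ".pkg.tar" in n and not n.endswith(".sig") and n + ".sig" not in names
    )
```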
Comment by Allan McRae (Allan) - Friday, 14 December 2012, 03:57 GMT
The package file verification is all implemented. However, we can only check file attributes (size, time, permissions, etc.). We need libarchive support to check the md5/sha256 sums of the files.
Comment by Radek Podgorny (rpodgorny) - Wednesday, 11 December 2013, 18:30 GMT
...so is there any follow-up bug (for the libarchive support)?
