FS#25742 - Add ETag support for package pages

Attached to Project: AUR web interface
Opened by kachelaqa (kachelaqa) - Wednesday, 24 August 2011, 12:20 GMT
Last edited by Lukas Fleischer (lfleischer) - Sunday, 14 June 2015, 17:02 GMT
Task Type: Feature Request
Category: Backend
Status: Closed
Assigned To: Lukas Fleischer (lfleischer)
Architecture: All
Severity: Medium
Priority: Normal
Reported Version: 1.9.0
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

ETags are currently added for rpc requests and automatically generated for static files like the PKGBUILD and tarball. Would it be possible to add them for package pages?

According to this post[1], the total number of package page requests is "a distant second" to rpc info requests. However, there is much more potential for saving bandwidth by caching package pages as the average content length is much greater than for rpc output.

I think that a lot of AUR helper programs will still be scraping package pages as it remains the only/easiest way to obtain certain information (e.g. the comments, and the dependency and required-by lists).

[1] http://www.mail-archive.com/aur-dev@archlinux.org/msg01323.html

Closed by Lukas Fleischer (lfleischer)
Sunday, 14 June 2015, 17:02 GMT
Reason for closing: Deferred
Comment by Lukas Fleischer (lfleischer) - Wednesday, 24 August 2011, 12:44 GMT
The answer here is that no AUR helper should access the package details pages at all. None. Actually, an AUR helper shouldn't access *anything* except for the RPC interface and the source tarball referenced in RPC responses. Everything else is just unsupported and broken by design. Dependencies should be resolved by extracting and parsing the PKGBUILD from the source tarball. The AUR doesn't provide a full bash parser, so dependency listings might be broken if parameter substitution is used.

In regard to comments and the required-by list, I don't even know why an AUR helper should display/parse them in any way. In my understanding, AUR helpers are designed to be a counterpart to pacman(8) for the AUR. pacman(8) doesn't check the BBS for comments on packages in our repos either, and, afaik, it has no option to scan all repositories for packages that depend on some given package. Doing these things wouldn't be KISS, wouldn't be the Arch way.
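
The approach suggested in the comment above (take the source tarball and read the dependencies out of its PKGBUILD) could look roughly like the following Python sketch. The function names `read_pkgbuild` and `naive_depends` are hypothetical and chosen for illustration only. The parser is deliberately naive: it reads only a literal `depends=(...)` array and does not evaluate bash, so it shares the limitation mentioned above when parameter substitution is used.

```python
import io
import re
import tarfile


def read_pkgbuild(tarball: bytes) -> str:
    """Return the text of the first file named PKGBUILD in a gzip-compressed tarball."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.split("/")[-1] == "PKGBUILD":
                return tar.extractfile(member).read().decode("utf-8")
    raise FileNotFoundError("no PKGBUILD found in tarball")


def naive_depends(pkgbuild: str) -> list:
    """Extract the entries of a literal depends=(...) array.

    Assumption: the array is written out literally. Variables such as
    "$_dep" are returned as-is, not expanded, because no bash is executed.
    """
    # Anchor at the start of a line so makedepends=(...) is not matched.
    match = re.search(r"^depends=\((.*?)\)", pkgbuild, re.MULTILINE | re.DOTALL)
    if match is None:
        return []
    # Entries are whitespace-separated and optionally quoted.
    return re.findall(r"""['"]?([^\s'"]+)['"]?""", match.group(1))
```

A helper would fetch the tarball referenced in the RPC response, pass its bytes to `read_pkgbuild`, and feed the result to `naive_depends`. Anything beyond literal arrays would need a real bash evaluation step, which this sketch does not attempt.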
Comment by kachelaqa (kachelaqa) - Wednesday, 24 August 2011, 13:10 GMT
I believe AUR helpers scrape the package page because that is the only way to obtain certain information. If there were a better way (like the rpc API), I'm sure they would use it.

As for comments, these are much more relevant for AUR packages than for official packages because they often contain useful information that can help solve problems with the build process. The yaourt AUR program displays comments before prompting the user to view/edit the PKGBUILD, for instance.

With regard to pacman and required-by: the relevant libalpm function is alpm_pkg_compute_requiredby. This is used by pacman -Qt and pactree -r.

But I have to say I find your overall response a little hard to understand. If the information displayed on the package page is so irrelevant, why go to the trouble of showing it to users at all?
Comment by Lukas Fleischer (lfleischer) - Wednesday, 24 August 2011, 16:14 GMT
Build issues should be fixed in the PKGBUILD itself, not via patches in comments. If a build fails, AUR helpers could have an option to show a link to the package details page and users can check comments themselves using a browser, which should be fine given that a build failure should be an exception, not the rule. pacman(8) also doesn't show any bug reports before installing a package. If installing a package from the binary repos fails, open your browser and check the bug tracker. If building or installing an AUR package fails, open your browser and check the package comments. I don't see any real difference here.

For local packages, alpm_pkg_compute_requiredby() only searches the local database (that is what you see when running `pacman -Qi` or `pactree -r`). pacman(8) will handle AUR packages as well in this case. The only thing that is obviously hard to do is the counterpart of `pacman -Sii`, i.e. searching the sync databases, as there isn't any sync database of the AUR. The issue here is that it is impossible to create a proper database of all AUR packages, since there isn't any way to detect all packages that a binary package (built from an AUR source tarball) will depend on (dependencies might be architecture-specific etc.). This is another reason for the AUR not showing reliable required-by lists on the package details page, and another reason for AUR helpers not to depend on them.

Think of the AUR web interface as the default frontend to the package database of the AUR. Just like any other frontend (which might be an AUR helper or $whatever), it extracts the PKGBUILD and parses several fields (package name, package version, dependencies, ...). The only difference from a good AUR helper is that it doesn't do a very good job at parsing PKGBUILDs, mainly for security reasons. Other frontends should not re-parse the output of our official frontend but use the raw data from the backend and parse stuff themselves.
Comment by kachelaqa (kachelaqa) - Thursday, 25 August 2011, 00:40 GMT
If some AUR helpers choose to show comments directly rather than provide a link, I suspect it's simply because users requested it as a convenience (and this does not seem totally unreasonable to me). I think your comparisons with pacman here are spurious, as users have completely different expectations when it comes to officially supported packages.

On the question of alpm_pkg_compute_requiredby: for local packages, and packages loaded from file, only the local database is searched; but for any other packages, all the available sync databases are searched. However, it really doesn't matter that the AUR can't match the capabilities of libalpm. Given its nature, nobody should ever expect 100% reliable information from the AUR.

As for your other points: The problem of parsing pkgbuilds is irrelevant. AUR helpers only scrape the web interface for information that they can't get from pkgbuilds (or the rpc api). The real problem is that the AUR is a walled garden. Creating a genuinely alternate frontend is impossible without direct access to its database.

But, anyway, to get back on topic: if the implementation of etags is simple and there's minimal overhead, isn't adding them a generally good idea?
Comment by Lukas Fleischer (lfleischer) - Thursday, 25 August 2011, 11:24 GMT
Well, I just don't like these hacky workarounds. Things should be done more properly, even if it involves some more work. It is a known fact that the AUR is kind of broken and that there's a lot of legacy code but that shouldn't prevent anyone from trying to fix it (or discussing it officially, at least). It seems like you also forgot that the AUR itself only uses the PKGBUILD (and the existing package database) to generate those fields. Except for metadata, the required-by list should be the only field that AUR helpers cannot build on their own, using the RPC interface and the source tarball. Actually, the required-by list doesn't even make a lot of sense in the context of the AUR and source tarballs. Imagine there were only two packages "foo" and "bar" in the AUR. The i686 version of foo depends on bar, whereas the x86_64 version of bar depends on foo. What should the required-by list look like in this case? This is kind of unrealistic but we should have well-defined behaviour here.

Still not sure about ETags for the package details pages. I won't add them for the sake of better support for crappy AUR helpers for sure. Nevertheless, we could reduce bandwidth here and users browsing packages the official way might profit as well.
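
The foo/bar example from the comment above can be made concrete with a small Python sketch. The `DEPENDS` table and the `required_by` function are hypothetical and exist only to illustrate the point: once dependencies differ per architecture, there is no single required-by list, only one per architecture.

```python
from collections import defaultdict

# Hypothetical data for the two-package example: dependencies are keyed by
# (package, architecture) because they differ between i686 and x86_64.
DEPENDS = {
    ("foo", "i686"): ["bar"],
    ("foo", "x86_64"): [],
    ("bar", "i686"): [],
    ("bar", "x86_64"): ["foo"],
}


def required_by(depends, arch):
    """Invert the dependency table for one architecture.

    Returns a dict mapping each package to the sorted list of packages
    that depend on it when built for the given architecture.
    """
    result = defaultdict(set)
    for (pkg, pkg_arch), deps in depends.items():
        if pkg_arch != arch:
            continue
        for dep in deps:
            result[dep].add(pkg)
    return {name: sorted(pkgs) for name, pkgs in result.items()}
```

With this data, `required_by(DEPENDS, "i686")` lists foo as requiring bar, while `required_by(DEPENDS, "x86_64")` lists bar as requiring foo, so a single architecture-independent list would have to pick one view or merge both.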
Comment by kachelaqa (kachelaqa) - Thursday, 25 August 2011, 12:20 GMT
I wish I hadn't mentioned AUR helpers in my feature request as it has been an unwanted and largely irrelevant distraction.

How etags might be used in practice is neither here nor there. The aim is just to exploit a relatively cheap and simple means of reducing bandwidth for all interested parties.
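
For reference, the mechanism being requested can be sketched in a few lines of Python. This is not how the AUR web interface is implemented; the function names `make_etag` and `respond` are hypothetical, and hashing the rendered page body is just one assumed way to derive a tag.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Return a quoted ETag derived from a hash of the rendered page body."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, payload) for a GET of a rendered page.

    If the client's If-None-Match header contains the current ETag,
    answer 304 with an empty payload instead of resending the page.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            # If-None-Match uses weak comparison, so ignore a W/ prefix.
            if tag.startswith("W/"):
                tag = tag[2:]
            candidates.append(tag)
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
    return 200, headers, body
```

Note that in this sketch the page is still rendered before the tag is computed, so it saves bandwidth but not server-side rendering work. Deriving the tag from something cheaper, such as a last-modified timestamp, would be a different design and is not shown here.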
