FS#15043 - Need better parsing of PKGBUILDs
Attached to Project:
AUR web interface
Opened by Tomas Mudrunka (harvie) - Wednesday, 10 June 2009, 13:26 GMT
Last edited by Lukas Fleischer (lfleischer) - Friday, 03 May 2013, 08:55 GMT
Details
== 1.) It would be nice to see all common arrays from the PKGBUILD in the AUR. In particular, optdepends should be displayed next to the regular dependencies.
== 2.) The source array is not parsed in some cases, and I don't know why. Compare these:
http://aur.archlinux.org/packages.php?ID=25384 (sources are listed)
http://aur.archlinux.org/packages.php?ID=24117 (sources are not listed)
So there is an obvious bug. BASH (makepkg) is able to parse those PKGBUILDs without any problem. A bash script could parse PKGBUILDs on behalf of PHP when new PKGBUILDs are uploaded.
== 3.) The arch and license arrays could be converted to links like
http://wiki.archlinux.org/index.php/AUR-Licences#$LICENSE
http://wiki.archlinux.org/index.php/AUR-Archs#$ARCH
so that we have a nice connection between the AUR and the wiki.
== 4.) Add a "long description" to PKGBUILDs that would be used to display more information about packages (longer than the 80 characters of pkgdesc) on the web only?
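For item 3, the link generation could look roughly like the sketch below. It only assumes the two URL templates quoted in the report; the function names are made up for illustration, and whether the wiki pages really carry one anchor per license or architecture is not confirmed anywhere in this task.

```python
from urllib.parse import quote

# Base URLs as proposed in the report; the per-value anchors are the reporter's idea.
LICENSE_BASE = "http://wiki.archlinux.org/index.php/AUR-Licences"
ARCH_BASE = "http://wiki.archlinux.org/index.php/AUR-Archs"


def wiki_link(base, value):
    """Return base#value with the value percent-encoded for use as a fragment."""
    return f"{base}#{quote(value, safe='')}"


def wiki_links(licenses, archs):
    """Map the license and arch arrays of a package to lists of wiki links."""
    return {
        "license": [wiki_link(LICENSE_BASE, item) for item in licenses],
        "arch": [wiki_link(ARCH_BASE, item) for item in archs],
    }


# Hypothetical package values, for illustration only.
links = wiki_links(["GPL", "custom:foo"], ["i686", "x86_64"])
for kind, urls in links.items():
    for url in urls:
        print(kind, url)
```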
This task depends upon
This task blocks these from closing
FS#12998 - include makedepends in display
FS#16394 - Split Packages in AUR
Closed by Lukas Fleischer (lfleischer)
Friday, 03 May 2013, 08:55 GMT
Reason for closing: Won't implement
Additional comments about closing: Please use .AURINFO if the PKGBUILD parser doesn't work for you.
But there are still the other 3 subissues...
2) I'm glad it's fine now.
3) Started to check that one out too, hopefully will have some time over the weekend.
4) I don't think there should be any AUR specific field in PKGBUILDs. If the pacman maintainers implement it then AUR should definitely support it, but not before that. Not sure how others feel about this....
<a href="http://aur.archlinux.org/packages.php?O=0&K=foo&do_Search=Go">foo</a>: enables foo support
today you can't see optdepends in aur at all...
3) Why not link to the package directly instead of going to the search first? A little query on the database, and there you go.
I can go work on the issue, but is there already somebody working on it? (I'm new to this... I just subscribed to the mailing list.)
FS#16394
I think that BASH itself is the best tool for parsing BASH scripts (such as PKGBUILDs), so we should use the BASH code base to parse PKGBUILDs. The problem is that BASH itself is insecure here, because it needs to execute the code, and we cannot execute untrusted PKGBUILDs on the AUR server. I've suggested using a script like this:
http://aur.pastebin.com/B6x1tfV1
with a slightly modified (crippled & secured) BASH (which will not execute anything and will not access the filesystem; I guess this can be achieved by some kind of small "lobotomy" committed on the original BASH source).
Maybe something is also going on in these projects:
- http://github.com/Daenyth/Bashful
- http://github.com/sebnow/aur2
a small, metadata text file is not "bloat"... it's... a small, highly compressible _text file_. in the AUR implementation i am working on [https://bbs.archlinux.org/viewtopic.php?id=99839], SRCINFO will be _required_, as the AUR will run as pure python, as a desktop app, or translated to javascript as a webapp. i am also looking to the future, a distributed p2p version of AUR, so it has to work on _every_ installed machine. i need a way to get the information i need, in python; text files are a common way to perform this. i'm not even going to attempt trying to parse a pkgbuild.
if anything, SRCINFO could simply be in a common format with high language support such as YAML/etc., this would allow for easy mapping to language constructs.
in my opinion, the entire concept of "parsing bash" is an absolute non-starter, and a complete waste of time; you don't parse code (bash) unless you're a compiler. an easy meta file is the way to go, that can be read by ANY language; this is how we do things in the development world.
that said, i really like the idea of wiki integration; integrate as much as possible.
C Anthony
Imagine if pacman had to parse them. Hah. There's a reason packages contain easily parsable metadata.
Let's think of the AUR as a repo, like a pacman repo but for source.
We have done pretty well so far without good metadata, but source packages should no longer be considered second-class.
We should support the idea of including parsable metadata in source packages just as it is in binary
packages. So ultimately I think this is really a makepkg issue. Anyone who is interested in this
problem should open a ticket in the pacman bug tracker and contact their mailing list pacman-dev@archlinux.org
I have to say that I miss the times when all you needed to upload a new package to the AUR was a single PKGBUILD. It was a really KISS architecture, and I don't think we need something more complex for many packages. If we are not able to parse PKGBUILDs, then there is a problem with the parser or with the PKGBUILD syntax.
PKGBUILD spec to make it more friendly for maintaining a database
of source packages. As far as I understand, some of the more
advanced features of PKGBUILDs make it difficult or impossible
to get metadata out of them without building the actual binary
package - even when parsing with bash. The PKGBUILD was obviously
not designed with the source repo in mind. We're kind of trying
to squeeze a square box into a round hole here.
Uploading plain PKGBUILDs was really just for testing purposes
introduced sometime in the middle of the AUR's life. Don't miss
it so much. The AUR was meant to be a repository of tarballed
build scripts.
I would not say that it's KISS to upload packages in inconsistent
formats. Now we can have a little more consistency. We can still
do better there though.
There's obviously a problem with the parser. It's an incomplete
bash parser written in PHP. PKGBUILDs are fully bash scripts.
Therein the problem lies. We can't just use bash itself because
that opens a security hole. Maybe it's possible with a complicated
server setup and a fair amount of resources to do it, but I want to
keep the implementation of the AUR simple. So while it's KISS for
makepkg to allow PKGBUILDs to be bash scripts and run with it, it
makes life hard (not KISS) if you want to maintain a database of
source packages, like the AUR. So let's just throw that buzz word
in the trash.
Anyways, this is not exactly an AUR issue any more in my eyes.
I would like to help solve it though because it would make the
AUR that much better. Unfortunately I am short on free time.
i recently realized a large piece of what i want out of a "state manager"...
i think we should encourage a move in pacman to use a proven modeling language like puppet:
http://docs.puppetlabs.com/guides/language_tutorial.html
such a move would retain many expressive qualities, and allow an AST object to manipulate freely in the architecture. several things like git/svn/fetch/move/copy/link/etc. are identical between packages...
the format could include range definitions, offer a means to fetch exact numbers on demand (version HEAD for example).
the point is that bash is only going further into nowhere; i want to make the concept of AUR and load/peer sharing fundamental to the Arch experience. we need to explore richer core data structures, and use high performance persistent storage backends.
C Anthony
I agree that a better format is the way to go.
[1] https://github.com/sebnow/pkgparse
Callan and Simo investigated it at some point and they
discovered that it didn't actually restrict anything. They could
still run arbitrary commands.
Here's a reference about restricted mode.
http://www.gnu.org/software/bash/manual/html_node/The-Restricted-Shell.html
Having a proper plain-text metadata format (such as PKGINFO, or JSON/YAML) would be infinitely better. The issue with that is that either makepkg needs to be modified to support this format, or the file needs to be generated from the PKGBUILD, which is an extra step.
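To give an idea of how little code such a format needs on the consumer side, here is a minimal sketch of a reader for a "key = value" file in the style of .PKGINFO, where a key may repeat to form a list. The function name and the sample values are hypothetical, and the exact grammar of the file is an assumption, not a spec.

```python
from collections import defaultdict


def parse_metadata(text):
    """Parse 'key = value' lines into a dict of lists; nothing is executed."""
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment, ignore it
        # Only the first "=" separates key and value, so "baz>=2" survives.
        fields[key.strip()].append(value.strip())
    return dict(fields)


# Hypothetical metadata, for illustration only.
sample = """\
# generated metadata
pkgname = foo
pkgver = 1.0
depends = bar
depends = baz>=2
"""

info = parse_metadata(sample)
print(info["pkgname"], info["depends"])
```

Every value is kept as a list, so the caller decides which keys are scalars; that avoids hard-coding the field set in the parser.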
1. awk(1): Can be used to execute arbitrary commands, e.g. `awk 'BEGIN { system("echo Hello world!") }'`.
2. sed(1): Can be used to overwrite arbitrary files, e.g. `echo '# eval scrapt' | sed 's/a/i/g; 1w /some/other/foobar/script/that/will/be/executed/sometime/later'`.
3. test(1) aka `[`: Can be used to check if a file exists, e.g. `[ -e /some/path/to/a/file/owned/by/some/package ]`.
4. cat(1)/head(1)/tail(1): Can be used to read arbitrary files, e.g. "pkgname=`cat /etc/passwd`", "url=`head -2 /etc/passwd | tail -1`".
Those are just some very simple samples. I can think of many more complex ones that couldn't be detected without implementing full parsers for all the commands we allow in restricted mode as well.
i solved this part in CCR like this:
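function strip_comments($text) {
    // Pass 1: drop quoted strings, parenthesised groups and comments,
    // leaving only the bare assignment prefix (e.g. pkgdesc=).
    $pass1 = preg_replace('/(".*"|\s*#.*|\'.*\'|\(.*\))/','',$text);
    if ( $pass1 != "" ) {
        // Pass 2: remove that prefix, so values and comments remain.
        $pass2 = str_replace($pass1,"",$text);
        // Pass 3: drop the values, so only the comment text remains.
        $pass3 = preg_replace('/(".*"|\'.*\'|\(.*\))/','',$pass2);
        // Finally remove the comment text from the original line.
        $final = str_replace($pass3,"",$text);
    } else {
        $final = '';
    }
    return $final;
}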
It is a bit tricky, but it allows the use of # characters inside pkgdesc="", '', =("") or =(''),
e.g.: pkgdesc=("this is a c# test") #some text ---> pkgdesc=("this is a c# test")
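For comparison, the same idea can be sketched as a small quote-aware scanner instead of chained regexes. This is not the CCR code, just an illustration in Python; it handles one line at a time and assumes no here-documents, no strings spanning several lines and no $'...' quoting.

```python
def strip_comment(line):
    """Remove a trailing shell comment from a single line.

    A '#' starts a comment only outside quotes and at the start of a word,
    which is how bash treats it.
    """
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is None:
            if ch == "\\":
                i += 2  # skip the escaped character
                continue
            if ch in "'\"":
                quote = ch
            elif ch == "#" and (i == 0 or line[i - 1].isspace()):
                return line[:i].rstrip()
        elif quote == '"' and ch == "\\":
            i += 2  # backslash escapes only work inside double quotes
            continue
        elif ch == quote:
            quote = None
        i += 1
    return line


print(strip_comment('pkgdesc=("this is a c# test") #some text'))
```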
http://paste.kde.org/83557/
http://git.overlays.gentoo.org/gitweb/?p=proj/libbash.git;a=summary
In my opinion, there are only two viable solutions:
* Execute PKGBUILDs in a sandbox.
* Provide some kind of meta data with source tarballs.
I'm looking into a way of building a super-restricted Qemu -> really-tiny-kernel -> Bash-and-some-tools-in-initrd setup (no drivers or network at all, just an emulated serial port console to communicate with the outside world).
In keeping with the KISS principle I'm thinking about faking, pretending and cheating just enough to trick makepkg into building the barest skeleton of a package, and extracting the .PKGINFO file out of that.
Any help or suggestions on this would be greatly appreciated.
Currently, the AUR parses PKGBUILDs minimally, which causes issues for
split packages. There are three paths I can see the AUR taking:
1) Keep doing what it's doing and don't worry about split PKGBUILDs,
but remain safe if there is malicious code in the PKGBUILD.
2) Parse PKGBUILDs more fully but leave the AUR more open to attack
from malicious code.
3) Stop parsing PKGBUILDs on the server and use something akin to
the .PKGINFO file in a package file. Have makepkg generate a
.SRCINFO file (as per Dave Reisner's suggestion) and put that in
the src.* file that is uploaded to the server.
The third option seems to be the best option to me, and would help keep
the AUR server safe. Only one thing that I found when trying to patch
makepkg to generate .SRCINFO files for split packages was that the
variables (such as pkgdesc, provides, depends, etc.) are parsed in the
package functions for the specific packages. I was thinking that the
--source flag should generate a .SRCINFO file in two ways depending on
the PKGBUILD:
split package: make sure that at least the package function is run
and generate $pkgname-$full_version$SRCEXT, instead of using the
$pkgbase. Because the .SRCINFO will contain the pkgbase, these can
be grouped together easily. (Or have a single src package with
.SRCINFO_$pkgname all in one file, then split into separate pages on
the AUR with a single location, like the official package
repositories.)
non-split package: allow --source to be run without packaging each
package, because all variables should be able to be written to the
.SRCINFO file easily
Current packages could remain, and the normal way of submitting packages
would be deprecated as pacman 4.1 is released to [core].
Packages made with current pacman development builds could still be
submitted, as the same files would be in the src tarball; it would
simply include an extra file.
Thank you for taking the time to read. Let me know if anyone wants me to
keep working on this patch, or if someone wants to take it over before
my minimal shell scripting skills fail me horribly.
This is for a split package, with "pkgname=('pkg1' 'pkg2')" in the PKGBUILD and "pkgname = pkg1" in the .AURINFO file.
It seems that a check is done before the PKGBUILD value is overridden. But maybe this belongs to FS#16394 instead?
This won't happen with a .SRCINFO file created automatically by makepkg. Thanks, and sorry for the noise.
.AURINFO:
pkgname = etlegacy
PKGBUILD:
pkgname=('etlegacy' 'etlegacy-mod')
pkgdesc="Fully compatible client and server for the popular online FPS game Wolfenstein: Enemy Territory
...
source=($pkgname-linux-$pkgver.tar.gz)
The AUR seems to consider $pkgname as being the second element of the pkgname array (etlegacy-mod), despite it being overridden in the .AURINFO file.
So the Sources section of the AUR page shows an incorrect filename. Have a look at the AUR page (https://aur.archlinux.org/packages/etlegacy/) and the corresponding PKGBUILD.
Makepkg correctly considers $pkgname as being the first element of the array when compiling the package locally.
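The bash rule at play is that an unsubscripted reference to an array expands to its first element. A parser that does not run bash has to reproduce that rule itself. A minimal sketch, assuming the variables have already been extracted into a dict (the helper name and the pkgver value are hypothetical, the PKGBUILD above elides the real version):

```python
import re


def expand_scalars(template, variables):
    """Expand $name and ${name} the way bash does for plain references.

    An array referenced without a subscript yields its first element and an
    unset variable yields an empty string. Subscripted forms such as
    ${pkgname[1]} are not handled and are left untouched.
    """
    def replace(match):
        name = match.group(1) or match.group(2)
        value = variables.get(name, "")
        if isinstance(value, (list, tuple)):
            return value[0] if value else ""
        return str(value)

    return re.sub(r"\$\{(\w+)\}|\$(\w+)", replace, template)


variables = {
    "pkgname": ["etlegacy", "etlegacy-mod"],
    "pkgver": "2.71",  # hypothetical version, for illustration only
}
source = expand_scalars("$pkgname-linux-$pkgver.tar.gz", variables)
print(source)
```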
When a (pacman-4.1 compliant) git package has a fragment in its source URL, the source is not listed in the web UI:
Without fragment:
https://aur.archlinux.org/packages/gnuradio-git/
With fragment:
https://aur.archlinux.org/packages/libosmocore/
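For reference, a source entry in the pacman 4.1 VCS style can be taken apart without executing anything. The sketch below assumes the general shape [name::][vcs+]url[#fragment]; the function name and the example URL are made up, and splitting on the first "::" is a simplification.

```python
def parse_source(entry):
    """Split a source entry into name, VCS prefix, URL and fragment."""
    name, sep, url = entry.partition("::")
    if not sep:
        name, url = None, entry  # no explicit local name given
    url, _, fragment = url.partition("#")
    vcs = None
    scheme, has_scheme, _ = url.partition("://")
    if has_scheme and "+" in scheme:
        # e.g. "git+https" -> vcs "git", plain URL "https://..."
        vcs, _, url = url.partition("+")
    return {"name": name, "vcs": vcs, "url": url, "fragment": fragment or None}


# Hypothetical entry, for illustration only.
parsed = parse_source("repo::git+https://example.org/repo.git#branch=dev")
print(parsed)
```

A web interface that lists sources would then display the url field and could show the fragment separately, instead of dropping the whole entry.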