Issue tracker moved to https://gitlab.archlinux.org/archlinux/aurweb/-/issues
FS#4276 - PKGBUILDs open to spamharvest
Attached to Project:
AUR web interface
Opened by Hugo (Citral) - Sunday, 26 March 2006, 10:37 GMT
Last edited by Loui Chang (louipc) - Sunday, 22 June 2008, 17:20 GMT
Details: The AUR allows open access from the internet to PKGBUILDs. This is a nice feature, but it also makes it easy for spammers to harvest email addresses from the "Maintainer: " and "Contributor: " lines.
Take this Google link: http://www.google.nl/search?hl=nl&q=site%3Aaur.archlinux.org+PKGBUILD&meta= Google has more than 1800 hits for PKGBUILDs. This is easy food for spambots. It would be nice if the AUR could obfuscate email addresses in PKGBUILDs when it gives web access to them. I don't think that would be a problem, because there are more than sufficient ways to contact the package owner via the AUR. And if the PKGBUILD is in a tarball, nothing would need to be done either.
Please give it another thought. I hope you consider these points: 1. Not all PKGBUILDs that have my e-mail address are under my control; 2. I'd have to spam the AUR with 20 new PKGBUILDs; 3. Users don't obfuscate their e-mail address in a PKGBUILD because they falsely assume it to be hidden in a tarball; 4. It's only a matter of adding a regexp to the code.
Still, point is taken -- that's not a huge amount of work.
I don't know, actually, how I missed this item before; sorry nobody addressed it. Let's leave it open a little longer and think about the viability of doing something about it.
1. Not all PKGBUILDs that have my e-mail address are still under my control;
2. I'd have to spam the AUR with 20 new PKGBUILDs;
3. Users don't obfuscate their e-mail address in a PKGBUILD because they falsely assume it to be hidden in a tarball;
4. Implementation is easy.
Also, I don't understand what is silly or error-prone about using a regexp. Note that the PKGBUILDs as stored on the AUR stay the same; only the web access, which is what spambots use, gets a regexp applied. It can't get much cleaner than that.
2> I can change them myself if you'd really like me to; otherwise, sorry about the lack of options. Though you did upload them in the first place.
3> You underestimate users. I don't find it particularly hard to click "show files" or "pkgbuild" and realize that there is direct access.
4> Easy and sane are two different things. Yes, it's easy in some ways. However, we don't serve the raw PKGBUILDs or the file list through the AUR's code; we just use Apache's own indexing or direct access. That already makes things a bit more interesting, because a lot more code changes would be needed to change that. As for what is error-prone about it, consider a certain regexp syntax that might be used in PKGBUILDs, just as an example:
sed -i 's@blah.sh@foo.sh@g' Makefile
Parts of that would easily match a regexp for e-mail addresses, so now we're talking about changing the functional content of PKGBUILDs. That's not too cool with me, even if it doesn't get done in the tarball, because many TUs don't download the tarball when they go to mark things safe. They just look at the PKGBUILD, and funny-looking regexps are one of the things they're interested in reading. Obfuscating automatically makes this impossible.
A more specific regexp, such as one that includes the "Submitter:" portion or assumes that people enclose their address in brackets ("<foo@bar.com>"), isn't going to cover all the variations on these comments found in the AUR. That leaves quite a few people without the protection of obfuscation, which means they would have to change their PKGBUILDs to match, or make sure they upload them in that specific format in the first place, which seems to bring me full circle...
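To make the concern about the sed line concrete, here is a minimal Python sketch. The `NAIVE_EMAIL` pattern and the `obfuscate_everywhere` helper are assumptions made up for illustration; they are not anything the AUR code contains. It shows how a loose address pattern picks up part of the sed expression and rewrites it.

```python
import re

# Deliberately naive address pattern (hypothetical, for illustration only).
NAIVE_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def obfuscate_everywhere(text: str) -> str:
    """Rewrite every match, no matter where in the file it occurs."""
    return NAIVE_EMAIL.sub(
        lambda m: m.group(0).replace("@", "_AT_").replace(".", "_DOT_"),
        text,
    )


line = "sed -i 's@blah.sh@foo.sh@g' Makefile"

# The pattern treats "s@blah.sh" as if it were an address ...
print(NAIVE_EMAIL.findall(line))

# ... so a blanket substitution alters the displayed sed expression.
print(obfuscate_everywhere(line))
```

With this particular pattern the displayed command no longer matches what is in the tarball, which is exactly the kind of change a reviewer reading the PKGBUILD in the browser would not want.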
2> True, but the issue was that I'd have to spam the AUR with 20+ PKGBUILDs, which is an annoyance for other people.
3> New users have to find that out first, and what about the packages that were uploaded before this web access was in place?
4> You've got a valid point there about it being difficult to incorporate a script into an Apache directory index. But isn't it possible, using .htaccess, to have $path$/PKGBUILD point to some handling script? Piping a file to output and applying a regexp is trivial, and heck, you could even add things like syntax highlighting.
About the regexp: I think you can cover 99.9% of the cases with this Perl regexp, and it doesn't change any functionality of the PKGBUILD:
s/^(#.*)(.+)@(.+)\.(.+)(\s+.*)?$/$1 $2_AT_$3_DOT_$4 $5/g
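The idea behind that substitution, touching only comment lines so that functional lines stay as they are, can be sketched in Python as follows. This is a rough sketch under stated assumptions, not a port of the Perl expression: the `ADDRESS` pattern, the `obfuscate_comments` function and the sample PKGBUILD are all hypothetical, and the address pattern is tighter than the one in the post.

```python
import re

# Hypothetical pattern: local part, "@", then a domain with at least one dot.
ADDRESS = re.compile(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)")


def _mask(match: re.Match) -> str:
    """Turn user@host.tld into user_AT_host_DOT_tld."""
    local, domain = match.group(1), match.group(2)
    return f"{local}_AT_{domain.replace('.', '_DOT_')}"


def obfuscate_comments(pkgbuild: str) -> str:
    """Mask addresses only on lines whose first non-blank character is '#'."""
    out = []
    for line in pkgbuild.splitlines(keepends=True):
        if line.lstrip().startswith("#"):
            line = ADDRESS.sub(_mask, line)
        out.append(line)
    return "".join(out)


sample = (
    "# Maintainer: Hugo <hugo@example.com>\n"
    "pkgname=foo\n"
    "build() {\n"
    "  sed -i 's@blah.sh@foo.sh@g' Makefile\n"
    "}\n"
)
print(obfuscate_comments(sample))
```

In this sketch the sed line is left alone because it does not start with `#`. It does not handle a trailing comment on a code line, and a line starting with `#` inside a here-document would still be rewritten.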
However, would it also be possible to make the PKGBUILDs inaccessible (i.e. require some cookie) unless you are logged in? Or else to have Google not spider the PKGBUILDs? I do think that spambots use Google intensively.
Regards,
Hugo
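On the question of keeping Google from spidering the PKGBUILDs: one common way to do that is a robots.txt rule. The sketch below uses Python's standard `urllib.robotparser` to check what such a rule would block. The `/packages/` prefix and the example URLs are assumptions standing in for wherever the raw PKGBUILDs are served, and the `may_crawl` helper is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/packages/" is an assumed path prefix.
ROBOTS_TXT = """\
User-agent: *
Disallow: /packages/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


# Raw PKGBUILD under the disallowed prefix vs. an ordinary page.
print(may_crawl("https://aur.archlinux.org/packages/foo/foo/PKGBUILD"))
print(may_crawl("https://aur.archlinux.org/index.php"))
```

A rule like this only affects crawlers that honour robots.txt, so it would help against harvesting via search engines but not against a bot that fetches the files directly.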
<script>
document.write('<a href="mailto:name');
document.write('@company.com');
document.write('">contact me</a>');
</script>