FS#4276 - PKGBUILDs open to spamharvest

Attached to Project: AUR web interface
Opened by Hugo (Citral) - Sunday, 26 March 2006, 10:37 GMT
Last edited by Loui Chang (louipc) - Sunday, 22 June 2008, 17:20 GMT
Task Type Feature Request
Category Backend
Status Closed
Assigned To Paul Mattal (paul)
Architecture All
Severity Low
Priority Normal
Reported Version 1.2.7
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

AUR allows open access from the internet to PKGBUILDs. This is a nice feature, but it also makes it easy for spammers to harvest email addresses from the "Maintainer: " and "Contributor: " lines.

Take this google link: http://www.google.nl/search?hl=nl&q=site%3Aaur.archlinux.org+PKGBUILD&meta=
Google has more than 1800 hits for PKGBUILDS. This is easy food for spambots.

It would be nice if AUR could obfuscate email addresses in PKGBUILDs when it gives web access to them. I don't think that would be a problem, because there are more than sufficient ways to contact the package owner via AUR. And if the PKGBUILD is in a tarball, then nothing would need to be done either.

Closed by  Loui Chang (louipc)
Sunday, 22 June 2008, 17:20 GMT
Reason for closing:  Won't implement
Comment by Hugo (Citral) - Sunday, 30 April 2006, 19:18 GMT
Ok, why doesn't any dev say a thing? I think it is a pretty important bug that is easy to solve...
Comment by Paul Mattal (paul) - Tuesday, 30 May 2006, 11:55 GMT
From Hugo:

Please give it another thought. I hope you consider these points: 1. Not all PKGBUILDS that have my e-mail address are under my control; 2. I'd have to spam AUR with 20 new PKGBUILDS; 3. Users don't obfuscate their email address in a PKGBUILD because they falsely assume it to be hidden in a tarball; 4. It's only a matter of adding a regexp to the code.
Comment by Paul Mattal (paul) - Tuesday, 30 May 2006, 11:57 GMT
So here's the thing: I'm not sure it is a matter of just adding a regexp to the code; the PKGBUILD file gets served up directly from the directory in which it lives; so we'd have to add a script to filter it.

Still, point is taken -- that's not a huge amount of work.

I don't know, actually, how I missed this item before; sorry nobody addressed it. Let's leave it open a little longer and think about the viability of doing something about it.
Comment by Simo Leone (neotuli) - Saturday, 17 June 2006, 07:20 GMT
I still say if a user doesn't want their email open to that, obfuscate it or do not put it in the PKGBUILD. It's silly and error-prone to start using regexps on our PKGBUILDS.
Comment by Hugo (Citral) - Saturday, 17 June 2006, 15:57 GMT
You're simply repeating what you said earlier. Let me repeat my points then which you haven't addressed yet:
1. Not all PKGBUILDS that have my e-mail address are still under my control;
2. I'd have to spam AUR with 20 new PKGBUILDS;
3. Users don't obfuscate their email address in a PKGBUILD because they falsely assume it to be hidden in a tarball;
4. Implementation is easy.

Also, I don't understand what is silly or error-prone about using a regexp. Note that the PKGBUILDS as stored on AUR stay the same; only the web access, which spambots use, gets a regexp. It can't get much cleaner than that.

Comment by Simo Leone (neotuli) - Saturday, 17 June 2006, 18:00 GMT
1> Ask the new maintainers to remove/obfuscate your email address, and if they don't want to do it and it's really that much of an issue, give me a list and I'll personally take care of it
2> I can change them myself if you'd really like me to, otherwise, sorry about the lack of options. Though you did upload them in the first place..
3> You underestimate users, I don't find it particularly hard to click "show files" or "pkgbuild" and realize that there is direct access
4> Easy and sane are two different things. Yes, it's easy in some ways; however, we don't serve the raw PKGBUILDs or the files list through the AUR's code, we just use Apache's own indexing or direct access. That makes things a bit more interesting already, because a lot more code changes would be needed to change that. Also, what is error-prone about it is certain regexp syntax which might be used in the PKGBUILDs. Just as an example:

sed -i 's@blah.sh@foo.sh@g' Makefile

Parts of that would easily regexp as e-mail addresses, so now we're talking about changing functional content of PKGBUILDs. That's not too cool with me, even if it doesn't get done in the tarball, because many TUs don't download the tarball when they go to mark things safe, but rather just look at the PKGBUILD, and funny looking regexps are one of the things they're interested in reading. Obfuscating automatically makes this impossible.
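To illustrate the false-positive concern, here is a minimal Python sketch. The pattern `NAIVE_EMAIL` and the helper `find_email_like` are assumptions made for this example, not anything the AUR uses; the point is only that a loose address pattern also matches part of the sed command quoted above.

```python
import re

# Deliberately naive address pattern (an assumption for illustration only).
NAIVE_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def find_email_like(line):
    """Return every substring of `line` that the naive pattern treats as an address."""
    return NAIVE_EMAIL.findall(line)

sed_line = "sed -i 's@blah.sh@foo.sh@g' Makefile"

# The sed delimiters make "s@blah.sh" look like an e-mail address.
print(find_email_like(sed_line))  # prints ['s@blah.sh']
```

Rewriting that match would alter the displayed sed expression, which is the kind of change to functional content described here.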

A more specific regexp, such as one that includes the "Submitter:" portion or assumes that people enclose their address in brackets "<foo@bar.com>", isn't going to cover all the variations on these comments found in the AUR. That leaves quite a few people without the protection of obfuscation, which means they would have to change their PKGBUILDs to match, or make sure they upload them in that specific format in the first place, which seems to bring me full circle...
Comment by Hugo (Citral) - Monday, 19 June 2006, 12:53 GMT
1> Thanks for the offer, but if we could find a solution here, we could spare you the cumbersome work...
2> True, but the issue was that I'd have to spam AUR with 20+ PKGBUILDS, which is an annoyance for other people.
3> New users have to find that out first, and what about the packages which were uploaded before this web access was in place?
4> You've got a valid point there about it being difficult to incorporate a script into an apache opendir. But isn't it possible, using .htaccess, to have $path$/PKGBUILD point to some handling script? Piping a file to output and applying a regexp is trivial, and heck, you could even add things like syntax coloring.

About the regexp: I think you can cover 99.9% of the cases by this perl regexp, and it doesn't change any functionality of the PKGBUILD:
s/^(#.*)(.+)@(.+)\.(.+)(\s+.*)?$/$1 $2_AT_$3_DOT_$4 $5/g
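For comparison, a minimal Python sketch of the same idea: rewrite addresses only on comment lines and pass every other line through unchanged. The `EMAIL` pattern and the function name `obfuscate_comment_emails` are assumptions for illustration, and the pattern will not cover every address format found in real PKGBUILD comments.

```python
import re

# Hypothetical address pattern; real PKGBUILD comments vary more than this.
EMAIL = re.compile(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)")

def obfuscate_comment_emails(text):
    """Obfuscate addresses on lines starting with '#'; leave all other lines untouched."""
    def mask(match):
        local, domain = match.group(1), match.group(2)
        return f"{local}_AT_{domain.replace('.', '_DOT_')}"

    out = []
    for line in text.splitlines(keepends=True):
        # Only comment lines are rewritten, so build commands stay byte-identical.
        if line.lstrip().startswith("#"):
            line = EMAIL.sub(mask, line)
        out.append(line)
    return "".join(out)

sample = (
    "# Maintainer: Foo <foo@bar.com>\n"
    "sed -i 's@blah.sh@foo.sh@g' Makefile\n"
)
print(obfuscate_comment_emails(sample), end="")
```

In this sketch the comment line comes out as `# Maintainer: Foo <foo_AT_bar_DOT_com>`, while the sed line is returned exactly as it went in.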

Comment by Paul Mattal (paul) - Tuesday, 20 June 2006, 13:09 GMT
Here's a concept: what if we did not provide the link to PKGBUILD unless you were logged in? That would be a fairly simple fix that would accomplish something useful, I'd think.
Comment by Hugo (Citral) - Tuesday, 20 June 2006, 18:05 GMT
Sounds like a sweet idea to me!
However, would it also be possible to make the PKGBUILDS inaccessible (ie require some cookie) unless you are logged in? Or else have google not spider the PKGBUILDS? I do think that spambots use google intensively.
Comment by Paul Mattal (paul) - Tuesday, 20 June 2006, 19:10 GMT
It should not be difficult to put a thin wrapper around the PKGBUILDs to require a cookie. Also, the spambot thing brings to mind that we should probably have a robots.txt that excludes indexing of the AUR, other than perhaps the homepage. I think we can also then ask Google to purge pages in aur.archlinux.org from their indexes and/or reindex the site.
Comment by Hugo (Citral) - Monday, 13 November 2006, 20:41 GMT
You made some great suggestions back in June. Can you reconsider implementing them?

Regards,

Hugo
Comment by pete (drg006) - Tuesday, 22 May 2007, 19:47 GMT
Email addresses are also exposed in the web application itself on the user info pages (i.e., http://aur.archlinux.org/account.php?Action=AccountInfo&ID=XXX). These addresses could be obfuscated with some simple JavaScript.
Comment by pete (drg006) - Tuesday, 22 May 2007, 19:49 GMT
Here's an example:
<script>
document.write('<a href="mailto:name');
document.write('@company.com');
document.write('">contact me</a>');
</script>
Comment by Gavin Bisesi (Daenyth) - Thursday, 19 June 2008, 23:17 GMT
I think only matching Submitter:, Maintainer:, and Contributor: lines makes sense, as that is the packaging standard. We shouldn't concern ourselves with the other PKGBUILDS.
Comment by Loui Chang (louipc) - Sunday, 22 June 2008, 17:11 GMT
I don't think modifying the PKGBUILDs in any way makes sense.
