FS#5355 - Source code availability for full GPL compliance

Attached to Project: Arch Linux
Opened by Tom Killian (tomk) - Tuesday, 05 September 2006, 13:30 GMT
Last edited by Eric Belanger (Snowman) - Wednesday, 22 July 2009, 23:53 GMT
Task Type Feature Request
Category System
Status Closed
Assigned To Eric Belanger (Snowman)
Aaron Griffin (phrakture)
Architecture All
Severity Low
Priority Normal
Reported Version 0.7.2 Gimmick
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Currently being discussed here:

http://bbs.archlinux.org/viewtopic.php?t=24826

in the aftermath of Mepis' related problems.

Closed by  Eric Belanger (Snowman)
Wednesday, 22 July 2009, 23:53 GMT
Reason for closing:  Implemented
Comment by Gavin Bisesi (Daenyth) - Thursday, 17 April 2008, 19:11 GMT
Is this bug resolved? Seems like phrakture was saying we had a solution in the thread.
Comment by Greg (dolby) - Sunday, 04 May 2008, 15:10 GMT
For this one to be solved once and for all there should be something like this http://ftp.ntua.gr/pub/linux/slackware/slackware/source/a/bash/ which is a rather complicated task for a rolling release distribution like Arch Linux. But PKGBUILDs don't work the same way SlackBuilds do, as they contain a direct link to the source from the application's website.
Comment by Allan McRae (Allan) - Monday, 19 May 2008, 12:45 GMT
This has actually come up on the front page of Planete Beranger which is fairly well read.
http://beranger.org/index.php?page=diary&2008/05/19/10/09/10-gratuitous-assertions-in-the-lat

Not good "advertising" but that site has some strong opinions about Arch/pacman anyway...
Comment by Bryan Ischo (bji) - Tuesday, 20 May 2008, 21:28 GMT
Here are phrakture's comments from the thread:

"As per section 3 of the GPL license, one copying a GPL licensed program may either: redistribute the source, provide the source upon request, or have express permission from the copyright owners to redistribute it without the source.

I will gladly back any request for source code. Feel free to email me directly and i will mail you a CD, for the cost of shipping and the CD itself.

Problem solved."

Is phrakture likely to be able to keep good on his/her promise here? Does he/she keep a personal snapshot of all GPL'd software distributed on all Arch Linux releases going back 3 years, including all versions that were built into packages distributed via AUR? If he/she does, and really can deliver a source CD based on the specific versions of GPL'd software requested, then this promise is sound. If not (and I suspect this is the case), then this promise is not good enough, because it is quite possible that it will be impossible or impractical for phrakture to recover the source "after the fact". What if I request the source for the unique and specific version of 300 individual pieces of GPL software as distributed by Arch Linux? Will phrakture be able to hunt down and obtain the individual pieces in a timely manner (if at all)? Will he/she even be willing to when actually requested to do so? I think that his/her promise as stated in the thread is a bit hasty and insincere to be honest.

As a consequence of this, I don't think that the bug is resolved. Arch Linux, by distributing binary packages for GPL'd software, is required to host copies of the software so that they can ensure that regardless of what happens with the project's web site (I would not be at all surprised to find that Source links in packages found in AUR will go bad after time as project names and URLs change), the source package will always be available, without telling the user "go find it yourself". Alternately, the project could keep its copies of all of the sources private and provide them on request as phrakture said they would do, but I really don't think they have the resources to do that.

Consider that I could write a script that would dig through AUR, pick packages randomly (including picking random versions of packages), and then mail a request off to the Arch team asking for those on CD. Now the Arch team is responsible for downloading all those sources (and some of them will probably require some work as the URL may have changed since it was put into AUR, so the original sources will have to be hunted down) and putting them on CD for me. That just seems needlessly onerous for the Arch team. Why not just replace all of the source links in the AUR package pages with links to the file cached on Arch's servers, and be done with the problem once and for all? I don't know how much additional disk space this would require of Arch, and I know it would not be insignificant, but that's the cost of the GPL. You get good quality free software, and you get to redistribute it for free, but if you do so, you have to pay the cost of making the sources available. I've got my $20 donation to the Arch team ready to send should they need it to cover the extra disk space costs.
Comment by Bryan Ischo (bji) - Tuesday, 20 May 2008, 21:52 GMT
Here is an example of how Arch's mechanism for "making source available" is faulty.

From the current Packages pages of AUR:

http://aur.archlinux.org/packages.php

The "855resolution 0.4-5" package has its Source link as:

http://perso.wanadoo.fr/apoirier/855resolution-0.4.tgz

However, this is a bad URL and results in some French "file not found" portal page.

This particular package is not under the GPL, so it isn't a direct example of GPL violation by Arch. But it illustrates how the Arch mechanism for linking to source can become broken and if this ever happens for a GPL package, then this *will* be a violation of the GPL (even more so than not hosting source directly already is).
Comment by Allan McRae (Allan) - Tuesday, 20 May 2008, 22:21 GMT
bji - we do not distribute binaries for any packages in the unsupported part of AUR. Therefore we do not have to supply the source. In fact, if you build the package from AUR then distribute it, then you have to provide the source... The example you chose is in [community] though, but not under GPL so is moot.
Comment by Bryan Ischo (bji) - Tuesday, 20 May 2008, 22:32 GMT
Yes, you are correct, my specific example was not of a GPL package. I was just trying to illustrate, in case my previous verbose description was confusing, what I meant when I said that the Source links from Arch packages could easily become broken and thus no longer even provide a link to the source. Since this problem could also occur for GPL packages, this seems like a problem for Arch's GPL compliance.

I think the GPL's requirements can be summarized pretty simply as:

"If you host a GPL'd binary, you have to host the source as well."

Note the word HOST in this statement. If you HOST a binary, you have to HOST the source as well. You can't HOST the binary but provide a link to the source.

[Yes, there are alternatives, such as providing the source only upon request, but that a) is not a very friendly way to treat your users and b) requires the ability to produce such CDs upon request, which as I wrote above, doesn't seem very likely to be possible if you don't have all of the source archived anyway]
Comment by Bryan Ischo (bji) - Tuesday, 20 May 2008, 22:39 GMT
For what it's worth, I just started using Arch within the last week, and I know that my understanding of how its packages are hosted is a bit lacking. I'm basing my statements on my understanding that Arch only provides links to the sources for the binary packages that it hosts, which like I said, doesn't meet the requirements of the GPL.

I have been really happy with Arch so far, and am fully prepared to convert to exclusively using Arch because my experience has been so great. However, this GPL issue gives me a little pause. Not only do I have a moral belief that the GPL should be adhered to, to the point of going above and beyond its most basic requirements to satisfy the spirit as well as the letter of the GPL (for example by providing convenient links for every binary package in Arch to Arch-hosted sources), but I also have a practical concern: I don't want to ever find myself in a position of needing to compile some package that I have on my Arch system but am unable to get to the source because Arch's links are bad or out of date, and have to go out searching for the original sources myself. That is not a risk I feel willing to take.

However, I am not pointing fingers here; I don't think that the Arch team is explicitly trying to violate the GPL or be 'bad sports', I just think this issue needs a bit more attention and to be solved using the same great KISS philosophy that Arch is based on. And I have no doubt that it can be solved in a way that satisfies the GPL, makes the users (and me) happy, and doesn't stress Arch's team very much. I look forward to that solution!
Comment by Aaron Griffin (phrakture) - Tuesday, 20 May 2008, 22:50 GMT
Holy shit that's a lot of text. I will summarize the entire solution:
We are working toward getting real sources on the mirrors in addition to the packages.

Rather than whining like a bunch of little children, why don't you help out instead of acting like some sort of GPL Police.

For fuck's sake. I care about the GPL too. We are not compliant right now, and are trying. Random rants on some random schmuck's website are not enough - NOTIFY people. Tell someone. Don't rant to a wall and expect people to notice.
Comment by Bryan Ischo (bji) - Tuesday, 20 May 2008, 23:13 GMT
That's good news. Like I said, I've got money in hand ready to send in to help support the costs of bandwidth and disk space for the sources. I am not sure what kind of other concrete help I can offer at the moment.

If you're implying that I'm whining like a child, well not that it probably matters to you or anyone else, but I think that's ridiculous. I was just trying to educate you on what you need to do, to get you to take this problem seriously (and it didn't SEEM like you took the problem seriously, given your obviously disingenous offer to provide sources on CD upon request, and the fact that it's been two years since you made that offer and still never actually took any steps to really come into compliance with the GPL).

I'm super glad to learn that you 'are trying', and I have no doubt that a great solution is forthcoming. But don't tell me or anyone else that we're 'whining like a bunch of little children'. That's extremely disrespectful and unnecessary.
Comment by Aaron Griffin (phrakture) - Tuesday, 20 May 2008, 23:19 GMT
Bryan, the "whining" rant was targeted at the Beranger guy. Sorry if that was misunderstood.
Comment by Aaron Griffin (phrakture) - Tuesday, 20 May 2008, 23:22 GMT
However, for the record, you are calling me "disingenous[sic]" without ever asking for a CD. You read some text I wrote and it is 100% impossible to know my motives. Please don't make statements like this, it is flat-out silly.

I actually was ready to provide CDs when I made that offer, and have not once gotten a request for a CD. Would you like one?
Comment by Bryan Ischo (bji) - Tuesday, 20 May 2008, 23:35 GMT
You are right, I should not have written that, I am sorry.

I guess it just seemed impossible for me to believe that you could really think that you'd be able to supply a source CD cobbled together from a bunch of sources which you may or may not have direct or easy access to. So I assumed that you didn't really think very hard about whether or not you could actually *do* it, and that it was just easier to *say* that you would be willing to do it, than to actually solve the problem of making sources available via the Arch package repository.

However, that's quite a few assumptions for me to make about your intentions and your motives and really there's no excuse for me doing that. Sometimes I find myself 'thinking for the other guy' rather than just asking him what he meant. I think it comes from having a very impatient personality.

I really like Arch and I just want more reasons to like it, GPL compliance would be huge for me. Once again, thanks for taking it seriously and I look forward to seeing the solution.
Comment by Aaron Griffin (phrakture) - Wednesday, 21 May 2008, 20:39 GMT
To everyone involved, please see the "RE-UPDATE" section at the bottom here.
http://beranger.org/index.php?page=diary&2008/05/20/07/13/57

We talked it over via email, and things were handled nicely.

That said, I am testing a script to generate source tarballs here:
http://dev.archlinux.org/~aaron/sources/ (note: Access Forbidden right now, check back soon)
Comment by Xavier (shining) - Thursday, 22 May 2008, 06:40 GMT
One point that I don't find clear at all:
"No. The GPL requires (section 3) you to be able to: (i) either provide the FULL, COMPLETE sources; (ii) or come with a written offer (valid for at least 3 years! can you guarantee that?) that you can provide the FULL, COMPLETE sources. You can charge for the cost of physically making and sending CDs/DVDs, if this is the case."

So in the first case, how long do you have to provide the full, complete sources? 3 years too?
Suppose you release the Arch package foo 1.5-1. Then two days later, you realize there is a new 1.6 version, and you release the new package.
Now you have to keep both 1.5 and 1.6 sources for 3 years?
Comment by Bryan Ischo (bji) - Thursday, 22 May 2008, 06:49 GMT
With regards to Xavier's question:

First off, you are only required to provide sources to anyone who you distributed the binaries to. So if no one downloaded the foo 1.5-1 binary package, then you technically would never be responsible for providing the source to anyone. However, this would be pretty difficult to track so for all practical purposes, once you've put a binary package up you have to assume that someone has downloaded it, and that you will have to honor any request for sources for that package for 3 years.

And yes, you'd have to keep sources for both the foo 1.5-1 package and 1.6 package, for 3 years from the date that you last distributed those packages.

Each time you supply a binary package to someone, the GPL requires you to make available the sources for that exact package for three years to *that person*. You technically don't have to make the sources available to anyone else. And you technically can have a time limit that is unique to each person and package download. However, this is so far beyond the bounds of practicality that it is simply easier to make the sources for all versions of all packages available to everyone for at least three years beyond the date that you last made binaries available.

Comment by Andreas Wagner (awagner) - Thursday, 22 May 2008, 07:07 GMT
Hold on, I thought the 3 years were not relevant in the first case, i.e. if you offered a source download along with the binary download, then you'd have to provide that only for so long as there was the binary download. So, /if/ you host foo-1.5-1.src.tgz along with foo-1.5-1.pkg.tar.gz then you can remove and forget about that as soon as you put foo-1.6-1.pkg.tar.gz and foo-1.6-1.src.tgz online. That's what all the fuss is about: it's not really feasible to retain all source code for three years and thus the written promise (ii) would be hard to keep, which is why it's better to go with (i).
Comment by Bryan Ischo (bji) - Thursday, 22 May 2008, 07:21 GMT
It's an interesting question and I guess there are different interpretations of the GPL on this.

I form my opinion based on this section from the GPLv2:

"3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
"

(I omitted (c) because I don't think it's relevant)

I guess it all depends on what 'Accompany' is interpreted to mean. For binaries distributed on physical media such as CD-ROM or tape, it's quite clear: if you put the sources on the same CD/tape that the binaries are on, you have satisfied (a).

But what about an FTP or HTTP server? For a file like an ISO image file, it is also possible to include the source with the binaries, since when the user downloads the image, they get the whole image, and if you've included the source in it, you've satisfied (a).

But what about for links to individual binary package files with links to source files on the same page? The user can download just the binary package file and not choose to download the source at the same time. It "feels" to me like this is not satisfying (a) because after the file transfer operation resulting from clicking on the binary package link, they have just the binaries - the file that they downloaded (the "media") has no source in it. So the link to the source "feels" to me like it is an "offer to provide the source upon request" - i.e., the source is there should the user choose later to download it. This sounds to me like it satisfies (b), which I interpret as meaning that the source link has to be kept around for 3 years after the binary link goes away.

However, this section from the GNU GPL FAQ seems to take a different view:

"Can I put the binaries on my Internet server and put the source on a different Internet site?

The GPL says you must offer access to copy the source code “from the same place”; that is, next to the binaries. However, if you make arrangements with another site to keep the necessary source code available, and put a link or cross-reference to the source code next to the binaries, we think that qualifies as “from the same place”.

Note, however, that it is not enough to find some site that happens to have the appropriate source code today, and tell people to look there. Tomorrow that site may have deleted that source code, or simply replaced it with a newer version of the same program. Then you would no longer be complying with the GPL requirements. To make a reasonable effort to comply, you need to make a positive arrangement with the other site, and thus ensure that the source will be available there for as long as you keep the binaries available. "

The very last part - "ensure that the source will be available there for as long as you keep the binaries available" - seems to say that you only have to keep the sources available as long as the binaries are available. I don't know how this jibes with the 3 year requirement of (b), but there you have it.

Also, here is another interesting thing from the GPL FAQ:

"Can I make binaries available on a network server, but send sources only to people who order them?

If you make object code available on a network server, you have to provide the Corresponding Source on a network server as well. The easiest way to do this would be to publish them on the same server, but if you'd like, you can alternatively provide instructions for getting the source from another server, or even a version control system. No matter what you do, the source should be just as easy to access as the object code, though. This is all specified in section 6(d) of GPLv3.

The sources you provide must correspond exactly to the binaries. In particular, you must make sure they are for the same version of the program—not an older version and not a newer version."

Note that the above specifically prohibits what phrakture had offered to do, which is to send a CD upon request to provide sources for binary package files which were originally made available via a computer network.

However, this FAQ seems to be a bit GPLv3-oriented so it's possible that the FAQ doesn't apply so well to GPLv2 software, which is what the majority of packages in Arch use (I think). I just looked at the GPLv3 and it's significantly different when it comes to requirements for binary packages hosted on a computer network, which only adds to the confusion.
Comment by Aaron Griffin (phrakture) - Thursday, 22 May 2008, 16:25 GMT
The 3 year case is only relevant with the written offer. If we are providing source on an ftp site or something of the sort, we only need to provide the source as long as we provide the binary.

Additionally, Bryan, can we stop taking jabs at me regarding this issue? I am solving it, there is no reason to call me out like this.
Comment by Bryan Ischo (bji) - Thursday, 22 May 2008, 17:33 GMT
Sorry - no jabs intended, honestly. I know it keeps coming out that way but I really am only interested in figuring out what is required by the GPL and what isn't. I thought that the text that I quoted brought some new insight into the discussion (at least for me) and I wanted to tie it back to your offer only because I felt like it was definitive whereas everything I had said previously was more based on my opinion. Anyway, no offense intended and I promise that I'm done even mentioning your offer. Sorry again.
Comment by Branko Vukelic (foxbunny) - Saturday, 24 May 2008, 22:45 GMT
Who has the power to enforce the GPL?

Since the GPL is a copyright license, the copyright holders of the software are the ones who have the power to enforce the GPL. If you see a violation of the GPL, you should inform the developers of the GPL-covered software involved. They either are the copyright holders, or are connected with the copyright holders. Learn more about reporting GPL violations.
Comment by Branko Vukelic (foxbunny) - Saturday, 24 May 2008, 22:57 GMT
Sorry for the last one without explanation. All I want to say is, you can all relax, and do this nice and slow.
Comment by meandean (meandean) - Monday, 30 June 2008, 15:52 GMT
Just some general replies to various statements. None of this is personal or directed at anyone. It is simple discussion of the issue.

>the GPL requires you to make available the sources for that exact package for three years to *that person*.
>You technically don't have to make the sources available to anyone else
Not true. If you choose to satisfy section 3 with the written offer (offer assumed if the source does not accompany the binary) then that written offer is good for ALL third parties. That is where 3c comes into play.

>Information will be added to the downloads page
Has this been done? I could not find it and had to ask on the forum.

>We are working toward getting real sources on the mirrors in addition to the packages.
any update on this?

>http://dev.archlinux.org/~aaron/sources/
Is this the source for the latest ISO or the current binaries that are available or what?

>I guess it all depends on what 'Accompany' is interpreted to mean
The word 'accompany' is a well defined word. The source has to be with the binary and the best way to ensure this is to host the source yourself. That doesn't mean the source and the binary have to be on the exact same server in the exact same location, but it is important to have a link to the source in the same location you have the link to the binary. If you can guarantee that a separate site will have the source available then that is acceptable, but the question is, can you guarantee that?

>The 3 year case is only relevant with the written offer. If we are providing source on an ftp site or something of the sort,
>we only need to provide the source as long as we provide the binary.
True. But the source must be as readily available and accessible as the binary. Having links to an ISO full of binaries on the download page without having a link to the source code on the downloads page is a violation. If you have the source, which you must if you used it to build the binaries for the ISO, then simply make those sources available either in a directory or rolled into a source ISO.

Along the same lines... if you provide a binary disc through osdisc then it would be good to provide a source disc as well.

In my opinion, being GPL compliant is as important as any other aspect of a distro because it is required. It is not optional or a less-important task. Pretty artwork, improved installer, bug fixes are all important but they are optional or 'at your leisure' - license requirements are not. I hope this issue gets resolved. I will not consider arch further until more is done to resolve this issue.
Comment by Glenn Matthys (RedShift) - Friday, 05 December 2008, 11:40 GMT
What's the status of this issue?
Comment by Bryan Ischo (bji) - Thursday, 25 December 2008, 12:31 GMT
I've just returned to Arch after 6 months. I'm very disappointed to see that there has been no apparent progress on this issue.

Let me take a stab at it.

- Use Amazon S3 for storage of the sources for all Arch-distributed GPL'd/LGPL'd binaries
- Have developers who maintain core/extra/community binary packages be responsible for uploading the sources to S3, for each source that is referenced by the package
- Modify the makepkg tool so that it first downloads from the source location listed in the PKGBUILD file, but if that source cannot be downloaded for some reason (such as the location of the source no longer being valid), it falls back to fetching the source from S3; in this way, sources would continue to be loaded from their original hosted locations, saving S3 bandwidth charges to Arch, until the original source is gone. In most cases, sources would never be downloaded from Arch's S3 bucket, so the bandwidth costs would be low.
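
The fallback described in the last item can be sketched roughly as follows. This is only an illustration in Python, not makepkg code (makepkg itself is a shell script): the function names `download` and `fetch_with_fallback` and the example URLs are hypothetical, and the sketch assumes that any download failure surfaces as an `OSError`, which is how `urllib` reports unreachable hosts and HTTP errors.

```python
import urllib.request
from typing import Callable, Tuple


def download(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL and return its body; urllib raises an OSError subclass on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    upstream_url: str,
    mirror_url: str,
    fetch: Callable[[str], bytes] = download,
) -> Tuple[bytes, str]:
    """Try the upstream location first; use the archived copy only if that fails.

    Returns the downloaded bytes and the URL that actually served them.
    """
    try:
        return fetch(upstream_url), upstream_url
    except OSError:
        # Upstream is gone or unreachable, so fall back to the archived copy.
        return fetch(mirror_url), mirror_url
```

Because the mirror is only contacted after the upstream download fails, the archive sees traffic only for sources whose original location has gone away, which is the bandwidth-saving property the proposal relies on.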

I suggest using S3 because it takes the problem of managing the serving up of these source files completely away from the Arch team. All the Arch team would have to do would be to pay the monthly S3 fees; no worries about server hardware or backups or anything.

I've done some quick estimates of the cost of hosting the sources in this way. I count 4125 packages currently in core, extra, and community. Assuming that ALL of them are GPL'd/LGPL'd and require the source to be hosted, and assuming that the average size of the source tarballs is 5 MB, the total disk space needed would be about 20 GB. On S3, 20 GB storage costs about $3 per month. There is also a one-time cost for uploading all of that data, of about $2. Assuming 100 GB of file downloads from this bucket per month, the monthly data transfer fees would be $17.

So all told, with some very conservative estimates, the total cost per month to bring Arch in compliance with the GPL would be about $20. And likely the cost would be considerably less because a) not all packages would need to have source hosted, b) the average size of source files for packages is probably less than 5 MB, and c) assuming that the sources are almost always fetched from the original host by makepkg instead of from Arch's S3 bucket, monthly download would likely be a lot less than 100 GB.
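
The arithmetic behind this estimate can be reproduced with a short Python sketch. The package count, average size and monthly download volume come from the comment; the per-GB rates are not quoted anywhere in the thread and are simply back-computed from the dollar figures given ($3 for 20 GB stored, $2 for 20 GB uploaded, $17 for 100 GB downloaded), so treat them as assumptions.

```python
# Figures taken from the estimate in the comment.
PACKAGES = 4125
AVG_SOURCE_MB = 5
MONTHLY_DOWNLOAD_GB = 100

# Per-GB rates back-computed from the comment's dollar figures
# (assumptions, not quoted from any price list).
STORAGE_RATE = 0.15   # dollars per GB per month
UPLOAD_RATE = 0.10    # dollars per GB, one-time
DOWNLOAD_RATE = 0.17  # dollars per GB


def estimate(packages=PACKAGES, avg_mb=AVG_SOURCE_MB, download_gb=MONTHLY_DOWNLOAD_GB):
    """Return storage size, one-time upload cost and recurring monthly cost."""
    storage_gb = packages * avg_mb / 1024
    return {
        "storage_gb": storage_gb,
        "one_time_upload": storage_gb * UPLOAD_RATE,
        "monthly": storage_gb * STORAGE_RATE + download_gb * DOWNLOAD_RATE,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:.2f}")
```

Lowering `avg_mb` or `download_gb` shows how quickly the monthly figure drops under the less conservative assumptions listed in points a) to c).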

I think $20 per month is not a great barrier to bringing Arch into GPL compliance. In fact, I volunteer to pay for ALL of the Amazon S3 fees *personally* if we can just get the ball rolling on this.
Comment by Bryan Ischo (bji) - Thursday, 25 December 2008, 12:36 GMT
How about this, I'll make it even a little easier. Rather than having the Arch devs responsible for uploading sources to S3, I'll run a cron job on my server that periodically downloads the sources for any new packages that it finds in the Arch repositories, and uploads them to the Arch Sources S3 bucket. The Arch team won't have to do ANYTHING, except for modify makepkg so that it falls back to using this S3 bucket if it can't get the original source (I can even hack this functionality into makepkg so the dev team doesn't have to do anything at all). AND I'll pay for all of it. Merry Christmas!
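
The core of such a cron job is working out which source files are referenced by the repositories but not yet archived. A minimal Python sketch of that step follows; the function name and the data shapes (a mapping from package to its source file names, and a set of names already in the bucket) are hypothetical, and the actual downloading and uploading are left out.

```python
def sources_to_upload(repo_sources, already_uploaded):
    """Return source file names referenced by the repos but not yet archived.

    repo_sources maps a package identifier to the list of source files it uses;
    already_uploaded is any iterable of file names already present in the archive.
    """
    wanted = set()
    for files in repo_sources.values():
        wanted.update(files)
    # Sorting keeps the upload order stable from one run to the next.
    return sorted(wanted - set(already_uploaded))
```

Running this on each invocation and uploading only the returned names keeps the job idempotent: a run that finds nothing new uploads nothing.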
Comment by Glenn Matthys (RedShift) - Thursday, 25 December 2008, 12:37 GMT
Maybe we need more legal advice on this, maybe there's a way to "circumvent" this clause of the GPL?
Comment by Bryan Ischo (bji) - Thursday, 25 December 2008, 12:40 GMT
Are you KIDDING ME? Seriously, are you suggesting trying to find a legal loophole in the GPL rather than just owning up to the responsibility of being a distributor of GPL'd software?

Can you explain how it is better to try to get legal advice to see if Arch can somehow screw over the authors of GPL software, than to just bring Arch Linux into GPL compliance by cheaply hosting the sources on S3?
Comment by Glenn Matthys (RedShift) - Thursday, 25 December 2008, 12:53 GMT
Yeah that's exactly what I'm suggesting. I think this requirement of the GPL is ridiculous. Note that I am NOT speaking for the Arch team, I am speaking for myself.
Comment by Allan McRae (Allan) - Thursday, 25 December 2008, 13:38 GMT
There has been some progress made on this. The server upgrade provides us with enough room to do this now and scripts to do this automatically are under way/finished(?).

Also note, that (IIRC) linking upstream is technically enough under GPL3, so we only really need to host packages that are licensed as GPL2 only.

I don't see why we would need to modify makepkg to fallback to getting the source from wherever we posted them (although that may be a nice feature).
Comment by Bryan Ischo (bji) - Thursday, 25 December 2008, 19:23 GMT
Glenn: that you think the source requirements of the GPL are ridiculous is not really relevant. Simply put, it's a requirement of the GPL and there is no real point in debating it, just accept it. There are good reasons that the GPL has this requirement - it's to ensure that source is available to anyone who downloads a binary, and it requires those who distribute binaries to take the responsibility of making the source available also, rather than depend on someone else who may or may not provide source in the future. If you're going to take all the benefits of the hard work of people who wrote GPL software, the least you can do is abide by the terms of the GPL, especially when they're so easy to abide by (seriously, if you can host the binaries, you can host the sources, case closed).

Allan: thank you for the update, much appreciated. Because it's been six months since this bug was filed, and since I think that many people who care about the integrity of the Linux distribution they install on their computer take this issue seriously, I think that it would be a good thing to provide an actual ETA for this feature. If you have the disk space, and the scripts, then what's holding the process up for completing this feature and closing out this bug?

Also - I've read about the GPLv3 on the GNU site. I don't see any significant changes in its requirement for hosting source over GPLv2. It seems to make explicit the fact that you can host the sources on a different server than the binaries, but still requires YOU (the hoster of the binaries) to host the sources, or to provide guarantees from the third party to whom you provide links to the source in lieu of providing the source yourself, for three years after you stop providing binaries. There is no way that Arch is doing this, it's much more onerous than just hosting the source. So please, just host all GPL source, v2 or v3, and put this issue to rest. And please note that the GPL requires that Arch patches to the source that were used to create a binary be hosted along with the original source as well, so don't forget to do that.

Comment by Aaron Griffin (phrakture) - Friday, 26 December 2008, 00:14 GMT
Anything that incurs a monthly fee is going to be out the window because we do not have a steady revenue stream.

There *has* been progress on this. We had to buy a new server to support the sheer size of these sources, and cron jobs and scripts have been committed to the dbscripts repo.

They just need to be turned on.... but it's Christmas guys... go have fun with your families, we don't need to snipe at each other about "integrity" and things of this nature.
Comment by Bryan Ischo (bji) - Friday, 26 December 2008, 00:46 GMT
Progress is good. I'm just asking for an ETA on this, because there hasn't been any user-visible plan of action or expected date of completion of this task.

For what it's worth, it's not Christmas where I am, it's the day after. Either way, nobody's asking anyone to post to this bug instead of enjoying their holiday. Come back to it when your holiday is over. When that happens, could you just give me, for my own curiosity, the estimates that Arch developers have used for the amount of data that will be stored in the source repository, and the expected bandwidth used per month? It would be interesting to see if S3 would be cheaper or more expensive according to your estimates (the cost of a server would buy years of S3 service at the usage level I'd expect for Arch sources). It's not that I'm proposing a switch to S3 if the Arch team already has bought a dedicated server for this purpose, I'd just like to understand the economics involved because it might be a useful comparison to refer to in the future for such decisions.

Finally, it's not 'sniping' to point out that there are people who care about the GPL compliance of Arch. I'm talking about the integrity of the distribution as being GPL-compliant, and in this way adhering to the philosophies that the GPL sets out both in spirit and in letter. I'm not talking about the personal integrity of anyone posting to this bug or using Arch, sorry if that was not clear.
Comment by Bryan Ischo (bji) - Friday, 26 December 2008, 01:04 GMT
Also, I see you've taken this bug, Aaron. Thank you for taking over this important task. Would you please give some indication of what the severity of this issue is from your perspective? The bug is marked 'Low Severity, Normal Priority' - does that match your priority level for completing this bug? Thank you.
Comment by Glenn Matthys (RedShift) - Sunday, 28 December 2008, 07:11 GMT
Bryan: this bug is marked low severity because that is how the Arch community regards it. Unless it's destroying your data and eating your dog it's definitely not high/critical severity stuff. Somehow I find you asking Aaron personally how he regards this bug a bit intrusive. When you're talking here, you're talking to the Arch developer community, you cannot hold anyone personally responsible. Just saying this before it gets out of hand.
Comment by Bryan Ischo (bji) - Sunday, 28 December 2008, 08:55 GMT
Hm, I seem to be continually misunderstood. I am not sure why this keeps happening.

I merely asked what the severity of the bug was from his perspective. Since he just took ownership of the bug, presumably because he would like to personally address it, it seems like a perfectly reasonable question to me, especially considering how long this bug has been open. As a software developer myself I know that it's easy to have inaccurate priority fields in a bug entry since priorities change and keeping these sorts of fields up to date often slips through the cracks.

I find it a little worrying that complying with the license requirements of software that the Arch project is shipping is deemed low priority; I would expect that it would be one of the most important kinds of bugs to fix. I'm part of "the Arch community" and I personally find this issue problematic enough that it alone will induce me to give up on Arch completely, should it not be addressed soon. This is not a threat, it's just a fact, and I say it only because I want you to understand that this is a very important issue to some people. I wish someone else who feels strongly about the GPL would chime in here so I wouldn't feel like the only one :)

But moreover, you should realize that you are legally obligated to fix this bug, by the terms of the GPL, regardless of how low you prioritize this issue.

Comment by Aaron Griffin (phrakture) - Monday, 29 December 2008, 19:39 GMT
Bryan, you tend to be misunderstood due to the demanding nature of your comments. This is being worked on, we're getting there. But quite frankly - having Arch actually in working order is far more important to me.

I have a hundred different things to work on at any given time and this is simply low on my list of things because it's going to give us little direct benefit (besides getting rid of these "OMG MUST FIX NOW" bug comments).

The fact that work has been done on this (code has been written, servers have been upgraded, etc) should prove to everyone except the most zealous that we are working on this in good faith.
Comment by Bryan Ischo (bji) - Monday, 29 December 2008, 20:00 GMT
Thanks for your response. The ONLY thing I am asking for at this point is some sort of estimation of when this will be completed. Can you please post some kind of schedule for this or estimation of its date of completion? Thank you.

I'll apologize again if I seem demanding. It's hard for me to explain why I feel the way I do or why I am trying so hard to get accelerated action on this bug without sounding like I am demanding something, but please understand that it comes from caring about this issue a lot. I love using Arch but it pains me to be participating in copyright violations against thousands of software developers who have given their works away for free.

Comment by Aaron Griffin (phrakture) - Monday, 29 December 2008, 20:58 GMT
I can't post a schedule exactly... how about "sometime in January" ?
Comment by Bryan Ischo (bji) - Monday, 29 December 2008, 21:00 GMT
That is awesome, and much appreciated. "sometime in January" is a great answer and I'll shut up now, thanks!
Comment by Aaron Griffin (phrakture) - Monday, 29 December 2008, 21:01 GMT
Also, clarification: We are NOT violating copyright. The violation is in the terms of the GPL *license* not the copyright held by the author. People still get their credit and the copyright is theirs... we're just half-way violating a small portion of the license for a segment of our packages (BSD, GPL3, and similar licensed packages do not have the source stipulation)
Comment by Bryan Ischo (bji) - Monday, 29 December 2008, 21:07 GMT
OK, I promise I'll shut up after this issue is properly clarified.

The GPL packages that Arch Linux is distributing are under copyright by their authors. As such, Arch Linux (or anyone else) has no legal right to distribute copies of this software without explicit permission from the copyright holder. The copyright holder gives explicit permission to anyone on condition that they abide by the terms of the GPL license. If you don't abide by the license, then you don't have permission to distribute the software. If you don't have permission to distribute the software and are doing so anyway, then you are violating the copyright.

To put it another way, you said: "The violation is in the terms of the GPL *license* not the copyright", but the license is the only thing that gives you permission to copy the work, and since Arch Linux is not complying with the terms of the license, Arch Linux has no permission to copy the work, and is thus violating the copyright.
Comment by Aaron Griffin (phrakture) - Friday, 09 January 2009, 23:23 GMT
I am currently generating a handful of sources here: ftp://ftp.archlinux.org/sources/
I will stop it at some point, so we have a point of reference to look at.

There are a few pending questions I'm waiting to get answered by the devs.

When those are answered, I will set this up to run as a cron job and keep our sources updated
Comment by Bryan Ischo (bji) - Saturday, 10 January 2009, 03:14 GMT
Awesome - great progress!

However, I think you might need to change the directory structure you are using for hosting the sources. The reason I say this is that by the terms of the GPL, you are required to host not only the original sources, but your patches to those sources, for any binary you distribute based on the sources+patches. So basically, the contents of the /var/abs directory for any package built and distributed needs to be hosted. This includes the patches, associated scripts, etc.

And this has to be done for every version of the software that is distributed, so I think that the most logical way to do this would be to host a directory for each package, of the name:

/sources/${PACKAGENAME}-${VERSION}

Whose contents would be basically exactly the same as the /var/abs/.../${PACKAGENAME} directory for that particular package and version.

Example:

In my /var/abs/core/grub directory I currently have:

[code]
bji$ ls /var/abs/core/grub
040_all_grub-0.96-nxstack.patch grub-inode-size.patch intelmac.patch
05-grub-0.97-initrdaddr.diff grub.install menu.lst
PKGBUILD i2o.patch more-raid.patch
grub-0.97-gpt.patch install-grub special-devices.patch
[/code]

The version of the package is 0.97-14. So I think that the corresponding directory on ftp.archlinux.org would be /sources/grub-0.97-14 (or /sources/core/grub-0.97-14 if you prefer, for extra consistency with the /var/abs directory tree structure), with all of the above files in it.

This could be really easily scripted on the ftp server, since all that needs to be accomplished is scripting a means for detecting when a new package source is available (via pacman -Sy), then checking the resulting new PKGBUILD files and copying the /var/abs/.../${PACKAGE} directory over to /ftp-root-whatever/sources/${PACKAGE}-${VERSION} for all new versions (this might be most easily accomplished, although not most efficiently, by simply checking every PKGBUILD file in /var/abs and seeing if a corresponding /ftp-root-whatever/sources/${PACKAGE}-${VERSION} directory exists, and if not, copying it over, without having to otherwise keep track of which packages are "new" since the last pacman -Sy). Please note that this requires that the process that checks for new packages and copies files to the ftp directory run frequently enough that it never "miss" an updated package.
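
The check-and-copy loop described above could be sketched roughly as follows. This is only an illustration in Python, not the script Arch uses: the names `read_version` and `sync_sources` are made up here, the PKGBUILD parsing assumes plain `pkgver=` and `pkgrel=` assignments, and the two directory arguments stand in for /var/abs and the FTP sources root.

```python
import shutil
from pathlib import Path


def read_version(pkgbuild: Path) -> str:
    """Naively read pkgver and pkgrel (plain assignments only, an assumption)."""
    fields = {}
    for line in pkgbuild.read_text().splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key in ("pkgver", "pkgrel"):
            fields[key] = value.strip("'\"")
    return f"{fields['pkgver']}-{fields['pkgrel']}"


def sync_sources(abs_root: Path, ftp_sources: Path) -> list:
    """Copy <abs_root>/<repo>/<pkg> to <ftp_sources>/<pkg>-<version> if missing.

    Every PKGBUILD is checked on every run, so no state about which
    packages are "new" has to be kept between runs.
    """
    copied = []
    for pkgbuild in sorted(abs_root.glob("*/*/PKGBUILD")):
        pkg_dir = pkgbuild.parent
        target = ftp_sources / f"{pkg_dir.name}-{read_version(pkgbuild)}"
        if not target.exists():
            shutil.copytree(pkg_dir, target)
            copied.append(target.name)
    return copied
```

Called as `sync_sources(Path("/var/abs"), Path("/ftp-root-whatever/sources"))` from a cron job, a second run with nothing changed copies nothing.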

Also, this doesn't need to be done for all packages - the script could check the license line of the PKGBUILD files and only copy those which include GPL, LGPL, or any other terms which indicate a GPL-style license with GPL source hosting requirements.
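
A minimal sketch of that license check, again in Python and under stated assumptions: the `license=(...)` array is assumed to sit on one line, only entries starting with GPL or LGPL are treated as needing hosted source, and the helper names `parse_licenses` and `needs_source_hosting` are hypothetical.

```python
import re
import shlex

# Assumption: only entries starting with "GPL" or "LGPL" count as GPL-style.
GPL_PATTERN = re.compile(r"^L?GPL", re.IGNORECASE)


def parse_licenses(pkgbuild_text: str) -> list:
    """Return the entries of a single-line license=(...) array, or []."""
    match = re.search(r"^license=\(([^)]*)\)", pkgbuild_text, re.MULTILINE)
    if not match:
        return []
    # shlex handles the shell-style quoting used inside the array.
    return shlex.split(match.group(1))


def needs_source_hosting(pkgbuild_text: str) -> bool:
    """True if any listed license looks like GPL or LGPL."""
    return any(GPL_PATTERN.match(lic) for lic in parse_licenses(pkgbuild_text))
```

A package with no license line is reported as not needing hosting here, so in practice missing license fields would have to be fixed first.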

Make sense?



Comment by Allan McRae (Allan) - Saturday, 10 January 2009, 03:16 GMT
By the look of the filenames, they are made using "makepkg --allsource", which makes a tarball of the original source, PKGBUILD and any other needed files.
Comment by Bryan Ischo (bji) - Saturday, 10 January 2009, 03:21 GMT
Oh, one other issue is how you're going to retroactively acquire the sources and patches for binaries that you've already distributed. I know that this issue has been going on for years and technically, anyone could come back to you with an old installation CD from a couple of years ago and harass you for not having the source code to whatever binary packages were on that CD available for them.

However, this is a) highly unlikely to ever occur, b) could be dealt with on a case-by-case basis instead of having to go through the effort of becoming retroactively GPL-compliant for all old Arch Linux packages, and c) certainly ought to be deferred (probably indefinitely) as a low-priority secondary task related to fixing GPL compliance for Arch Linux. In 3 years this aspect of the GPL compliance issue will be irrelevant anyway due to the terms of the GPL only requiring source hosting for 3 years.

Getting the existing current packages and all future versions up on the source server, is by far the most important thing. If you never find a way to get old package sources up, I for one won't care, and I'm one of the most vocal GPL proponents you're likely to find (obviously :) ...
Comment by Bryan Ischo (bji) - Saturday, 10 January 2009, 03:22 GMT
Allan: thanks for the info, very much appreciated. Obviously I didn't look closely enough at what Aaron had put up there. That alleviates all of my concerns, thanks!
Comment by Bryan Ischo (bji) - Saturday, 10 January 2009, 03:40 GMT
One more practical concern: putting all the sources in one directory is eventually going to be a problem, both for FTP listing, and because, well, having too many files in a directory is just generally a bad thing. If there are 5000 packages to host, and each one has 4 versions, that's 20,000 files. I can just about guarantee that this is going to cause all kinds of problems on the FTP server.

You might consider a directory structure like this:

/sources/core/grub/grub-0.97-14.src.tar.gz
/sources/core/grub/grub-0.97-15.src.tar.gz
/sources/core/grub/grub-0.98-1.src.tar.gz
...

Since the directory structure of /var/abs is already being managed by somebody somewhere to ensure that the number of packages at any given level of the tree never gets too big, then re-using the same directory structure on the FTP server will gain the same benefit.
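
To make the proposed layout concrete, here is a small Python sketch that builds the path for one sourceball. The function name `sourceball_path` is hypothetical; the pattern itself is taken from the example listing above.

```python
from pathlib import PurePosixPath


def sourceball_path(repo: str, pkgname: str, version: str) -> PurePosixPath:
    """Build /sources/<repo>/<pkgname>/<pkgname>-<version>.src.tar.gz."""
    filename = f"{pkgname}-{version}.src.tar.gz"
    return PurePosixPath("/sources") / repo / pkgname / filename
```

With this scheme each package directory only ever holds the versions of that one package, so no single directory grows with the total number of packages.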

Additionally, if you're worried about the amount of disk space that you'll have to use, you might re-consider hosting the sources in directories (e.g. /sources/core/grub/{0.97-14,0.97-15,0.98-1}), with hard links for files which don't change between versions to save disk space (i.e. instead of a copy of the same grub source in the first two directories in my example, there would be just one copy, with the second directory having a hard link to it). This could be a big savings for packages which have lots of revisions in which the upstream source doesn't change, but patches and build scripts and such do. And finally, I believe that some FTP servers have features for automatically tarring up directories for download, which would let users get the same tarball that they'd get from "makepkg --allsource" instead of having to get the files individually.
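
The hard-link idea could look roughly like this in Python. It is a sketch only: `copy_version_with_links` is a made-up name, package directories are assumed to be flat (no subdirectories), and the old and new version directories must live on the same filesystem for hard links to work.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def copy_version_with_links(src_dir: Path, previous_dir: Optional[Path],
                            new_dir: Path) -> int:
    """Publish src_dir as new_dir, hard-linking files unchanged since previous_dir.

    Returns the number of files that were linked instead of copied.
    """
    new_dir.mkdir(parents=True)
    linked = 0
    for src in sorted(src_dir.iterdir()):
        if not src.is_file():
            continue  # assumption: flat package directories only
        dest = new_dir / src.name
        old = previous_dir / src.name if previous_dir is not None else None
        # shallow=False forces a byte-by-byte comparison of the contents.
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, dest)  # second name for the same data on disk
            linked += 1
        else:
            shutil.copy2(src, dest)
    return linked
```

For a revision bump where only the PKGBUILD changes, the upstream tarball would be linked rather than stored a second time.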

Comment by Allan McRae (Allan) - Saturday, 10 January 2009, 03:49 GMT
I believe we will only be hosting the source for the packages we are currently distributing. So there will be at most two copies (one in [testing] and one for the main repos) and the vast majority of packages will only have one copy. The tarballs for packages on the installer can easily be dumped in a separate directory at release time.
Comment by Bryan Ischo (bji) - Saturday, 10 January 2009, 03:51 GMT
You do realize that you are required by law (i.e. the GPL and its interactions with copyright law) to host all sources for all binaries you distribute for 3 years after you stop distributing the binaries, right? So you're really going to be hosting sources for the packages that you have distributed in the last three years (all versions thereof), not just the packages (and versions) you are currently distributing.
Comment by Allan McRae (Allan) - Saturday, 10 January 2009, 03:55 GMT
AFAIK, it is three years if we provide a written offer of sources. No such stipulation applies if sources are hosted along with distributed binaries.
Comment by Bryan Ischo (bji) - Saturday, 10 January 2009, 05:08 GMT
Yes, I think you may be right about that. However, if disk space permits, it would be a really nice thing to host the sources for as long as is practical (say six months or a year).

It's a very nice thing to be able to leave legacy software running on a system and know that you can always get at the source should you run into a problem with that particular software. As a software developer sometimes I want to look at the source code for libraries I am using to help with debugging, and I don't want to always have to stay completely up-to-date with Arch Linux packages to have easy access to source.

But, this is just a nice-to-have feature and not something that is required to comply with the GPL. So do as you think is best.
Comment by Aaron Griffin (phrakture) - Monday, 12 January 2009, 16:50 GMT
We will only be hosting sources for packages we distribute. When we no longer distribute these packages, we no longer need to distribute the sources.

And yes, they do include all patches and necessary files to build the package exactly as the official package has been built.

If *you* would like old sources, why not mirror them for a given period of time on your own web hosting service? This seems like a great community project, if you ask me.
Comment by Bryan Ischo (bji) - Monday, 12 January 2009, 19:04 GMT
Yes, I agree with you on that. I've been mulling over the idea of making sources for Arch available on my own server, as you have suggested. Do you have any estimates on how much disk space is needed to host the current sources? I'd have to extrapolate from that to try to figure out how much disk space would be needed to keep all versions instead of the most recent version. At $0.17/GB/month, I personally would be comfortable keeping maybe 50 GB of data up on S3 ...
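
As a quick check of the storage figure mentioned above, a tiny Python helper (the name `monthly_storage_cost` is made up, and it covers storage only, not bandwidth or request charges):

```python
def monthly_storage_cost(gigabytes: float, price_per_gb_month: float = 0.17) -> float:
    """Storage-only cost per month at the quoted $0.17/GB/month rate."""
    return gigabytes * price_per_gb_month
```

At the quoted rate, 50 GB works out to 50 x 0.17 = $8.50 per month for storage alone.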
Comment by Aaron Griffin (phrakture) - Monday, 12 January 2009, 19:22 GMT
I think, when I had all the sources generated and sitting in my home dir, it was around 10-12 gigs.
Comment by Greg (dolby) - Monday, 02 February 2009, 03:26 GMT
"Some time in January" wasn't accomplished. I would propose waiting for community to be included first, and then taking a look at this again.
Comment by Aaron Griffin (phrakture) - Monday, 02 February 2009, 21:42 GMT
Says who?
ftp://ftp.archlinux.org/sources/

It's not 100% complete, but it's all automated and mostly working fine.

Just started another run, I don't think I got extra in there last time I did this - trying to do it in spurts so that we don't tax the server too much.
Comment by Greg (dolby) - Tuesday, 03 February 2009, 01:29 GMT
When I made the comment, at least some part of extra was there. I remember abiword for example.
Comment by Aaron Griffin (phrakture) - Tuesday, 03 February 2009, 18:14 GMT
Ok, it should all be there. Sources for all packages that use (L)GPL/(L)GPL2 licenses. I will set this up to run nightly to update things.
Comment by Greg (dolby) - Tuesday, 03 February 2009, 18:25 GMT
lftp ftp.archlinux.org:/sources> ls -l |wc -l
1347

As far as I've seen though, this dir doesn't get picked up by the mirrors. I've tried some of them and only the Indonesian one had it, including the sources. Another one had it empty, and most didn't have it at all.
Is that up to the mirrors?
Comment by Aaron Griffin (phrakture) - Tuesday, 03 February 2009, 18:46 GMT
That's up to the mirrors (for now), as we weren't sure how big the sources would be. With only GPL-ish packages, we're at 3.8GB
Comment by Greg (dolby) - Friday, 27 February 2009, 17:52 GMT
A question regarding hosting the sources.
While browsing [community] to find issues with unreachable source links, it was quite common to find packages whose source links were snapshots etc. that are no longer hosted on the application's site.
See for example http://aur.archlinux.org/packages.php?ID=4440 .
Since makepkg tries to download the sources from a remote location if it doesn't find them in the directory it is invoked from, ABS will fail every now and then for lazarus. This doesn't apply only to snapshots, for example attr does it too, see  FS#13134 .
What I'm asking is quite complex, and I have no idea what the answer is, that's why I'm asking.
Is there any gain in just hosting the sources like that in a dir on the ftp for compliance's sake if the user can't somehow easily access them with ABS? Can that be done already? If I point pacman to the ftp dir to get the source it will not pick it up because -src is appended to all of them. I hope that makes sense.
Comment by Aaron Griffin (phrakture) - Friday, 27 February 2009, 18:12 GMT
This will probably happen once community is moved to the official tools - this is on the docket
Comment by Greg (dolby) - Friday, 27 February 2009, 18:18 GMT
What I mean is: is there any way for makepkg to pick up the sources from ftp://ftp.archlinux.org/sources/attr_2.4.41-1-src.tar.gz instead of ftp://oss.sgi.com/projects/xfs/cmd_tars/attr_2.4.41-1.tar.gz, which is what the PKGBUILD points to, so that ABS keeps working even if the package is outdated and its source is no longer hosted on the project's ftp?
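
The fallback being asked about can be illustrated with a short Python sketch. This is not something makepkg does; `fetch_first_available` is a hypothetical helper, and the caller supplies the `fetch` function that actually downloads a URL. Note also that a sourceball made with "makepkg --allsource" wraps the upstream tarball together with the PKGBUILD, so a real tool would still have to unpack it rather than use it as a drop-in replacement.

```python
def fetch_first_available(urls, fetch):
    """Try each URL in order and return (url, data) for the first that works.

    `fetch` is any callable that returns the data for a URL or raises
    OSError on failure; it is an assumption of this sketch.
    """
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            errors.append(f"{url}: {exc}")
    raise OSError("no source location worked: " + "; ".join(errors))
```

The upstream location would be listed first and the archlinux.org copy second, so the Arch server is only hit when upstream has gone away.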
Comment by Aaron Griffin (phrakture) - Friday, 27 February 2009, 18:20 GMT
That's a pacman issue/question and not in the scope here.

With official packages, we place sources in other/$pkgname on the ftp if they are snapshots or outdated often, and change the PKGBUILD to download from there. That is how this would be solved
Comment by Xavier (shining) - Friday, 27 February 2009, 18:59 GMT
I had exactly the same question when I first saw these source tarballs.
I don't remember exactly how I answered it.
Maybe I just figured that these sources are not meant for common use, otherwise there would be a risk of overloading ftp.archlinux.org by retrieving too many sources from there.
Comment by Aaron Griffin (phrakture) - Friday, 27 February 2009, 19:04 GMT
It'd probably fit in ABS better than makepkg
Comment by Ray Rashif (schivmeister) - Sunday, 07 June 2009, 13:04 GMT
I have a query about this at http://bbs.archlinux.org/viewtopic.php?pid=565117

Which basically is:

1) Is it just a matter of doing an --allsource and hosting it somewhere accessible?
2) Does it concern all versions of the (L)GPL (3 in particular)?
3) Any other licenses that need such compliance?
Comment by Aaron Griffin (phrakture) - Friday, 12 June 2009, 18:27 GMT
I think this is closeable now, right?
Comment by Eric Belanger (Snowman) - Friday, 12 June 2009, 18:44 GMT
core and extra are done. Only community remains to be done, but it first needs to be switched to svn, as the sourceball script assumes the repo is in svn. There is a bug report about missing licenses in community. Whenever the svn switch is done, all we'll need to do is to enable the community repo for the sourceball and fix the remaining missing-license packages in community. So I guess we can close this bug.
Comment by Andreas Wagner (awagner) - Monday, 15 June 2009, 07:16 GMT
Before this is closed, and although this is maybe not the right forum for it - many thanks and kudos for taking care of this. I am more than happy that Arch has developers (and community) who find such a good way between minimalism, usability and open-source support (and license requirements). And that applies to the solution as well as to the schedule and process of implementation. IMHO it was well worth the wait. :applause:
Comment by Gerardo Exequiel Pozzi (djgera) - Wednesday, 22 July 2009, 23:21 GMT
[community] now has an SVN repo :) But it still needs the FTP directory for sources, right?
Comment by Eric Belanger (Snowman) - Wednesday, 22 July 2009, 23:53 GMT
support for community was added to git: http://projects.archlinux.org/?p=dbscripts.git;a=commit;h=b354593994a96901f4b86a0fb734edb5cb1b2348

Once it's pushed live, it'll do sourceballs for the packages in community.

I'll close this bug.
