FS#43382 - [calibre] 2.15.0-1 changes to upstream in PKGBUILD

Attached to Project: Community Packages
Opened by Eli Schwartz (eschwartz) - Friday, 09 January 2015, 04:00 GMT
Last edited by Doug Newgard (Scimmia) - Sunday, 25 June 2017, 15:08 GMT
Task Type General Gripe
Category Packages
Status Closed
Assigned To Jelle van der Waa (jelly)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Description: The PKGBUILD for calibre (as of version 2.15.0-1, and going back some time) removes several python modules from the calibre source tree:

# Remove unneeded files and libs
rm -rf resources/${pkgname}-portable.* \
src/six.py \
src/cherrypy \
src/html5lib \
src/chardet

These libraries are patched in calibre, in order to fix upstream bugs and stuff, and removing them in order to use the system packages causes these bugs to reappear. Kovid Goyal (calibre creator) has personally griped to me about this[1], and I have recently seen two examples of people having issues with the ArchLinux package that were fixed by my advice to install the self-contained binaries at http://calibre-ebook.com/download_linux
This would also fix  FS#36159  - [calibre] web server broken with python2-cherrypy 3.2.3-1
which was closed as an upstream cherrypy problem.

For more information, see:
http://www.mobileread.com/forums/showthread.php?t=253691
http://www.mobileread.com/forums/showthread.php?t=252658 (this is  FS#36159  again, actually. But )

There are indeed
commit log for the calibre-bundled cherrypy: https://github.com/kovidgoyal/calibre/commits/master/src/cherrypy
commit log for the calibre-bundled html5lib: https://github.com/kovidgoyal/calibre/commits/master/src/html5lib
commit log for the calibre-bundled chardet: https://github.com/kovidgoyal/calibre/commits/b552d25fe081dcc0da1e25c6043963009b124cd4/src/calibre/ebooks/chardet

I have no reason to suspect six.py is causing troubles, as calibre seems to bundle an old version from here: https://bitbucket.org/gutworth/six/commits/42c74399ac7a66713be2bb25c86bb558a8aca08e
So I will not go so far as to say that cannot be stripped (but I have asked on the calibre bug tracker).

The wrapper scripts for calibre-portable obviously can be stripped...

I do not believe ArchLinux should be modifying source packages when the developer specifically includes private, modified copies of an external dependency. The end result is users being unable to use pieces of calibre functionality when the issue has already been fixed upstream.


[1] http://www.mobileread.com/forums/showpost.php?p=3000044&postcount=4
This task depends upon

Closed by  Doug Newgard (Scimmia)
Sunday, 25 June 2017, 15:08 GMT
Reason for closing:  Fixed
Additional comments about closing:  calibre 3.0.0
Comment by Doug Newgard (Scimmia) - Friday, 09 January 2015, 04:42 GMT
"dangerous"? lol.
Comment by Eli Schwartz (eschwartz) - Friday, 09 January 2015, 04:53 GMT
I meant in the sense of arbitrary changes to source packages breaking things that weren't broken, but I concede the point. :o

If I knew how to edit the summary (seems to be impossible? although I know some bugtrackers can do that) I would've done so within seconds, but as it is, it can remain as a monument to my need to wait a few seconds and scan check my words for unnecessary drama.
Comment by Eli Schwartz (eschwartz) - Friday, 09 January 2015, 04:55 GMT
calibre no longer bundles six.py -- the changes are in the modified html5lib.
Comment by Doug Newgard (Scimmia) - Friday, 09 January 2015, 05:20 GMT
In general, Arch does not use bundled libs when possible. That's simply in keeping with how Linux works. The author is very much a Windows user/developer, where this kind of thing is acceptable. If he has to patch bugs out of a library, why not submit those fixes upstream? Looking at cherrypy, for example, there has been no changes in his branch for nearly 3 years, so why would this bug still exist upstream?
Comment by Eli Schwartz (eschwartz) - Friday, 09 January 2015, 05:51 GMT
I have no idea, but I assure you he isn't a Windows developer, that much I know for certain. He develops on linux, and his main computer runs Gentoo Linux.

http://lwn.net/Articles/456076/
http://www.mobileread.com/forums/showthread.php?p=1538454#post1538454
Comment by Doug Newgard (Scimmia) - Friday, 09 January 2015, 05:59 GMT
That may be, but he develops for Windows with a Windows mentality.
Comment by Eli Schwartz (eschwartz) - Friday, 23 January 2015, 19:43 GMT
Well, perhaps you can find a downstream patch to fix that too... since obviously the default is to assume the developer is always wrong?
Oh, and while you are at it, I believe debian can provide some inspiration for a few bundled (and likewise modified) sources that you overlooked...

Some facts on the ground: https://github.com/html5lib/html5lib-python/issues/119 -- not exactly bugfixes, this one.
I cannot claim to know why any specific changes were or weren't submitted upstream -- either way, it isn't for us to worry about, which was the point I was trying to make.
If the code exists in the source, it is there for a premeditated reason.
You are welcome to continue musing upon the appropriateness of upstreaming vs. maintaining modified libs, but what does that have to do with the practical effects?
If a software developer is allegedly doing the wrong thing, should we *break the software* in retaliation?

At the moment, calibre uses modifications both minor and major, and saying it shouldn't be so won't make it not so. I respectfully submit that the appropriate course of action is either to grin and bear it, or remove it from the repos, or fire off an angry note to the developer, or any combination thereof.

If the mentality is that offensive, perhaps calibre is unfit to be in the Arch repo? I don't know, what are the official guidelines on evil software that is developed with a Windows mentality?


I see this has been reclassified as a low-priority gripe. Is this because you do not think there is an actual bug? Then I refer you to the person whose books weren't converting... that is not exactly optional functionality, it is one of the main components.


Any update as to when this may be fixed? Or at the very least, more consistently apply downstream patches to ALL parts of the source that have been deemed unworthy?
If calibre is to be broken so it acts in an unpredictable manner, we might as well do the job properly, right?
Comment by Doug Newgard (Scimmia) - Saturday, 24 January 2015, 04:49 GMT
Arch Developers have suggested Calibre be removed previously because of upstream's attitude. That would be one solution to be sure.

I changed this to a general gripe because that's what it is, it's not about a bug, it's a general complaint. As far as low severity, I refer you to the guidelines: https://wiki.archlinux.org/index.php/Reporting_bug_guidelines#Severity

The person having problems converting was because an AUR package was overriding six.py, doesn't have anything to do with this. Don't go straw man here.
Comment by Jelle van der Waa (jelly) - Saturday, 24 January 2015, 15:18 GMT
Eli Schwartz,

We have a few possibilities to fix the current situation:
a) move calibre to /opt, somehow including all the patched libraries
Have to research the implications of this solution.
b) patch the libraries in the repos (I maintain three of them already).
We only patch packages when the fix has been upstreamed and if it's really critical

c) Convince the dev to upstream the patches.
I don't have time to bug the developer at the moment. I could perhaps help him upstream it

d) drop it to aur.
Since a lot of people use calibre I am not in favour of this solution

So I am at the moment undecided on the solution for this bug
Comment by Kovid Goyal (kovidgoyal) - Thursday, 05 March 2015, 04:33 GMT
@Scimmia: I am upstream for calibre, and I find the claim that I am "Windows developer" utterly ridiculous. I have been using Linux/Unix as my primary OS for probably longer than you have been alive. And I find the attitude of distro maintainers like you that believe they can maintain thousands of private patches against upstream software without breaking things, in perpetuity, utterly ridiculous. Look around you. *Every* single *successful* *end-user* OS platform has an application distribution model that includes bundled/modified dependencies. That is because they long ago realized that having a small number of people maintain the exponential complexity graph of modern software collections to be utterly impractical. Choose a base system of kernel + a small number of system libraries/utilities. Modify those to your hearts content. Do not pretend that you can successfully maintain a customized version of the entire software stack of a modern desktop. There are many reasons Linux has not succeeded on the desktop and Linux distros are one of the primary ones. Simply *do not* break working software to conform with your ideologies. Either choose not to ship it, or ship it unmodified or with only *minimal* patches required for system integration.

@jelly: If you want to get the changes to html5lib upstreamed, feel free to do the work:https://github.com/html5lib/html5lib-python/issues/119
Even after you do all of it, you will find that you will still need to ship the modified version, since html5lib upstream (quite correctly) does not want the modified lxml writer I developed for calibre.
Comment by Eli Schwartz (eschwartz) - Thursday, 05 March 2015, 09:33 GMT
@Scimmia
"The person having problems converting was because an AUR package was overriding six.py, doesn't have anything to do with this. Don't go straw man here."

In fact, I was actually aware of that Arch bugreport, but was fully capable of figuring out on my own, or perhaps because of the bug resolution, that that was not a *calibre* issue. ;)

The person who was unable to convert reported the matter on MobileRead, as linked in my original report. In that case, it was a silent fail, no importerror. So I am guessing it wasn't the same exact problem (such an obscure package is not likely to be installed on his system as well) and since html5lib is I believe used extensively by the conversion process it seems a much more likely cause.
Since html5lib specifically, is a de facto fork designed to do things upstream doesn't want or need, it seems logical to me that stripping out all the changes meant specifically to do what calibre needs to do for conversion, is going to mean things will break either subtly or obviously. Regadless of whether that particular user's error was a result of html5lib or anything else, Arch is still today shipping a package that does not do what the author intended!

Since it turns out I am not complaining about problems caused by other packages, does that mean this is not longer just a gripe?

@jelly,

a) Sounds like an awesome compromise. Just change the install prefix. Everyone will know that calibre is being naughty and was sent to its room for punishment.
b) I agree that is a bad idea.
c) Reasonable enough. But as kovidgoyal said, good luck with the html5lib people, since the calibre isn't really meant to do the same things. And if that is the path you choose, can we have an interim solution?
d) If the package will not be fixed in the repos, I think people would prefer to have a properly working AUR package over a broken repo package. Since Arch Linux is all about not modifying/"fixing" things downstream and all, I am finding it very hard to understand why an Arch repo package would be doing just that.
Comment by Doug Newgard (Scimmia) - Thursday, 05 March 2015, 21:59 GMT
Don't you love it when people insult you, tell you how great Linux is, then tell you why they ignore the basic tenets that a GNU/Linux system is built on?

Anyway, it seems we're dealing with a true, incompatible fork of html5lib. Keeping the same namespace makes things nice and messy (Windows style messy), but what can you do?
Comment by Kovid Goyal (kovidgoyal) - Friday, 06 March 2015, 02:00 GMT
ROFL. If you dont know how to handle installing multiple versions of a python package, I suggest you stop maintaining python packages and leave it to someone with some actual knowledge. Or run a calibre source install and see how I do it.

Go read up on side by side assemblies in windows sometime, before shooting your mouth off about stuff you dont understand. Modern windows handles multiple, potentially incompatible copies of the same dependency on the same system just as well as modern linux. If you want to criticize windows, criticize it for something it actually gets wrong, or you just come off as a typical linux fanboy.

Just because something was a "basic tenet" of GNU/Linux twenty years ago, does not mean it is true any longer. The world is changing, I suggest you change with it, or be left behind. And I'm not the only one that thinks so, see for example, http://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html or listen to Linus talk about the issues he had packaging his divelog software for linux.

Finally if you dont want to be insulted, I suggest you stop being insulting.
Comment by Eli Schwartz (eschwartz) - Friday, 12 June 2015, 04:45 GMT
Another bug for you: http://www.mobileread.com/forums/showthread.php?t=261649

tl;dr
spellchecker in ebook-edit is totally failing to load dictionaries. Inline spellchecker seems to be unaffected.
Seems to be html5lib again.


Perhaps one of these days we can be upgraded from a "General Gripe"?
Comment by Eli Schwartz (eschwartz) - Friday, 23 October 2015, 01:19 GMT
Perhaps we can at least get the *incompatible forked offshoot* of html5lib back?


As for cherrypy, I was able to track down the following upstream bugs:
https://bitbucket.org/cherrypy/cherrypy/issues/1017/exit-behavior-is-not-good-when-running-in
https://bitbucket.org/cherrypy/cherrypy/issues/297/bugs-inside-httpauthpy-rfc2617 -- (#5)
https://bitbucket.org/cherrypy/cherrypy/issues/985/patch-case-issue-for-the-string-md5 -- ($5 of previous)

Perhaps you could lean on upstream cherrypy to fix those 5-year-old bugs, and maybe that would be enough to get calibre's patched copy removed in calibre upstream.
Those do appear to be the critical fixes, anyway. :)


Attached is a diff of calibre's changes to the upstream source

P.S. Due to Bitbucket's anonymous reporting, I have no idea whether Kovid Goyal was ever involved with upstream on those issues.
Although I have seen Philantrop in the calibre forums on MobileRead.
Comment by Kovid Goyal (kovidgoyal) - Friday, 23 October 2015, 02:16 GMT
FYI, the new calibre server does not use cherrypy, so calibre 3.0 will be dropping cherrypy entirely.
Comment by Stefan Husmann (stefanhusmann) - Thursday, 19 November 2015, 00:08 GMT
Perhaps we drop could cherrypy from our repos to AUR , it is only needed by calibre (which would obviously live better without it) and as a check dependency of only one other package.
Comment by Jelle van der Waa (jelly) - Thursday, 19 November 2015, 20:46 GMT
I don't see why cherrypy needs to be dropped it's in perfect working order and people use it.
Comment by Ivan (kaptoxic) - Tuesday, 19 April 2016, 07:16 GMT
After the recent update, I am having problems downloading metadata (similar to those explained here: http://www.mobileread.com/forums/showthread.php?t=270544 and https://bugs.archlinux.org/task/48368).
Why don't just move some of patched libraries to /opt?
Comment by Jelle van der Waa (jelly) - Saturday, 30 April 2016, 15:15 GMT
It would be nice if Kovid could explain the changes between html5lib of calibre and html5lib upstream.. Anyway this is how I see it.

https://github.com/gsnedders/html5lib-python/commit/49f37d2724b117b1eaa1ba7a23d27aef14db6932
* Moves function around, never uses it anywhere new.
* The return None is new, no clue what it's supposed to fix though....

https://github.com/gsnedders/html5lib-python/commit/7702d80e52c1d1c90ea41af6808623c78b518f01
* Speed improvements, can look into moving this to html5lib, but doesn't look shocking.. (no numbers to back anything up)

https://github.com/gsnedders/html5lib-python/commit/a2d2e05fb667683db9e020e7a2629e62fdc95832
* Again speedup, not important.

https://github.com/gsnedders/html5lib-python/commit/124e97565affa33dd36fd29c874020c72cce29fb
* Again speedup.

https://github.com/gsnedders/html5lib-python/commit/1576515fcd62af2969e387c92846790ec177a6db
* Track position support

Actually used...
src/calibre/ebooks/oeb/polish/parsing.py: parser = HTMLParser(tree=builder, track_positions=line_numbers, namespaceHTMLElements=not discard_namespaces)
src/html5lib/html5parser.py: strict=False, namespaceHTMLElements=True, debug=False, track_positions=False):

Invasive changeset, requires at least tests and some cleanups to be upstreamed if they want it..

https://github.com/gsnedders/html5lib-python/commit/64e8b0bd8e9bc4dbb8339ab0e094ef933035339c
https://github.com/gsnedders/html5lib-python/commit/cc9f28af4859663738c67971f9893e2558f86138

* Can't see what this adds, except moving a function somewhere else. Doesn't seem to improve the function

https://github.com/gsnedders/html5lib-python/commit/db0b5a02a2cb56d0410b878005ca5fdfa91961b2

* Preserve attribute order when parsing. Big change, seems useful for upstream.

--https://github.com/kovidgoyal/calibre/commit/c68b9b7d64ab08ae78bf461b0e4683ce54222aa2--

* Bugfix, seems that LXML can't handle --.
* Fixed upstream already!

https://github.com/kovidgoyal/calibre/commit/a82ff8b7493ee5ee4f9d8a4fd4e56ac2c1df4267

* Bugfix..
Comment by Kovid Goyal (kovidgoyal) - Saturday, 30 April 2016, 16:19 GMT
49f37d2724b117b1eaa1ba7a23d27aef14db6932 - Does not move a function around, moves some functionality into a new function, so that it can be overridden in sub-clasees. grep the function name in the calibre source code to see where it is overridden.

7702d80e52c1d1c90ea41af6808623c78b518f01 - these are trivially obvious -- if x vs. if x is None is the difference between a pointer comparison and a function call.

a2d2e05fb667683db9e020e7a2629e62fdc95832 - this is important it allows a html5lib consumer to provide its own, faster stream class, which the lxml builder in calibre uses.

64e8b0bd8e9bc4dbb8339ab0e094ef933035339c - same as 49f37d2724b117b1eaa1ba7a23d27aef14db6932

Feel free to ask if you have more questions.
Comment by Jelle van der Waa (jelly) - Sunday, 15 May 2016, 18:25 GMT
Just saw that upstream took a few of the patches here https://github.com/html5lib/html5lib-python/pull/245
Comment by Eli Schwartz (eschwartz) - Friday, 23 June 2017, 21:21 GMT
As of calibre v3, none of these forked and modified libraries are stripped from calibre's source tree. Largely because html5lib is the only one that calibre still uses, and that has API changes compared to the version that calibre forked.

Even that will go away at some point likely, as upstream intends to eventually migrate to https://github.com/kovidgoyal/html5-parser which is implemented as a binary extension.

...

Anyway, point is that the objective of this bugreport has been achieved.

Loading...