FS#52602 - [python-bleach] Not compatible with python-html5lib-0.999999999-2 from repo

Attached to Project: Community Packages
Opened by Sebastian Pinnau (spinnau) - Tuesday, 17 January 2017, 17:39 GMT
Last edited by Johannes Löthberg (demize) - Saturday, 21 January 2017, 19:21 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Johannes Löthberg (demize)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

There are compatibility issues between the package and its dependency python-html5lib from the community repo.

* Versions:
- python-bleach 1.5.0-3
- python-html5lib 0.999999999-2 (9 9's)


Importing the package in Python raises the following error:

>>> import bleach
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/site-packages/bleach/__init__.py", line 14, in <module>
    from html5lib.sanitizer import HTMLSanitizer
ModuleNotFoundError: No module named 'html5lib.sanitizer'



According to the changelog ( https://bleach.readthedocs.io/en/latest/changes.html ), bleach is only compatible with html5lib <0.99999999 (8 9's).
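
The bound is easy to misread because the releases differ only in the number of 9's. A minimal sketch of the comparison, assuming the versions are plain dotted integers; the helper names `parse_version` and `is_compatible` are made up for illustration and are not part of bleach or html5lib:

```python
# Upper bound taken from the changelog statement above: html5lib < 0.99999999 (8 9's).
UPPER_BOUND = "0.99999999"


def parse_version(version):
    """Split a dotted version such as '0.9999999' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(html5lib_version):
    """Return True if the given html5lib version is below the upper bound."""
    return parse_version(html5lib_version) < parse_version(UPPER_BOUND)


# 7, 8 and 9 nines: only the first one is below the bound.
for candidate in ("0.9999999", "0.99999999", "0.999999999"):
    print(candidate, is_compatible(candidate))
```

The version from the repo (9 9's) compares as larger than the bound, which matches the import failure shown above.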


Closed by  Johannes Löthberg (demize)
Saturday, 21 January 2017, 19:21 GMT
Reason for closing:  Fixed
Additional comments about closing:  Using vendored copy for now until upstream is updated to account for html5lib breaking the api
Comment by Johannes Löthberg (demize) - Wednesday, 18 January 2017, 12:53 GMT
Ah, crap, really sorry about that, seems I forgot to test it on a clean system. I wonder what the best way to solve this currently would be. Would prefer not having to move it back to the AUR, but it seems like that's sadly necessary for now, hmm.
Comment by Johannes Löthberg (demize) - Wednesday, 18 January 2017, 13:14 GMT
Oh, I think part of the reason I failed to realize this is that python-html5lib-git actually builds from a more than 3 versions old commit, hmph.
Comment by Sebastian Pinnau (spinnau) - Wednesday, 18 January 2017, 15:13 GMT
Your work to get this package from AUR to community repo is really appreciated. I also think it should be available here, to make jupyter-nbconvert work. Downgrading the python-html5lib community package is probably not an option, since other packages depend on it.


A possible temporary solution is to bundle the older html5lib-0.9999999 (7 9's) directly in the python-bleach package, until html5lib-1.0 with a stable API is released and bleach has adapted to it ( bleach issue: https://github.com/mozilla/bleach/issues/229 ).

Bundling requires a few changes:

* install html5lib in site-packages/bleach
* add the local html5lib package folder to the search path in bleach
* remove the html5lib dependency from bleach (to avoid pkg_resources errors)


I have done this in an updated PKGBUILD. Tested and working with Python-3.6. The bleach library can be imported and "jupyter nbconvert" also works (together with a tweaked python-entrypoints package that adds the egg-info file). What do you think about it?


# Maintainer: Johannes Löthberg <johannes @ kyriasis.com>

pkgbase=python-bleach
pkgname=(python-bleach python2-bleach)
pkgver=1.5.0
pkgrel=3

pkgdesc="An easy whitelist-based HTML-sanitizing tool"
url="http://pypi.python.org/pypi/bleach"
arch=('any')
license=('Apache')

makedepends=('python-setuptools'
'python2-setuptools')

source=("python-bleach-$pkgver.tar.gz::https://pypi.org/packages/source/b/bleach/bleach-$pkgver.tar.gz"
"https://github.com/html5lib/html5lib-python/archive/0.9999999.tar.gz")

md5sums=('b663300efdf421b3b727b19d7be9c7e7' '2ca78b1ec5852779bc121a97da6e8d4d')

prepare() {
  cp -a bleach-$pkgver{,-python2}
}

build() {
  cd "$srcdir"/bleach-$pkgver
  # add local html5lib to search path
  sed -i 's/import re/import re\nimport os, sys\nsys.path.insert(0, os.path.dirname(__file__))/' ./bleach/__init__.py
  # remove html5 dependency
  sed -i '/html5lib/d' ./setup.py
  python setup.py build

  cd "$srcdir"/bleach-$pkgver-python2
  # add local html5lib to search path
  sed -i 's/import re/import re\nimport os, sys\nsys.path.insert(0, os.path.dirname(__file__))/' ./bleach/__init__.py
  # remove html5 dependency
  sed -i '/html5lib/d' ./setup.py
  python2 setup.py build
}

package_python-bleach() {
  cd "$srcdir"/html5lib-python-0.9999999
  python setup.py install --root="$pkgdir"/usr/lib/python3.6/site-packages --install-purelib=bleach --optimize=1

  cd "$srcdir"/bleach-$pkgver
  python setup.py install --root="$pkgdir" --optimize=1
}

package_python2-bleach() {
  cd "$srcdir"/html5lib-python-0.9999999
  python2 setup.py install --root="$pkgdir"/usr/lib/python2.7/site-packages --install-purelib=bleach --optimize=1

  cd "$srcdir"/bleach-$pkgver-python2
  python2 setup.py install --root="$pkgdir" --optimize=1
}
Comment by Phil Schaf (flying-sheep) - Thursday, 19 January 2017, 09:20 GMT
for a quick workaround, i’ve also created https://aur.archlinux.org/packages/python-html5lib-9x07/
Comment by Johannes Löthberg (demize) - Friday, 20 January 2017, 14:29 GMT
Okay, so, I've now pushed python{,2}-html5lib-7-9s to [community]. Please tell me if there are any issues with it, but for now I'll close this issue since it seems to be working with that version.
Comment by Sebastian Pinnau (spinnau) - Friday, 20 January 2017, 16:31 GMT
  • Field changed: Percent Complete (100% → 0%)
Thanks for the solution, but it's not working yet. The PKGBUILD for the python{,2}-html5lib-7-9s package is missing the lines:

provides=('python-html5lib')

and

provides=('python2-html5lib')
Comment by Phil Schaf (flying-sheep) - Saturday, 21 January 2017, 11:00 GMT
also this means we’re stuck on an old html5lib version. as said, it’s a workaround, not a solution.

the solution would be what sebastian said:

1. put a matching version of the module in a non-$PYTHONPATH location (e.g. “…/site-packages/bleach/vendor/”)
2. patch bleach to use it (add “…/vendor” to sys.path or so)
3. patch out html5lib from “…/site-packages/bleach-1.5.0-py3.6.egg-info/requires.txt” so pkg_resources doesn’t complain
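
The last step can be sketched as a small filter over the contents of requires.txt. This is only an illustration: `drop_requirement` is a hypothetical helper, and the sample text is made up rather than copied from the installed package.

```python
def drop_requirement(requires_text, name):
    """Return requires.txt content without the lines that require `name`."""
    kept = []
    for line in requires_text.splitlines():
        # Reduce the line to its leading project name, so that
        # "html5lib>=0.999" matches but "html5lib-extras" does not.
        project = line.strip()
        for separator in "<>=!~;[ ":
            project = project.split(separator, 1)[0]
        if project.lower() != name.lower():
            kept.append(line)
    return "".join(line + "\n" for line in kept)


# Illustrative sample content (an assumption, not the real file).
sample = "six\nhtml5lib>=0.999,<0.99999999\n"
print(drop_requirement(sample, "html5lib"), end="")
```

Matching on the project name instead of deleting every line that contains the string "html5lib" keeps unrelated entries intact.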

i’d however write a real patch instead of using sed, then we aren’t incentivised to use shortcuts like adding “…/site-packages/bleach” to sys.path, as this adds all submodules to pythonpath instead of just html5lib. instead:

here = os.path.dirname(__file__)
sys.path.insert(0, os.path.join(here, 'vendor'))
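
A self-contained sketch of the effect of those two lines, using a throwaway directory in place of the real bleach package. The names `fakepkg` and `vendored_demo` are placeholders invented for this demo, and `here` is built by hand because the script is not running from inside a package:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway layout: <tmp>/fakepkg/vendor/vendored_demo.py
root = tempfile.mkdtemp()
vendor_dir = os.path.join(root, "fakepkg", "vendor")
os.makedirs(vendor_dir)
with open(os.path.join(vendor_dir, "vendored_demo.py"), "w") as handle:
    handle.write("ORIGIN = 'vendored'\n")

# Same idea as the two lines above; `here` stands in for the package directory.
here = os.path.join(root, "fakepkg")
sys.path.insert(0, os.path.join(here, "vendor"))

# The module was created after interpreter start, so refresh the finder caches.
importlib.invalidate_caches()
module = importlib.import_module("vendored_demo")

print(module.ORIGIN)
print(os.path.dirname(module.__file__))
```

Only the contents of the vendor directory become importable as top-level modules; the other submodules of the package stay off the search path.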
Comment by Johannes Löthberg (demize) - Saturday, 21 January 2017, 19:16 GMT
That isn't really a solution either, it's just another workaround. Anyway, at first it didn't work for me, but I just realized what I did wrong, and just pushed a version with the vendored copy, so the newer html5lib version can be used.
