FS#68899 - [poppler] poppler-data shouldn't be optional (or maybe split)

Attached to Project: Arch Linux
Opened by drws (drws) - Tuesday, 08 December 2020, 20:17 GMT
Last edited by Andreas Radke (AndyRTR) - Wednesday, 23 December 2020, 08:21 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Andreas Radke (AndyRTR)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

The package poppler-data is described as "Encoding data for the poppler PDF rendering library". Currently the main package poppler lists it as optional and describes it as "encoding data to display PDF documents containing CJK characters", which is unfortunately not the whole story. A practical example is the following file, which doesn't render completely without poppler-data (some text on the first page is missing):

https://www.espressif.com/sites/default/files/1a-esp32_pin_list_en-v0.1.pdf

Since poppler-data includes more than just the CJK encoding data, it could be made a required dependency of poppler, or even split into a required poppler-data and an optional poppler-data-cjk or something like that.

Closed by  Andreas Radke (AndyRTR)
Wednesday, 23 December 2020, 08:21 GMT
Reason for closing:  Fixed
Additional comments about closing:  Pushed a clearer optdepends description to svn trunk for future builds.
Comment by Andreas Radke (AndyRTR) - Tuesday, 08 December 2020, 21:16 GMT
Fedora adds a runtime dependency on poppler-data, but in my view that is not technically the proper way according to the upstream recommendation.
Debian-based distributions "recommend" (we call it optdepends) poppler-data for encoding.

I've confirmed that your PDF file fails to render without poppler-data, and it doesn't seem to include any CJK or Cyrillic characters on that first page.

I suggest you file an upstream bug to the poppler main tracker at https://gitlab.freedesktop.org/poppler/poppler/-/issues and ask if this is a valid poppler rendering bug or if the https://gitlab.freedesktop.org/poppler/poppler-data/-/blob/master/README is misleading about CJK/CYRILLIC.
Comment by drws (drws) - Friday, 18 December 2020, 20:47 GMT
While this should also be clarified upstream, the discrepancy between the poppler-data package description and the optdepends description in the poppler package remains. The latter unnecessarily complicates things by emphasizing CJK, in contrast to the more general description in poppler-data (which could be the accurate one given the package's file list: https://archlinux.org/packages/extra/any/poppler-data/files/ ).
Comment by Andreas Radke (AndyRTR) - Tuesday, 22 December 2020, 10:44 GMT Comment by Andreas Radke (AndyRTR) - Wednesday, 23 December 2020, 08:21 GMT
Your PDF file uses a Korean encoding. I've committed a better description to svn trunk for future builds.
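
As a rough local check related to the discussion above, the following Python sketch reports whether poppler's encoding data appears to be installed. The path /usr/share/poppler and the subdirectory names (cMap, cidToUnicode, nameToUnicode, unicodeMap) are assumptions about how the poppler-data package is laid out, and the helper names are hypothetical and not part of poppler. Compare against the package file list linked in the thread before relying on it.

```python
from pathlib import Path

# Assumption: poppler-data installs its files under this directory with these
# subdirectories. Verify against the package file list for your distribution.
ASSUMED_DATA_DIR = Path("/usr/share/poppler")
ASSUMED_SUBDIRS = ("cMap", "cidToUnicode", "nameToUnicode", "unicodeMap")


def missing_encoding_dirs(data_dir=ASSUMED_DATA_DIR, subdirs=ASSUMED_SUBDIRS):
    """Return the expected subdirectories that are absent under data_dir."""
    root = Path(data_dir)
    return [name for name in subdirs if not (root / name).is_dir()]


def has_cmap_collection(collection, data_dir=ASSUMED_DATA_DIR):
    """Report whether a CMap collection directory (e.g. Adobe-Korea1) exists."""
    return (Path(data_dir) / "cMap" / collection).is_dir()


if __name__ == "__main__":
    missing = missing_encoding_dirs()
    if missing:
        print("Encoding data looks incomplete; missing:", ", ".join(missing))
    else:
        print("All assumed encoding data directories are present.")
    # Adobe-Korea1 is the assumed collection name for Korean CMaps.
    print("Korean CMaps present:", has_cmap_collection("Adobe-Korea1"))
```

If the directories are missing, installing poppler-data and reopening the PDF is the obvious next step. The check only looks at directory names, so it cannot confirm that a particular PDF will render correctly.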
