FS#71856 - [leptonica] 1.81 breaks (some) monochrome TIFF->PDF conversion

Attached to Project: Community Packages
Opened by Alexander Kobel (akobel) - Tuesday, 17 August 2021, 16:16 GMT
Last edited by Toolybird (Toolybird) - Sunday, 14 May 2023, 01:27 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Jelle van der Waa (jelly)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Leptonica 1.81 breaks (among other things) TIFF to PDF conversion; see https://github.com/DanBloomberg/leptonica/issues/586 for a discussion of the issue, including fully automated tests.

Prominent users of this conversion are the widely used tesseract OCR engine and several document scanning and post-processing tools that use tesseract internally.

Reverting commit 2881dfb049aea0821b506e5a5ed0048eef749c04 from upstream fixes the issue.
This commit was introduced in 1.81.0 as a pure performance optimization for embedding CCITT Group 4-compressed monochrome TIFF images, but it only works for a subset of valid files (essentially, ones created by unwrapping them from PDF). The performance gain is not tremendous in many cases (e.g., negligible compared to OCR or non-trivial image processing), and certainly not worth risking correctness.
It would be nice if this hotfix could be applied as an interim measure until a long-term solution is found upstream.
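
For illustration, a minimal sketch of how such a revert could look in a PKGBUILD. This is not necessarily what the attached PKGBUILD does. The patch file name, the `_revert` and `_patch_url` variables, the `leptonica-${pkgver}` source directory and the `SKIP` checksum are assumptions made for this example. It relies on GitHub serving a commit as a patch when `.patch` is appended to the commit URL, and on the commit still reverting cleanly against the packaged release.

```shell
# Hypothetical PKGBUILD fragment (bash); names below are illustrative only.
_revert=2881dfb049aea0821b506e5a5ed0048eef749c04
_patch_url="https://github.com/DanBloomberg/leptonica/commit/${_revert}.patch"

# Fetch the upstream commit as a patch next to the regular sources.
source+=("revert-${_revert}.patch::${_patch_url}")
# Placeholder: a real package should pin the actual checksum instead of SKIP.
sha256sums+=('SKIP')

prepare() {
  # Assumes the release tarball unpacks into leptonica-$pkgver.
  cd "leptonica-${pkgver}"
  # -R applies the patch in reverse, i.e. undoes the upstream commit.
  patch -R -p1 -i "${srcdir}/revert-${_revert}.patch"
}
```

If the reverse patch no longer applies (for example after a version bump), `patch` exits non-zero and makepkg aborts in `prepare()`, which is the desired behaviour for an interim hotfix.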

On a side note, leptonica is currently built without openjpeg2 dependency and, hence, without JPEG2k support; it'd be nice if this could be included.
Attached is a corresponding PKGBUILD that covers both points.
   PKGBUILD (1.1 KiB)

Closed by  Toolybird (Toolybird)
Sunday, 14 May 2023, 01:27 GMT
Reason for closing:  Fixed
Additional comments about closing:  We're on leptonica 1.83.1-1 now so assuming fixed.
Comment by Alexander Kobel (akobel) - Saturday, 21 August 2021, 14:57 GMT
Reverting the single commit as a workaround is acked by leptonica's maintainer, see https://github.com/DanBloomberg/leptonica/issues/586#issuecomment-903123931
The final upstream fix (slightly more elaborate, and slightly more performant under certain conditions) will probably only land in 1.82.0, which is to be released soon-ish.
