FS#65676 - [tesseract] should depend on tesseract-data-eng
Attached to Project:
Community Packages
Opened by carlenny (carlenny) - Monday, 02 March 2020, 02:16 GMT
Last edited by Jelle van der Waa (jelly) - Saturday, 06 June 2020, 18:06 GMT
Details
tesseract-data-eng should be a (non-optional) dependency of
tesseract.
Steps to reproduce:

    $ pacman -Q | grep tesseract
    tesseract 4.1.1-1
    tesseract-data-deu 1:4.0.0-1
    $ ocrmypdf -l deu --output-type pdf --skip-text input.pdf output.pdf
    ERROR - Tesseract failed to report available languages.
    Output from Tesseract:
    -----------
    Error opening data file /usr/share/tessdata/eng.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
    Failed loading language 'eng'
    Tesseract couldn't load any languages!
    List of available languages (2):
    deu
    osd

IMHO this is not an upstream bug because tesseract's [issue guidelines](https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md) say:

> Each version of Tesseract has its own language data you need to obtain. You must obtain and install trained data for English (eng) and osd. Verify that Tesseract knows about these two files (and other trained data you installed) with this command: tesseract --list-langs.
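A quick way to see which of the two required data files is absent is to look for them directly. The following is only a sketch: `missing_langs` is a hypothetical helper, not part of tesseract or pacman, and it assumes that each language is stored as `<lang>.traineddata` in the directory named in the error message above.

```shell
#!/bin/sh
# missing_langs DIR LANG...
# Print every LANG for which DIR/LANG.traineddata does not exist.
# Hypothetical helper for illustration only.
missing_langs() {
    dir=$1
    shift
    for lang in "$@"; do
        [ -f "$dir/$lang.traineddata" ] || printf '%s\n' "$lang"
    done
}

# Example call, using the path from the error message:
#   missing_langs /usr/share/tessdata eng osd
```

With only tesseract-data-deu installed, as in the report, the example call would be expected to list eng, the file that tesseract-data-eng should provide.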
Closed by Jelle van der Waa (jelly)
Saturday, 06 June 2020, 18:06 GMT
Reason for closing: Fixed
Additional comments about closing: Works as intended, check the optional dependencies
Comment by Jelle van der Waa (jelly) - Saturday, 06 June 2020, 18:06 GMT
That doesn't scale, however, since someone might need a different
language, and the package has an optional dependency on it.