FS#73586 - [tesseract] data use legacy models rather than recent ones

Attached to Project: Community Packages
Opened by Tomas Mudrunka (harvie) - Tuesday, 01 February 2022, 11:54 GMT
Last edited by Caleb Maclennan (alerque) - Wednesday, 05 April 2023, 09:56 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Felix Yan (felixonmars)
Caleb Maclennan (alerque)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description: According to:
https://tesseract-ocr.github.io/tessdoc/Data-Files

it seems that the tesseract-data-* packages actually contain old "legacy" models rather than the latest "fast" or "best" models, which seem to be more efficient.


Additional info:
* package version(s) tesseract-data-* 2:4.1.0-3

Closed by Caleb Maclennan (alerque)
Wednesday, 05 April 2023, 09:56 GMT
Reason for closing: Not a bug
Comment by Caleb Maclennan (alerque) - Wednesday, 05 April 2023, 09:55 GMT
This is not correct. The page linked even says otherwise.

There are three variants: tessdata, tessdata-fast, and tessdata-best. We are using the first one, which is a trade-off between speed and accuracy. The "best" models are more accurate but slower; the "fast" models are faster but less accurate. Our choice of the compromise between speed and accuracy has nothing to do with that variant also having legacy support for older tesseract versions, and there is no clear winner between the other two.
