FS#73586 - [tesseract] data use legacy models rather than recent ones
Attached to Project:
Community Packages
Opened by Tomas Mudrunka (harvie) - Tuesday, 01 February 2022, 11:54 GMT
Last edited by Caleb Maclennan (alerque) - Wednesday, 05 April 2023, 09:56 GMT
Details
Description: According to https://tesseract-ocr.github.io/tessdoc/Data-Files, the tesseract-data-* packages appear to contain old "legacy" models rather than the latest "fast" or "best" ones, which seem to be more efficient.
Additional info:
* package version(s): tesseract-data-* 2:4.1.0-3
Closed by Caleb Maclennan (alerque)
Wednesday, 05 April 2023, 09:56 GMT
Reason for closing: Not a bug
There are three variants: tessdata, tessdata-fast, and tessdata-best. We package the first one, which is a trade-off between speed and accuracy. The "best" models are more accurate but slower; the "fast" models are faster but less accurate. We chose tessdata because it is the compromise between speed and accuracy, not because it also carries legacy support for older tesseract, and there is no clear winner between the other two.
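For anyone who still wants to try one of the other variants without replacing the packaged data, a rough sketch follows. It assumes you have downloaded the relevant .traineddata files yourself into a directory of your choosing; the path /opt/tessdata_best and the helper name build_tesseract_command are made up for illustration. It relies only on tesseract's --tessdata-dir and -l command-line options.

```python
import os
from typing import List


def build_tesseract_command(image: str, output_base: str,
                            tessdata_dir: str, lang: str = "eng") -> List[str]:
    """Build a tesseract command line that loads models from tessdata_dir.

    tessdata_dir is assumed to hold manually downloaded files such as
    eng.traineddata from the "fast" or "best" variant (hypothetical setup).
    """
    return [
        "tesseract", image, output_base,
        # Point tesseract at the alternative model directory.
        "--tessdata-dir", os.path.expanduser(tessdata_dir),
        # Language code; must match a <lang>.traineddata file in that directory.
        "-l", lang,
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True), so the same image can be run against each variant to compare speed and accuracy on your own material.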