Description
Current Behavior
I'm using Tesseract 5.3.0 in a Docker environment (Debian/PHP/MySQL).
I use PNG images processed at 600 DPI.
When I use the -l fra option, 7s are very often misread as /, /7, or 7/.
There are fewer misreads if I don't use the -l fra option.
However, with --psm 4 the 7 → / misreads become frequent again, and with --psm 6 it's the 5s that are frequently misread as 9. I checked my PNGs, and there is no way those 7s should be read as / or those 5s as 9.
I tried --dpi 600 and --dpi 300, with no improvement.
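To make the comparison above easier to reproduce, the option combinations I tried can be scripted. This is a minimal POSIX shell sketch, assuming the test image path is passed in IMAGE (the default sample.png is hypothetical) and that fra.traineddata is installed. IMAGE, OUT_DIR, and DRY_RUN are names made up for this sketch, not Tesseract settings; the only Tesseract flags used are the ones mentioned above (-l, --psm, --dpi). "default" means no -l flag (Tesseract then uses eng), and an empty --psm or --dpi value leaves that flag out (Tesseract's default page segmentation mode is 3).

```shell
#!/bin/sh
# Sketch: rerun the option combinations described above on one image and
# keep one transcript per run, so the 7 -> / and 5 -> 9 misreads can be diffed.
# IMAGE, OUT_DIR and DRY_RUN are hypothetical names used only by this sketch.
IMAGE=${IMAGE:-sample.png}
OUT_DIR=${OUT_DIR:-ocr_runs}

# Run (or, with DRY_RUN=1, only print) one tesseract invocation.
# $1 = language ("default" = no -l flag), $2 = --psm value, $3 = --dpi value;
# an empty $2 or $3 leaves that flag out.
run_one() {
  r_lang=$1 r_psm=$2 r_dpi=$3
  # Build the argument list in the function's own positional parameters,
  # so image paths containing spaces are passed through intact.
  set -- tesseract "$IMAGE" stdout
  [ "$r_lang" = default ] || set -- "$@" -l "$r_lang"
  [ -z "$r_psm" ] || set -- "$@" --psm "$r_psm"
  [ -z "$r_dpi" ] || set -- "$@" --dpi "$r_dpi"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Loop over every combination; skip quietly if tesseract or the image is missing.
run_matrix() {
  command -v tesseract >/dev/null 2>&1 || { echo "tesseract not found, skipping" >&2; return 0; }
  [ -f "$IMAGE" ] || { echo "$IMAGE not found, skipping" >&2; return 0; }
  mkdir -p "$OUT_DIR"
  for lang in default fra; do
    for psm in "" 4 6; do
      for dpi in "" 300 600; do
        out="$OUT_DIR/lang-${lang}_psm-${psm:-default}_dpi-${dpi:-none}.txt"
        # stderr (e.g. resolution warnings) is discarded; only the text is kept.
        run_one "$lang" "$psm" "$dpi" > "$out" 2>/dev/null
        echo "wrote $out"
      done
    done
  done
}

run_matrix
```

For example, running it as `IMAGE=page1.png sh compare_ocr.sh` (both file names hypothetical) and then diffing `ocr_runs/lang-default_psm-default_dpi-none.txt` against `ocr_runs/lang-fra_psm-default_dpi-none.txt` would isolate the effect of -l fra on its own, while the psm-4 and psm-6 files show the 7 → / and 5 → 9 cases.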
Thank you in advance.
Pierre
Expected Behavior
I expect all the digits in my documents to be read correctly.
Suggested Fix
I don't know.
tesseract -v
tesseract 5.3.0
leptonica-1.82.0
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.2) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libwebp 1.2.4 : libopenjp2 2.5.0
Found AVX
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.6.2 zlib/1.2.13 liblzma/5.4.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.4
Found libcurl/7.88.1 OpenSSL/3.0.15 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 libpsl/0.21.2 (+libidn2/2.3.3) libssh2/1.10.0 nghttp2/1.52.0 librtmp/2.3 OpenLDAP/2.5.13
Operating System
Debian 12 / Docker
Other Operating System
The development platform is macOS.
uname -a
No response
Compiler
No compiler
CPU
No response
Virtualization / Containers
Docker
Other Information
No response