misread seven or five depending on --psm value #4397

Open
@psurmont

Description

Current Behavior

I'm using tesseract 5.3.0 in a Docker environment (Debian/PHP/MySQL).
I use PNG images processed at a resolution of 600 DPI.
When I use the -l fra option, 7 is very often misread as / or /7 or 7/.
There are fewer misreads if I don't use the -l fra option.
But if I use --psm 4, 7 is again frequently misread as /. If I use --psm 6, it is 5 that is frequently misread as 9. I looked at my PNG files and I see nothing in them that would explain a 7 being read as / or a 5 being read as 9.
I tried --dpi 600 and --dpi 300 and there is no improvement.
Thank you in advance.

Pierre
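
The option combinations described above (with and without -l fra, --psm 4 or 6, --dpi 300 or 600) could be run side by side on a single image with something like the sketch below. It is not part of the original report: the file name page.png and the helper names build_commands and run_all are hypothetical, and it only assumes that the tesseract binary is on PATH and that the fra traineddata is installed.

```python
import itertools
import shutil
import subprocess


def build_commands(image, langs=(None, "fra"), psms=(None, 4, 6), dpis=(None, 300, 600)):
    """Return one tesseract argument list per combination of options.

    None means "leave the option out and use tesseract's default".
    """
    commands = []
    for lang, psm, dpi in itertools.product(langs, psms, dpis):
        # "stdout" as the output base makes tesseract print the text
        cmd = ["tesseract", image, "stdout"]
        if lang is not None:
            cmd += ["-l", lang]
        if psm is not None:
            cmd += ["--psm", str(psm)]
        if dpi is not None:
            cmd += ["--dpi", str(dpi)]
        commands.append(cmd)
    return commands


def run_all(image):
    """Run every combination and return {option string: recognized text}."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    results = {}
    for cmd in build_commands(image):
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
        # cmd[3:] holds only the options that follow the "stdout" argument
        results[" ".join(cmd[3:]) or "(defaults)"] = proc.stdout
    return results


# Example use (page.png is a placeholder for one of the affected images):
# for options, text in run_all("page.png").items():
#     print(f"=== {options} ===")
#     print(text)
```

Comparing the outputs for the same image makes it easier to see which option changes the reading of the 7 and the 5.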

Expected Behavior

I expect all the digits in my documents to be read correctly.

Suggested Fix

I don't know.

tesseract -v

tesseract 5.3.0
leptonica-1.82.0
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.2) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libwebp 1.2.4 : libopenjp2 2.5.0
Found AVX
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.6.2 zlib/1.2.13 liblzma/5.4.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.4
Found libcurl/7.88.1 OpenSSL/3.0.15 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 libpsl/0.21.2 (+libidn2/2.3.3) libssh2/1.10.0 nghttp2/1.52.0 librtmp/2.3 OpenLDAP/2.5.13

Operating System

Debian 12 / Docker

Other Operating System

The development platform is macOS.

uname -a

No response

Compiler

no compiler

CPU

No response

Virtualization / Containers

docker

Other Information

No response
