
Tesseract struggles with 90-degree angled text sometimes #4387

Open
@0dinD

Description


Current Behavior

I was investigating whether Tesseract can handle mixed text orientation (see also: #2055) and found a specific case where it almost works, but fails in a way that makes me think there is a bug in the code. More specifically, in the example below, Tesseract seems to read the 90-degree text "upside-down", that is, as if it were 270-degree text.

For example, as you can see in the output hOCR below, the textangle is correctly identified as 90 degrees, but Tesseract reads the text "upside-down", i.e. from a 270-degree perspective. Look at words like "anbeu" ("neque" upside-down), "luenb" ("quam" upside-down), "wesdi" ("ipsum" upside-down) and so on.
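To pull the affected lines out of the hOCR output and see which words Tesseract produced for each textangle, here is a minimal sketch using only Python's standard library. It assumes the textangle is stored in the title attribute of line-level elements (as the hOCR format specifies) and that the line classes in LINE_CLASSES cover what Tesseract emits; both the helper names and that class list are illustrative assumptions, not part of Tesseract.

```python
import re
from html.parser import HTMLParser

# Line-level hOCR classes to look for (assumption: extend if your output uses others).
LINE_CLASSES = {"ocr_line", "ocr_caption", "ocr_textfloat", "ocr_header"}


class HocrLines(HTMLParser):
    """Hypothetical helper: collect [textangle, [words]] for every hOCR line."""

    def __init__(self):
        super().__init__()
        self.lines = []        # list of [angle, [word, ...]]
        self._in_word = False  # currently inside an ocrx_word span?
        self._depth = 0        # nested <span> depth inside the current word
        self._buf = []         # text fragments of the current word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if classes & LINE_CLASSES:
            # e.g. title="bbox 36 92 582 102; textangle 90; x_size 12"
            m = re.search(r"\btextangle\s+(-?\d+)", attrs.get("title") or "")
            self.lines.append([int(m.group(1)) if m else 0, []])
        elif tag == "span" and self._in_word:
            self._depth += 1  # a span nested inside a word
        elif "ocrx_word" in classes:
            self._in_word, self._depth, self._buf = True, 0, []

    def handle_endtag(self, tag):
        if tag == "span" and self._in_word:
            if self._depth:
                self._depth -= 1
            else:
                # Closing tag of the word itself: attach it to the current line.
                self._in_word = False
                word = "".join(self._buf).strip()
                if word and self.lines:
                    self.lines[-1][1].append(word)

    def handle_data(self, data):
        if self._in_word:
            self._buf.append(data)


def lines_by_angle(hocr_text):
    """Return (textangle, line text) pairs in document order."""
    parser = HocrLines()
    parser.feed(hocr_text)
    parser.close()
    return [(angle, " ".join(words)) for angle, words in parser.lines]
```

Feeding it the contents of the hOCR file should list each 90-degree line next to the words Tesseract read for it, which makes the upside-down readings such as "anbeu" and "luenb" easy to spot. Lines without a textangle property are reported as angle 0.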

Command used: tesseract text-90deg.png text-90deg --psm 1 hocr

Input image:

[Image: text-90deg.png]

Output hOCR:

text-90deg.hocr.txt

Tested with the latest Tesseract AppImage, version 5.5.0.

Expected Behavior

Tesseract should read all the text in the correct orientation so that there are no jumbled words in the hOCR output.

Suggested Fix

Find and fix the bug that makes Tesseract read 90-degree text as 270-degree text in this case.

tesseract -v

tesseract 5.5.0
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX512BW
 Found AVX512F
 Found AVX512VNNI
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4

Operating System

Ubuntu 22.04 Jammy

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

Labels: OSD (Orientation and Script Detection), bug