combine_tessdata Fails to Create Temporary File micr.traineddata.__tmp__ During Training #4395

Open
@jo-walker

Current Behavior

During MICR font training for check validation, the combine_tessdata.exe step fails with the following error:
Failed to create a temporary file micr.traineddata.__tmp__

Here’s the relevant log snippet from my script:
2025-03-05 19:52:23,933 - ERROR - combine_tessdata failed: Failed to create a temporary file micr.traineddata.__tmp__
2025-03-05 19:52:23,935 - ERROR - MICR font training failed: Command '['C:\Program Files\Tesseract-OCR\combine_tessdata.exe', '-o', 'micr.traineddata', 'training.unicharset', 'training.inttemp', 'training.pffmtable', 'training.shapetable', 'training.normproto']' returned non-zero exit status 1.
2025-03-05 19:52:23,937 - ERROR - Processing failed: Training failed: Command '['C:\Program Files\Tesseract-OCR\combine_tessdata.exe', '-o', 'micr.traineddata', 'training.unicharset', 'training.inttemp', 'training.pffmtable', 'training.shapetable', 'training.normproto']' returned non-zero exit status 1.

The training process halts at this step, and the micr.traineddata file is not generated.

The script sets the TMP and TEMP environment variables to the training directory before running combine_tessdata.exe, but the error persists.
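A minimal sketch of that setup, assuming the script launches the tool through Python's subprocess.run with check=True (the "returned non-zero exit status 1" wording in the log suggests this, but it is an assumption). The helper names build_env, build_command, and run_combine are hypothetical and are not taken from the actual script; the executable path and file names are the ones shown in the log.

```python
import os
import subprocess

# Path and component names as they appear in the log above.
COMBINE_EXE = r"C:\Program Files\Tesseract-OCR\combine_tessdata.exe"
COMPONENTS = [
    "training.unicharset",
    "training.inttemp",
    "training.pffmtable",
    "training.shapetable",
    "training.normproto",
]


def build_env(training_dir):
    """Copy the current environment and point TMP/TEMP at the training directory."""
    env = os.environ.copy()
    env["TMP"] = str(training_dir)
    env["TEMP"] = str(training_dir)
    return env


def build_command(output="micr.traineddata"):
    """Assemble the same argument list that appears in the error log."""
    return [COMBINE_EXE, "-o", output, *COMPONENTS]


def run_combine(training_dir):
    """Run combine_tessdata inside the training directory (hypothetical helper)."""
    # check=True raises subprocess.CalledProcessError on a non-zero exit status.
    return subprocess.run(
        build_command(),
        cwd=str(training_dir),
        env=build_env(training_dir),
        capture_output=True,
        text=True,
        check=True,
    )
```

Calling run_combine on the training directory would be expected to raise CalledProcessError in the situation described here, with the tool's message available in the exception's captured output.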
Manual execution of the combine_tessdata command in the training directory also fails with the same error:
C:\Users\jotam\projects\ocr-trans\check_validation\training_model\training>"C:\Program Files\Tesseract-OCR\combine_tessdata.exe" -o micr.traineddata training.unicharset training.inttemp training.pffmtable training.shapetable training.normproto
Failed to create a temporary file micr.traineddata.__tmp__
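A quick diagnostic sketch that may help narrow this down. It reports whether the training directory is writable, whether micr.traineddata or the temporary name from the issue title already exists there, and which component files are missing. The helper name preflight and the report keys are hypothetical, and the sketch does not call combine_tessdata at all; it only inspects the directory.

```python
import tempfile
from pathlib import Path

# Component file names taken from the command line above.
DEFAULT_COMPONENTS = (
    "training.unicharset",
    "training.inttemp",
    "training.pffmtable",
    "training.shapetable",
    "training.normproto",
)


def preflight(training_dir, output="micr.traineddata", components=DEFAULT_COMPONENTS):
    """Collect basic facts about the training directory (hypothetical helper)."""
    d = Path(training_dir)

    # Try to create and remove a scratch file; this is more reliable than
    # permission flags for telling whether the directory is really writable.
    try:
        with tempfile.NamedTemporaryFile(dir=d):
            writable = True
    except OSError:
        writable = False

    return {
        "dir_exists": d.is_dir(),
        "dir_writable": writable,
        "output_exists": (d / output).exists(),
        # Temporary name as given in the issue title.
        "tmp_exists": (d / (output + ".__tmp__")).exists(),
        "missing_components": [c for c in components if not (d / c).is_file()],
    }
```

Running preflight(".") from the training directory and printing the result gives a compact summary to attach to the report.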

The full log from my script is on GitHub:
https://github.com/jo-walker/ocr-reader/blob/main/check_validation/check_processing.log#L1314

Expected Behavior

The training should complete successfully, generating the micr.traineddata file in the specified training directory.

Suggested Fix

No response

tesseract -v

tesseract v5.5.0.20241111
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.44 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.11.0 Schannel zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0

Operating System

Windows 11

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response
