cropping images leads to poorer results #2007

@mislav-zane

Bug description

To save memory, I crop images from shape (1080, 1920, 3) to shape (1080, 1080, 3).
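
For reference, a crop like this amounts to a plain array slice. The sketch below assumes a centered crop on a NumPy image in height x width x channels layout, as returned by `cv2.imread`. The helper name `center_crop_square` is hypothetical, and the crop offset actually used for the attached images may differ.

```python
import numpy as np


def center_crop_square(image: np.ndarray) -> np.ndarray:
    """Crop the longer side so the result is square, keeping the center."""
    height, width = image.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return image[top:top + side, left:left + side]


# Dummy frame standing in for a real 1080p image (assumption: HxWxC layout).
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
cropped = center_crop_square(frame)
print(cropped.shape)  # (1080, 1080, 3)
```

The slice is a view on the original array, so it does not copy pixel data until the image is re-encoded or modified.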

On the original image, docTR finds all the numbers I am looking for.
On the cropped version of the same image, docTR misses some of them.

I would expect the results to be the same.
Is there a reason for the difference?
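
One possible factor, not confirmed here, is the resize step before detection. If the detector input is a fixed square and the aspect ratio is preserved, the two image shapes are scaled by different factors, so the same digits reach the model at different pixel sizes. The sketch below only does that arithmetic. The 1024 target size, the 20-pixel glyph height, and the helper name `fit_scale` are assumptions for illustration, not values read from docTR.

```python
def fit_scale(height: int, width: int, target: int = 1024) -> float:
    """Scale factor that fits an image inside a target x target square
    while preserving its aspect ratio (assumed preprocessing, see above)."""
    return min(target / height, target / width)


# Compare how a hypothetical 20-pixel-tall glyph would be rescaled
# for the original and the cropped image shapes.
for shape in [(1080, 1920), (1080, 1080)]:
    scale = fit_scale(*shape)
    print(shape, f"scale={scale:.3f}", f"20px glyph -> {20 * scale:.1f}px")
```

If the real preprocessing behaves differently, the numbers change, but the same comparison can be repeated with the actual target size.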

Code snippet to reproduce the bug

from pathlib import Path
import cv2
from doctr.io import DocumentFile
from doctr.models import ocr_predictor


# initialize OCR predictor
predictor = ocr_predictor(
    det_arch="db_resnet34",
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    assume_straight_pages=False,
    straighten_pages=True,
)

for i in range(1, 5):
    print(f"\nProcessing image #{i}:")

    for crop in [False, True]:
        if crop:
            preprocessed_image_filename = f"image_{i}__cropped.png"
        else:
            preprocessed_image_filename = f"image_{i}__original.png"
        preprocessed_image_path = Path(__file__).parent / "images" / preprocessed_image_filename

        # Load the preprocessed image
        preprocessed = cv2.imread(str(preprocessed_image_path))

        # Re-encode as PNG bytes so DocumentFile can load the image from memory
        _, buf = cv2.imencode(".png", preprocessed)
        doc = DocumentFile.from_images(buf.tobytes())

        # Run OCR
        result = predictor(doc)

        # Print plain text result
        output_text = ""
        for page in result.pages:
            for block in page.blocks:
                for line in block.lines:
                    output_text += " ".join(w.value + " " for w in line.words)
        print(f"OCR Result for {preprocessed_image_filename} (Crop: {crop}):\n{output_text}\n")

images.zip

Error traceback

Running the script gives the following output:

Processing image #1:
OCR Result for image_1__original.png (Crop: False):
490374 303  2 - 

OCR Result for image_1__cropped.png (Crop: True):
- -  - -  -  - - 49037 -  - -  -  - ( - 0. 2 I à  - - à  - . r . - - - a à -  - - 


Processing image #2:
OCR Result for image_2__original.png (Crop: False):
489556 5022 - - à 1   

OCR Result for image_2__cropped.png (Crop: True):
. 2 I a - - . -  -  - 5022  - - - - - -  - - I 89556 


Processing image #3:
OCR Result for image_3__original.png (Crop: False):
489556 303  S  2 A 

OCR Result for image_3__cropped.png (Crop: True):
. I - ( 2  G - - - - - I -  -  -  -  - 56 1 


Processing image #4:
OCR Result for image_4__original.png (Crop: False):
-  5 489556 3012 à 

OCR Result for image_4__cropped.png (Crop: True):
A 489556 a  à  -  -  -  -  - I - I - - - - à I I 
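
To compare the two runs without reading through the noise characters, the digit runs can be pulled out of each result string. This is a minimal sketch: the helper name `numeric_tokens` and the three-digit minimum are assumptions chosen for these outputs, and the two strings are copied from the image #1 results above.

```python
import re


def numeric_tokens(ocr_text: str, min_digits: int = 3) -> list[str]:
    """Return every run of at least `min_digits` consecutive digits."""
    return re.findall(rf"\d{{{min_digits},}}", ocr_text)


# OCR output for image #1, copied from the results above.
original = "490374 303  2 - "
cropped = (
    "- -  - -  -  - - 49037 -  - -  -  - ( - 0. 2 I à  - - à  - . r . "
    "- - - a à -  - - "
)

print(numeric_tokens(original))  # ['490374', '303']
print(numeric_tokens(cropped))   # ['49037']
```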

Environment

Collecting environment information...

DocTR version: v1.0.0
PyTorch version: 2.7.1+cu126 (torchvision 0.22.1+cu126)
OpenCV version: 4.11.0
OS: Ubuntu 24.04.2 LTS
Python version: 3.12.3
Is CUDA available (PyTorch): No
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect

Labels: type: bug
