Description
Bug description
To save memory, I crop images from shape (1080, 1920, 3) to shape (1080, 1080, 3).
On the original image, docTR finds all the numbers I am looking for.
On the cropped version of the same image, docTR does not find all of them.
I would expect the results to be the same.
Is there a reason for the results I am getting?
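For context, the crop described above can be sketched as follows. This is only an illustration: the report does not say which 1080-pixel-wide region is kept, so the centered crop and the helper name `center_crop_width` are assumptions, and a blank NumPy array stands in for a real frame.

```python
import numpy as np


def center_crop_width(image: np.ndarray, target_width: int) -> np.ndarray:
    """Crop an (H, W, C) image to target_width columns around the horizontal center.

    Hypothetical helper: the centered window is an assumption, not taken from the report.
    """
    width = image.shape[1]
    if target_width > width:
        raise ValueError("target_width must not exceed the image width")
    left = (width - target_width) // 2
    # .copy() detaches the crop from the full-size buffer so the original can be freed
    return image[:, left:left + target_width].copy()


# Blank stand-in for a 1080p frame with 3 channels
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
cropped = center_crop_width(frame, 1080)

print(frame.shape, frame.nbytes)      # (1080, 1920, 3) 6220800
print(cropped.shape, cropped.nbytes)  # (1080, 1080, 3) 3499200
```

For uint8 data this reduces the per-image footprint from 6,220,800 bytes to 3,499,200 bytes.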
Code snippet to reproduce the bug
from pathlib import Path

import cv2
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# initialize OCR predictor
predictor = ocr_predictor(
    det_arch="db_resnet34",
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    assume_straight_pages=False,
    straighten_pages=True,
)

for i in range(1, 5):
    print(f"\nProcessing image #{i}:")
    for crop in [False, True]:
        if crop:
            preprocessed_image_filename = f"image_{i}__cropped.png"
        else:
            preprocessed_image_filename = f"image_{i}__original.png"
        preprocessed_image_path = Path(__file__).parent / "images" / preprocessed_image_filename
        # Load the preprocessed image
        preprocessed = cv2.imread(str(preprocessed_image_path))
        # Encode the image as PNG bytes and wrap it in a docTR document
        _, buf = cv2.imencode(".png", preprocessed)
        doc = DocumentFile.from_images(buf.tobytes())
        # Run OCR
        result = predictor(doc)
        # Print plain text result
        output_text = ""
        for page in result.pages:
            for block in page.blocks:
                for line in block.lines:
                    output_text += " ".join(w.value + " " for w in line.words)
        print(f"OCR Result for {preprocessed_image_filename} (Crop: {crop}):\n{output_text}\n")
Error traceback
Running the script produces the following output:
Processing image #1:
OCR Result for image_1__original.png (Crop: False):
490374 303 2 -
OCR Result for image_1__cropped.png (Crop: True):
- - - - - - - 49037 - - - - - ( - 0. 2 I à - - à - . r . - - - a à - - -
Processing image #2:
OCR Result for image_2__original.png (Crop: False):
489556 5022 - - à 1
OCR Result for image_2__cropped.png (Crop: True):
. 2 I a - - . - - - 5022 - - - - - - - - I 89556
Processing image #3:
OCR Result for image_3__original.png (Crop: False):
489556 303 S 2 A
OCR Result for image_3__cropped.png (Crop: True):
. I - ( 2 G - - - - - I - - - - - 56 1
Processing image #4:
OCR Result for image_4__original.png (Crop: False):
- 5 489556 3012 à
OCR Result for image_4__cropped.png (Crop: True):
A 489556 a à - - - - - I - I - - - - à I I
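To make the difference between the two runs easier to see, the outputs can be compared programmatically. The sketch below is only an illustration: the helper `numeric_tokens` and the three-digit threshold are assumptions, and the two strings are copied from the image #1 results above.

```python
def numeric_tokens(ocr_text: str, min_digits: int = 3) -> set[str]:
    """Return whitespace-separated tokens made up only of ASCII digits.

    Hypothetical helper; min_digits filters out stray single-digit detections.
    """
    return {
        tok
        for tok in ocr_text.split()
        if tok.isascii() and tok.isdigit() and len(tok) >= min_digits
    }


# OCR outputs for image #1, copied from the results above
original = "490374 303 2 -"
cropped = "- - - - - - - 49037 - - - - - ( - 0. 2 I à - - à - . r . - - - a à - - -"

# Numbers found on the original image but not on the cropped one
missing = numeric_tokens(original) - numeric_tokens(cropped)
print(sorted(missing))  # ['303', '490374']
```

For image #1 this shows that both numbers from the original run are lost in the cropped run; `49037` appears instead of `490374`.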
Environment
Collecting environment information...
DocTR version: v1.0.0
PyTorch version: 2.7.1+cu126 (torchvision 0.22.1+cu126)
OpenCV version: 4.11.0
OS: Ubuntu 24.04.2 LTS
Python version: 3.12.3
Is CUDA available (PyTorch): No
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect