Description
Bug
When using the VLM pipeline (ibm-granite/granite-docling-258M via vLLM==0.11.2) with docling==2.64.0, the model occasionally predicts "flat" or marginal bounding boxes. When docling converts these normalized coordinates to integer pixel coordinates, int() truncation can produce a crop box with zero height or zero width.
PIL then crashes with SystemError: tile cannot extend outside image when ImageRef.from_pil tries to save the empty crop as PNG.
Specific Edge Case Observed:
The VLM predicted: {'l': 0.398, 't': 0.998, 'r': 0.606, 'b': 0.998}.
Note that top and bottom are identical (0.998).
Even if a user manually adjusts bottom to 0.999 (a 0.001 difference), docling's internal logic converts these to pixels using int(). For a generic image height (e.g., 500px):
- Top: int(0.998 * 500) = 499
- Bottom: int(0.999 * 500) = 499
Result: top (499) == bottom (499). The crop height is 0 pixels, causing the crash.
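The arithmetic above can be checked with a small sketch. The helper name pixel_span is hypothetical (it is not part of docling); it only mirrors the int() conversion described in this report, and it shows that two normalized values collapse whenever they truncate to the same pixel index.

```python
def pixel_span(lo: float, hi: float, size: int) -> tuple:
    """Convert a normalized [lo, hi] range to pixels with int() truncation.

    Hypothetical helper for illustration; returns (start, end, length).
    """
    start, end = int(lo * size), int(hi * size)
    return start, end, end - start


# Identical floats, as predicted by the VLM: zero height.
print(pixel_span(0.998, 0.998, 500))  # (499, 499, 0)
# 0.001 apart, but both fall into pixel row 499: still zero height.
print(pixel_span(0.998, 0.999, 500))  # (499, 499, 0)
# Only once the bottom reaches the next pixel boundary is the height non-zero.
print(pixel_span(0.998, 1.000, 500))  # (499, 500, 1)
```

For a 500px dimension, any two coordinates that differ by less than 1/500 = 0.002 can land in the same pixel row, so a check on the float difference alone does not rule out a zero-size crop.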
Traceback:
File .../docling_core/types/doc/document.py:5670, in DoclingDocument.load_from_doctags
image=ImageRef.from_pil(image=cropped_image, dpi=72),
File .../docling_core/types/doc/document.py:1127, in ImageRef.from_pil
image.save(buffered, format="PNG")
... (PIL Internal calls) ...
File .../PIL/ImageFile.py:562, in _encode_tile
encoder.setimage(im.im, extents)
SystemError: tile cannot extend outside image
Steps to reproduce
- Environment:
  - docling: 2.64.0
  - vllm: 0.11.2
  - Model: ibm-granite/granite-docling-258M
- Reproduction Script (Simulating the logic):
Since the VLM output is non-deterministic, this script replicates the exact logic inside load_from_doctags that leads to the crash using the observed coordinates.
from PIL import Image
import io
# 1. Create a dummy image
im_width, im_height = 500, 500
image = Image.new("RGB", (im_width, im_height))
# 2. The problematic Bbox from VLM (Normalized)
# Top and Bottom are technically different floats, but very close.
bbox = {'l': 0.398, 't': 0.998, 'r': 0.606, 'b': 0.999}
# 3. Docling's internal logic (converting to pixels)
crop_box = (
    int(bbox['l'] * im_width),
    int(bbox['t'] * im_height),
    int(bbox['r'] * im_width),
    int(bbox['b'] * im_height),
)
print(f"Calculated Crop Box: {crop_box}")
# Output: (199, 499, 303, 499) -> Height is 0!
# 4. Trigger the crash
# The crash happens here, when encoding the 0-height image to PNG
try:
    cropped = image.crop(crop_box)
    # Docling tries to save it to memory to create a Base64 string
    buf = io.BytesIO()
    cropped.save(buf, format="PNG")
    print("Success (This line will not be reached)")
except Exception as e:
    print(f"\nCRASHED AS EXPECTED: {e}")

Suggested Fix:
The BoundingBox processing logic (or docling_core) needs to enforce a minimum pixel thickness after the int() conversion and before image.crop(). Checking that the float coordinates differ by some epsilon is insufficient, because int(x) truncation can still collapse the dimension to zero.
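One possible shape for such a guard is sketched below. The function clamp_crop_box and its min_size parameter are hypothetical names, not existing docling or docling_core APIs, and the sketch assumes the image is at least min_size pixels in each dimension. It takes the pixel box after int() conversion, keeps it inside the image, and widens any collapsed dimension to min_size pixels.

```python
def clamp_crop_box(crop_box, im_width, im_height, min_size=1):
    """Return a (left, top, right, bottom) box at least min_size px thick.

    Hypothetical helper: assumes im_width >= min_size and im_height >= min_size.
    """
    left, top, right, bottom = crop_box
    # Keep the near edges inside the image, leaving room for min_size pixels.
    left = min(max(left, 0), im_width - min_size)
    top = min(max(top, 0), im_height - min_size)
    # Push the far edges out to guarantee the minimum thickness,
    # without exceeding the image bounds.
    right = min(max(right, left + min_size), im_width)
    bottom = min(max(bottom, top + min_size), im_height)
    return left, top, right, bottom


# The crop box from the reproduction script has zero height.
print(clamp_crop_box((199, 499, 303, 499), 500, 500))  # (199, 499, 303, 500)
```

Applied to the reproduction script, passing the result to image.crop() would yield a 104x1 pixel crop instead of an empty one. Whether a degenerate box should be widened like this or skipped entirely is a design decision for the maintainers.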
Docling version
Docling version: 2.64.0
Docling Core version: 2.51.1
Docling IBM Models version: 3.10.0
Docling Parse version: 4.7.1
Python: cpython-310 (3.10.11)
Platform: macOS-26.1-arm64-arm-64bit
Python version
Python 3.10.11