SystemError: tile cannot extend outside image caused by VLM zero-area predictions (Integer Rounding Edge Case) #2763

@martin-liu

Description

Bug

When using the VLM pipeline (ibm-granite/granite-docling-258M via vLLM==0.11.2) with docling==2.64.0, the model occasionally predicts "flat" or near-flat bounding boxes. When docling converts these normalized coordinates to integer pixel coordinates, int() truncation can collapse the box to a zero-height or zero-width crop.

This causes PIL to crash with SystemError: tile cannot extend outside image when ImageRef.from_pil attempts to save/process the crop.

Specific Edge Case Observed:
The VLM predicted: {'l': 0.398, 't': 0.998, 'r': 0.606, 'b': 0.998}.
Note that top and bottom are identical (0.998).

Even if a user manually adjusts bottom to 0.999 (a 0.001 difference), docling's internal logic converts these to pixels using int(), which truncates toward zero rather than rounding. For an image height of, e.g., 500px:

  • Top: int(0.998 * 500) = 499
  • Bottom: int(0.999 * 500) = 499

Result: top (499) == bottom (499). The crop height is 0 pixels, causing the crash.
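Whether a near-flat box collapses depends on the image size as well as on the float gap. The sketch below checks this for the same top/bottom pair at a few image heights. The helper name `truncated_extent` is hypothetical and is not part of docling; it only mirrors the int() conversion described above.

```python
def truncated_extent(lo: float, hi: float, size_px: int) -> int:
    """Pixel extent that remains after int() truncation of two normalized coordinates."""
    return int(hi * size_px) - int(lo * size_px)


# Same normalized top/bottom pair, different image heights.
for height in (500, 1000, 2000):
    extent = truncated_extent(0.998, 0.999, height)
    print(f"height={height}px -> crop height={extent}px")
```

At 500px the pair collapses to zero rows, while taller images keep at least one row, so a check on the normalized coordinates alone cannot tell whether the pixel crop will be empty.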

Traceback:

File .../docling_core/types/doc/document.py:5670, in DoclingDocument.load_from_doctags
    image=ImageRef.from_pil(image=cropped_image, dpi=72),

File .../docling_core/types/doc/document.py:1127, in ImageRef.from_pil
    image.save(buffered, format="PNG")

... (PIL Internal calls) ...

File .../PIL/ImageFile.py:562, in _encode_tile
    encoder.setimage(im.im, extents)
SystemError: tile cannot extend outside image

Steps to reproduce

  1. Environment:

    • docling: 2.64.0
    • vllm: 0.11.2
    • Model: ibm-granite/granite-docling-258M
  2. Reproduction Script (Simulating the logic):
    Since the VLM output is non-deterministic, this script replicates the logic inside load_from_doctags that leads to the crash, using the observed coordinates with bottom nudged from 0.998 to 0.999 to show that a small float gap does not prevent the collapse.

from PIL import Image
import io

# 1. Create a dummy image
im_width, im_height = 500, 500
image = Image.new("RGB", (im_width, im_height))

# 2. The problematic Bbox from VLM (Normalized)
# Top and Bottom are technically different floats, but very close.
bbox = {'l': 0.398, 't': 0.998, 'r': 0.606, 'b': 0.999} 

# 3. Docling's internal logic (converting to pixels)
crop_box = (
    int(bbox['l'] * im_width),
    int(bbox['t'] * im_height),
    int(bbox['r'] * im_width),
    int(bbox['b'] * im_height),
)

print(f"Calculated Crop Box: {crop_box}")
# Output: (199, 499, 303, 499) -> Height is 0!

# 4. Trigger the crash
# The crash happens here, when encoding the 0-height image to PNG
try:
    cropped = image.crop(crop_box)
    
    # Docling tries to save it to memory to create a Base64 string
    buf = io.BytesIO()
    cropped.save(buf, format="PNG") 
    print("Success (This line will not be reached)")
except Exception as e:
    print(f"\nCRASHED AS EXPECTED: {e}")

Suggested Fix:
The BoundingBox processing logic (or docling_core) needs to enforce a minimum pixel thickness after the int() conversion and before image.crop(). Validating a float epsilon on the normalized coordinates is insufficient, because int() truncation can still collapse the dimension to zero.
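One possible shape for such a guard is sketched below. This is not docling's implementation: the function name `safe_crop_box` and the `min_size` parameter are hypothetical, and widening the box (rather than skipping the crop) is an assumption that the maintainers may not share.

```python
from typing import Dict, Tuple


def safe_crop_box(
    bbox: Dict[str, float], im_width: int, im_height: int, min_size: int = 1
) -> Tuple[int, int, int, int]:
    """Convert a normalized bbox to pixels, keeping each side >= min_size pixels.

    Hypothetical helper: the minimum is enforced AFTER int() truncation,
    and the result is kept inside the image bounds.
    """
    if im_width < min_size or im_height < min_size:
        raise ValueError("image is smaller than the requested minimum crop size")

    def span(lo: float, hi: float, limit: int) -> Tuple[int, int]:
        start = int(min(lo, hi) * limit)
        end = int(max(lo, hi) * limit)
        # Keep the start far enough from the edge to leave room for min_size pixels.
        start = max(0, min(start, limit - min_size))
        # Widen the end if truncation collapsed the span, without leaving the image.
        end = max(start + min_size, min(end, limit))
        return start, end

    left, right = span(bbox["l"], bbox["r"], im_width)
    top, bottom = span(bbox["t"], bbox["b"], im_height)
    return left, top, right, bottom


# The box observed from the VLM, on a 500x500 image.
box = safe_crop_box({"l": 0.398, "t": 0.998, "r": 0.606, "b": 0.998}, 500, 500)
print(box)
```

The returned tuple can be passed to image.crop() in place of the raw int() conversion. An alternative design would be to detect the empty box and skip the crop entirely, which avoids inventing pixels the model did not predict.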

Docling version

Docling version: 2.64.0
Docling Core version: 2.51.1
Docling IBM Models version: 3.10.0
Docling Parse version: 4.7.1
Python: cpython-310 (3.10.11)
Platform: macOS-26.1-arm64-arm-64bit

Python version

Python 3.10.11
