Conversation

@danilojsl (Contributor) commented Dec 8, 2025

Description

This PR extends `Reader2Image` to interoperate with `AutoGGUFVisionModel` by adding flexible handling of encoded vs. decoded image bytes and an optional prompt output column.

Key changes

  • Added a new parameter `useEncodedImageBytes` to control what the image result stores:

    • Encoded (compressed) file bytes, for models such as `AutoGGUFVisionModel` that run their own image decoder
    • A decoded pixel matrix, for models such as `Qwen2VLTransformer`
  • Added an `outputPromptColumn` parameter to optionally emit a separate prompt column containing text prompts as Spark NLP Annotations.
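To illustrate the distinction behind `useEncodedImageBytes`, here is a minimal, self-contained sketch (not the actual `Reader2Image` implementation; the function name and the plain-text PPM format are stand-ins so no external image decoder is needed). Encoded mode passes the original file bytes through untouched, while decoded mode parses them into a height × width × 3 pixel matrix:

```python
def load_image(raw: bytes, use_encoded_image_bytes: bool):
    """Hypothetical sketch of the encoded-vs-decoded switch.

    Returns the encoded file bytes unchanged, or a decoded pixel matrix,
    depending on use_encoded_image_bytes.
    """
    if use_encoded_image_bytes:
        # Encoded path: keep the compressed/encoded file bytes as-is,
        # suitable for models that decode images themselves.
        return raw

    # Decoded path: parse the bytes into a height x width x 3 matrix.
    # Plain-text PPM (P3) is used here only because it is trivially
    # parseable with the standard library.
    tokens = raw.decode("ascii").split()
    assert tokens[0] == "P3", "only plain-text PPM is handled in this sketch"
    width, height = int(tokens[1]), int(tokens[2])
    values = [int(t) for t in tokens[4:]]  # tokens[3] is the max color value
    return [
        [values[(y * width + x) * 3:(y * width + x) * 3 + 3]
         for x in range(width)]
        for y in range(height)
    ]

# A 2x1 image: one red pixel, one green pixel.
ppm = b"P3 2 1 255 255 0 0 0 255 0"
print(load_image(ppm, use_encoded_image_bytes=True) == ppm)  # True
print(load_image(ppm, use_encoded_image_bytes=False))  # [[[255, 0, 0], [0, 255, 0]]]
```

In the PR itself the same choice determines what the image result column holds; downstream vision models then either consume the raw file bytes directly or expect the already-decoded pixels.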

Motivation and Context

Completes the integration of `Reader2Image` with vision-language models (VLMs).

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@danilojsl danilojsl marked this pull request as ready for review December 8, 2025 17:18
@danilojsl danilojsl requested a review from DevinTDHa December 8, 2025 17:18
