@rafaeltuelho

  • Add DeepSeekOcrModel with automatic device detection (CUDA/MPS)
  • CUDA uses bfloat16 precision and flash_attention_2 (optimal)
  • MPS uses float16 precision and eager attention (Apple Silicon fallback)
  • Auto-switch to MPS-compatible model (Dogacel/DeepSeek-OCR-Metal-MPS)
  • Add PyTorch 2.7.0+ version validation for MPS support
  • Add clear error messages for device/version incompatibilities
  • Update test_e2e_ocr_conversion.py with CUDA/MPS device support
  • Add manual test script for DeepSeek-OCR validation
  • Update documentation with MPS support information
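
The selection order described in the list above (CUDA first, then MPS with a PyTorch version check, otherwise an error) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `DeviceConfig`, `parse_version`, and `select_device_config` are hypothetical names, and device availability and the PyTorch version are passed in as plain arguments so the sketch runs without PyTorch installed.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Model ID and minimum version are the ones named in the PR description.
MPS_MODEL = "Dogacel/DeepSeek-OCR-Metal-MPS"
MIN_TORCH_FOR_MPS = (2, 7, 0)


@dataclass(frozen=True)
class DeviceConfig:
    """Hypothetical container for the settings chosen per device."""
    device: str
    dtype: str
    attn_implementation: str
    model_id: str


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '2.7.0+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def select_device_config(
    cuda_available: bool,
    mps_available: bool,
    torch_version: str,
    requested_model: str,
) -> DeviceConfig:
    """Pick CUDA if present, then MPS, otherwise raise a clear error."""
    if cuda_available:
        return DeviceConfig("cuda", "bfloat16", "flash_attention_2", requested_model)
    if mps_available:
        if parse_version(torch_version) < MIN_TORCH_FOR_MPS:
            raise RuntimeError(
                f"MPS support requires PyTorch 2.7.0 or newer, found {torch_version}."
            )
        # On Apple Silicon the requested model is swapped for the MPS fork.
        return DeviceConfig("mps", "float16", "eager", MPS_MODEL)
    raise RuntimeError("DeepSeek-OCR requires a CUDA or MPS device; none was detected.")
```

In real code the two availability flags would presumably come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the version string from `torch.__version__`.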

Note:

Resolves #2497

Checklist:

  • [y] Documentation has been updated, if necessary.
  • [y] Examples have been added, if necessary.
  • [y] Tests have been added, if necessary.

@github-actions
Contributor

github-actions bot commented Dec 4, 2025

DCO Check Passed

Thanks @rafaeltuelho, all your commits are properly signed off. 🎉

@dosubot

dosubot bot commented Dec 4, 2025

Related Documentation

Checked 4 published document(s) in 1 knowledge base(s). No updates required.

@mergify

mergify bot commented Dec 4, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
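
To see what the title rule above accepts, the same pattern can be tried locally with Python's `re` module. This is only a sketch: the helper name `is_conventional` and the sample titles are made up for illustration, and Mergify's own matching may differ in details.

```python
import re

# Pattern copied from the merge protection rule.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional("feat(ocr): add DeepSeek-OCR engine"))  # True
print(is_conventional("Add DeepSeek-OCR engine"))             # False
```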

@rafaeltuelho rafaeltuelho force-pushed the feature/deepseek-ocr-integration branch from 74b3bcd to 076c3ad Compare December 4, 2025 02:02
@codecov

codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 59.56284% with 74 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
docling/models/deepseek_ocr_model.py 55.68% 74 Missing ⚠️

@rafaeltuelho
Author

The low coverage in docling/models/deepseek_ocr_model.py seems to come from the CI environment: it has no GPU (CUDA/MPS), so the DeepSeek-OCR tests cannot run there. In addition, the DOCLING_TEST_DEEPSEECOCR environment variable is not set.

I had to use Google Colab (T4) to test it manually.

What is the recommended approach here?

- Add DeepSeekOcrModel with automatic device detection (CUDA → MPS → Error)
- Add DeepSeekOcrOptions for configuring the OCR engine
- Support CUDA with bfloat16 and flash_attention_2 (optimal performance)
- Support MPS (Apple Silicon) with float16 and eager attention (requires PyTorch 2.7.0+)
- Auto-switch to MPS-compatible model (Dogacel/DeepSeek-OCR-Metal-MPS) on Apple Silicon
- Add clear error messages for unsupported configurations
- Add mock-based unit tests for CI coverage without GPU hardware
- Update E2E tests with DOCLING_TEST_DEEPSEECOCR environment variable guard

Note: MPS support requires PyTorch 2.7.0+ for the aten::_upsample_bicubic2d_aa operator.
See: https://github.com/Dogacel/DeepSeek-OCR-Metal-MPS/discussions

Signed-off-by: Rafael T. C. Soares <[email protected]>
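
The mock-based tests mentioned in the commit message could look roughly like the sketch below. It is an illustration under assumptions, not the tests from the PR: `detect_device` and `make_fake_torch` are hypothetical helpers, and a `MagicMock` stands in for the `torch` module so the device logic can be exercised on a CI runner without a GPU.

```python
from unittest.mock import MagicMock


def detect_device(torch_module) -> str:
    """Hypothetical helper: prefer CUDA, then MPS, otherwise fail."""
    if torch_module.cuda.is_available():
        return "cuda"
    if torch_module.backends.mps.is_available():
        return "mps"
    raise RuntimeError("No CUDA or MPS device available")


def make_fake_torch(cuda: bool, mps: bool) -> MagicMock:
    """Build a stand-in for the torch module with fixed availability answers."""
    fake = MagicMock()
    fake.cuda.is_available.return_value = cuda
    fake.backends.mps.is_available.return_value = mps
    return fake


def test_prefers_cuda():
    assert detect_device(make_fake_torch(cuda=True, mps=True)) == "cuda"


def test_falls_back_to_mps():
    assert detect_device(make_fake_torch(cuda=False, mps=True)) == "mps"


def test_raises_without_gpu():
    try:
        detect_device(make_fake_torch(cuda=False, mps=False))
    except RuntimeError:
        return
    raise AssertionError("expected RuntimeError")
```

Tests of this kind cover the branching logic only; they say nothing about whether the real model loads or produces correct output on actual hardware.
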
@rafaeltuelho rafaeltuelho force-pushed the feature/deepseek-ocr-integration branch from 076c3ad to 4e93020 Compare December 4, 2025 21:16
@dolfim-ibm
Member

@rafaeltuelho Thanks for starting the contribution. It is definitely something that was on our radar as well.

The key question we would like to assess is whether this model should be exposed as an OCR engine or as a model in the VLM pipeline.

@simonschoe

@rafaeltuelho not sure if that aligns with what @dolfim-ibm refers to: as a user it would be amazing to be able to integrate deepseek ocr as an external service, i.e., via api calls, instead of as a local model as part of the regular pipeline

@dolfim-ibm
Member

> @rafaeltuelho not sure if that aligns with what @dolfim-ibm refers to: as a user it would be amazing to be able to integrate deepseek ocr as an external service, i.e., via api calls, instead of as a local model as part of the regular pipeline

@simonschoe Untested, but I think you could already use DeepSeek-OCR with the markdown prompt through Docling's VLM API settings: https://docling-project.github.io/docling/examples/vlm_pipeline_api_model/

@rafaeltuelho
Author

rafaeltuelho commented Dec 5, 2025

> The key question we would like to assess is whether this model should be exposed as an OCR engine or as a model in the VLM pipeline.

@dolfim-ibm That's a good question. As far as I remember, I have only used and tested the VLM for picture description. Is it possible to run the VLM pipeline for OCR-based conversion?

I tried to follow the same approach used by the other OCR engines (e.g., EasyOCR) already supported in Docling.

>>> options = DeepSeekOcrOptions(prompt="<image>\\nConvert to markdown.")
"""

kind: ClassVar[Literal["deepseecocr"]] = "deepseecocr"
Member

Suggested change
- kind: ClassVar[Literal["deepseecocr"]] = "deepseecocr"
+ kind: ClassVar[Literal["deepseekocr"]] = "deepseekocr"

Member

Please also adapt all the other occurrences of deepseecocr
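
One reason the spelling has to be changed everywhere at once: if options classes are looked up by their `kind` string, a leftover `deepseecocr` in a config, test, or CLI choice no longer resolves. The sketch below illustrates this with a plain dictionary registry. It is a simplified stand-in, not Docling's actual factory; `OcrOptionsBase`, `DeepSeekOcrOptionsSketch`, `REGISTRY`, and `options_for` are hypothetical names.

```python
from typing import ClassVar, Dict, Literal, Type


class OcrOptionsBase:
    """Hypothetical base class; each engine declares a unique kind string."""
    kind: ClassVar[str]


class DeepSeekOcrOptionsSketch(OcrOptionsBase):
    # Corrected spelling, as in the suggested change.
    kind: ClassVar[Literal["deepseekocr"]] = "deepseekocr"


# Assumed lookup table keyed by kind; the real registration mechanism may differ.
REGISTRY: Dict[str, Type[OcrOptionsBase]] = {
    DeepSeekOcrOptionsSketch.kind: DeepSeekOcrOptionsSketch,
}


def options_for(kind: str) -> Type[OcrOptionsBase]:
    """Resolve an options class by kind, failing loudly on unknown names."""
    try:
        return REGISTRY[kind]
    except KeyError:
        raise ValueError(
            f"Unknown OCR engine {kind!r}; known engines: {sorted(REGISTRY)}"
        ) from None
```

With this setup, `options_for("deepseekocr")` returns the class, while the old spelling `options_for("deepseecocr")` raises `ValueError`.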

)
self.options: DeepSeekOcrOptions

self.scale = 3 # multiplier for 72 dpi == 216 dpi
Member

Is this "simply" copied from the other OCR models or is it the preferred value for DeepSeek-OCR?

Author

The same value is used by the other OCR engines as well (EasyOCR and Tesseract).
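
For reference, the arithmetic behind the code comment on `self.scale` (a multiplier on 72 dpi giving 216 dpi) can be worked through as below. The helper `render_size` is a hypothetical name, and the US Letter page size of 612 x 792 points is used only as an example input.

```python
POINTS_PER_INCH = 72  # PDF page coordinates use 72 units per inch


def render_size(width_pt: float, height_pt: float, scale: int):
    """Return (pixel width, pixel height, effective dpi) for a given scale."""
    dpi = POINTS_PER_INCH * scale
    return round(width_pt * scale), round(height_pt * scale), dpi


# US Letter page (612 x 792 points) rendered with scale = 3.
print(render_size(612, 792, 3))  # (1836, 2376, 216)
```

Whether 216 dpi is the best input resolution for DeepSeek-OCR specifically is the open question raised in the review comment; the sketch only shows what the value means.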

Comment on lines +97 to +104
'transformers (>=4.46.0,<5.0.0)',
'torch (>=2.0.0)',
'einops',
'Pillow (>=10.0.0)',
'addict',
'easydict',
'matplotlib',
]
Member

Removing what does not seem to be used or needed:

Suggested change
- 'transformers (>=4.46.0,<5.0.0)',
- 'torch (>=2.0.0)',
- 'einops',
- 'Pillow (>=10.0.0)',
- 'addict',
- 'easydict',
- 'matplotlib',
- ]
+ 'transformers (>=4.46.0,<5.0.0)',
+ 'torch (>=2.0.0)',
+ 'Pillow (>=10.0.0)',
+ ]

Author

In my tests, DeepSeek-OCR required addict, matplotlib, and easydict to be present in order to process the parser...


# DeepSeek OCR - requires GPU (CUDA or MPS) and transformers
# Only run if explicitly enabled via environment variable
# Set DOCLING_TEST_DEEPSEECOCR=true to include DeepSeek-OCR tests
Member

Instead of the environment variable, could we detect whether DeepSeek-OCR is runnable?

Author

What do you mean by runnable?
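
One possible reading of "runnable" is: the required libraries can be imported and a CUDA or MPS device is available. A minimal sketch of such a check follows, assuming that interpretation; `deepseek_ocr_runnable` is a hypothetical helper and not part of the PR.

```python
import importlib.util


def deepseek_ocr_runnable() -> bool:
    """Return True only if the libraries and a CUDA or MPS device are present."""
    for module_name in ("torch", "transformers"):
        if importlib.util.find_spec(module_name) is None:
            return False
    try:
        import torch
    except (ImportError, OSError):
        # Installed but not loadable (for example, broken native libraries).
        return False
    if torch.cuda.is_available():
        return True
    # Older PyTorch builds may not expose the mps backend at all.
    mps_backend = getattr(torch.backends, "mps", None)
    return bool(mps_backend is not None and mps_backend.is_available())
```

A test module could then guard the DeepSeek-OCR cases with something like `pytest.mark.skipif(not deepseek_ocr_runnable(), reason="needs CUDA or MPS")` instead of reading an environment variable. The check does not verify that the model weights can be downloaded or loaded.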

Comment on lines +368 to +381
_log.error(
"DeepSeek-OCR MPS model incompatibility detected!\n\n"
"The MPS-compatible model 'Dogacel/DeepSeek-OCR-Metal-MPS' uses deprecated "
"transformers APIs (DynamicCache.seen_tokens) that are not compatible with "
"your current transformers version.\n\n"
"This is a known issue with the community-maintained MPS fork.\n"
"See: https://github.com/Dogacel/DeepSeek-OCR-Metal-MPS/issues\n\n"
"Workarounds:\n"
" 1. Use a different OCR engine that supports MPS:\n"
" - EasyOcrOptions(lang=['en'])\n"
" - RapidOcrOptions()\n"
" 2. Wait for the MPS model to be updated for newer transformers versions\n"
" 3. Test in an isolated environment with transformers==4.43.4 (not recommended)\n"
)

I think this issue is solved. I updated the upstream to support the latest transformers.


Development

Successfully merging this pull request may close these issues.

Support deepseek ocr

4 participants