
@b-g-d

@b-g-d b-g-d commented Nov 25, 2025

Fixes #2681

Problem

When do_formula_enrichment=True is set, Docling fails with:

AttributeError: 'dict' object has no attribute 'model_type'

Root Cause

When a local Path object is passed to AutoProcessor.from_pretrained(),
transformers loads the tokenizer config as a dict but then accesses
_config.model_type as if the config were an object with attributes.
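The failure mode described in the root cause can be reproduced without transformers at all. This is a minimal sketch, not transformers code: `ConfigObject` and `read_model_type` are hypothetical stand-ins for a config class and for the code path that reads `model_type` from it.

```python
class ConfigObject:
    """Hypothetical stand-in for a config class that exposes fields as attributes."""

    def __init__(self, model_type: str) -> None:
        self.model_type = model_type


def read_model_type(_config):
    # Mirrors the failing pattern: attribute access assumes a config object.
    return _config.model_type


as_object = ConfigObject(model_type="example-model")
as_dict = {"model_type": "example-model"}  # same data, but loaded as a plain dict

print(read_model_type(as_object))  # example-model

try:
    read_model_type(as_dict)
except AttributeError as err:
    print(err)  # 'dict' object has no attribute 'model_type'
```

The data is identical in both cases; only the container type differs, which is why the error surfaces as an AttributeError rather than a missing-key error.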

Solution

Pass the model name ('docling-project/CodeFormulaV2') instead of the local
path. Transformers resolves the name against the Hugging Face cache, so an
already-downloaded model is reused, and the config is loaded as an object.

Testing

  • Tested with do_formula_enrichment=True - no longer throws AttributeError
  • Verified formulas are still extracted correctly
  • Confirmed cached model is still used (no re-download)

@github-actions
Contributor

github-actions bot commented Nov 25, 2025

DCO Check Passed

Thanks @b-g-d, all your commits are properly signed off. 🎉

@dosubot

dosubot bot commented Nov 25, 2025

Related Documentation

Checked 4 published document(s) in 1 knowledge base(s). No updates required.


@mergify

mergify bot commented Nov 25, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
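The title rule above can be checked locally before opening a PR. This is a sketch under the assumption that mergify's `~=` operator behaves like a regex search, approximated here with Python's `re.search`; `is_conventional` is a hypothetical helper, and the example titles are made up because the actual PR title is truncated on this page.

```python
import re

# Pattern copied from the merge-protection rule; `~=` approximated as re.search.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if a PR title starts with a conventional-commit type prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


# Hypothetical titles for illustration only.
print(is_conventional("fix: use model name in from_pretrained"))  # True
print(is_conventional("fix(models)!: drop local path loading"))   # True
print(is_conventional("Fix formula enrichment"))                  # False
```

Note that the type list is case-sensitive, so a capitalized `Fix` fails the rule.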

…rained

Fixes AttributeError when do_formula_enrichment=True. When a local Path
is passed to AutoProcessor.from_pretrained(), transformers loads the
config as a dict but then tries to access .model_type as an attribute.

Using the model name ('docling-project/CodeFormulaV2') allows
transformers to properly load the config as an object, while still
using the cached model automatically.

Fixes docling-project#2681

Signed-off-by: bryan <[email protected]>
@b-g-d b-g-d force-pushed the fix/formula-enrichment-attribute-error branch from b3bbd02 to a89cec1 on November 25, 2025 22:42
@b-g-d
Author

b-g-d commented Nov 25, 2025

Update: The underlying transformers bug has been fixed in transformers 4.57.3 (released 2025-11-25). However, this PR still provides value by:

  1. Using explicit model names: more maintainable and clearer than local paths
  2. Better code clarity: makes it obvious which model is being loaded
  3. Works with all transformers versions: not dependent on the transformers fix

The fix in transformers changed _config.model_type to _config.get("model_type") to properly handle dict configs when loading from local paths.
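The difference between the two access styles can be illustrated with a small helper that tolerates both config shapes. This is a hypothetical sketch, not the code that landed in transformers (which, as described above, switched to `.get()`); `get_model_type` is a name introduced here for illustration.

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any, Optional


def get_model_type(config: Any) -> Optional[str]:
    """Read `model_type` from either a dict-like config or a config object."""
    if isinstance(config, Mapping):
        # Dict configs (e.g. parsed straight from JSON): use the .get() style.
        return config.get("model_type")
    # Object configs: attribute access, returning None if the field is absent.
    return getattr(config, "model_type", None)


print(get_model_type({"model_type": "example-model"}))              # example-model
print(get_model_type(SimpleNamespace(model_type="example-model")))  # example-model
print(get_model_type({}))                                           # None
```

Returning `None` for a missing field, instead of raising, matches the behavior of `dict.get()` without a default.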

This PR can still be merged as it represents a best practice, even though the immediate bug is resolved in transformers 4.57.3+.

@dolfim-ibm
Member

@b-g-d The problem with your proposed solution is that Docling should be able to load the models from an artifacts directory which is not the HF cache dir.
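The constraint raised here means a loader has to support two sources: an explicit artifacts directory on disk and a Hub repo id. The sketch below is hypothetical and is not Docling's actual loading code: `resolve_model_source`, `LOCAL_SUBDIR`, and the folder layout under the artifacts directory are assumptions; only the repo id comes from this PR. On transformers versions affected by the bug, the local-directory branch would still hit the AttributeError, so the sketch only shows why the repo id alone cannot replace the path.

```python
from pathlib import Path
from typing import Optional, Union

REPO_ID = "docling-project/CodeFormulaV2"  # from the PR
LOCAL_SUBDIR = "CodeFormulaV2"  # assumed layout under the artifacts directory


def resolve_model_source(artifacts_path: Optional[Path]) -> Union[str, Path]:
    """Pick what to pass to from_pretrained(): a local directory or a Hub repo id."""
    if artifacts_path is not None:
        local_dir = Path(artifacts_path) / LOCAL_SUBDIR
        if local_dir.is_dir():
            # Explicit artifacts directory: load from disk, independent of the HF cache.
            return local_dir
        raise FileNotFoundError(f"model not found under {local_dir}")
    # No artifacts directory configured: let transformers resolve the repo id,
    # downloading it or reusing the Hugging Face cache.
    return REPO_ID
```

Raising when an explicit artifacts directory lacks the model, rather than silently falling back to the Hub, keeps offline deployments from downloading unexpectedly.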


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AttributeError when enabling formula enrichment (do_formula_enrichment=True)

2 participants