fix: Use model name for AutoProcessor to fix formula enrichment AttributeError #2682

b-g-d · 2025-11-25T22:40:00Z

Problem

When do_formula_enrichment=True is set, Docling fails with:

AttributeError: 'dict' object has no attribute 'model_type'

Root Cause

When a local Path object is passed to AutoProcessor.from_pretrained(),
transformers loads the tokenizer config as a dict but then tries to access
_config.model_type as an object attribute, which a dict does not support.
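
The error message can be reproduced without transformers at all. The following is a minimal sketch, assuming only that the config ends up as a plain dict; the value "example" is a placeholder and not the real model type.

```python
# A config that was loaded as a plain dict, as described above.
# The value "example" is a placeholder for illustration only.
_config = {"model_type": "example"}

try:
    _config.model_type  # attribute access on a dict is not supported
except AttributeError as exc:
    message = str(exc)

print(message)
```

The printed message should match the one reported in the Problem section.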

Solution

Use the model name ('docling-project/CodeFormulaV2') instead of the local
path. Transformers will automatically use the cached model, and the config
will be properly loaded as an object.
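
As a rough sketch of the change described here (not the actual Docling code), the processor could be loaded by model name like this. The helper name `load_processor` and the constant `MODEL_REPO_ID` are hypothetical and exist only for illustration.

```python
# Model name taken from this PR description.
MODEL_REPO_ID = "docling-project/CodeFormulaV2"


def load_processor(source: str = MODEL_REPO_ID):
    """Load the processor by model name instead of a local Path (hypothetical helper)."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoProcessor

    # Given a model name, transformers resolves the files through its own cache.
    return AutoProcessor.from_pretrained(source)
```

Calling `load_processor()` needs transformers installed and either a populated cache or network access.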

Testing

Tested with do_formula_enrichment=True - no longer throws AttributeError
Verified formulas are still extracted correctly
Confirmed cached model is still used (no re-download)

github-actions · 2025-11-25T22:40:09Z

✅ DCO Check Passed

Thanks @b-g-d, all your commits are properly signed off. 🎉

dosubot · 2025-11-25T22:40:21Z

Related Documentation

Checked 4 published document(s) in 1 knowledge base(s). No updates required.

mergify · 2025-11-25T22:40:35Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

…rained

Fixes AttributeError when do_formula_enrichment=True.

When a local Path is passed to AutoProcessor.from_pretrained(), transformers loads the config as a dict but then tries to access .model_type as an attribute.

Using the model name ('docling-project/CodeFormulaV2') allows transformers to properly load the config as an object, while still using the cached model automatically.

Fixes docling-project#2681

Signed-off-by: bryan <[email protected]>

b-g-d · 2025-11-25T22:45:33Z

Update: The underlying transformers bug has been fixed in transformers 4.57.3 (released 2025-11-25). However, this PR still provides value by:

Using explicit model names - More maintainable and clear than local paths
Better code clarity - Makes it obvious which model is being loaded
Works with all transformers versions - Not dependent on the transformers fix

The fix in transformers changed _config.model_type to _config.get("model_type") to properly handle dict configs when loading from local paths.
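
The difference between the two access styles can be illustrated with a small helper. This is a sketch only, not the transformers source; `read_model_type` and `ObjectConfig` are hypothetical names, and "example" is a placeholder value.

```python
def read_model_type(config):
    """Return model_type from a dict-style or object-style config (hypothetical helper)."""
    if isinstance(config, dict):
        # Dict configs need key lookup; .get() returns None when the key is missing.
        return config.get("model_type")
    # Object configs expose the value as an attribute.
    return getattr(config, "model_type", None)


class ObjectConfig:
    """Stand-in for a config object; not a transformers class."""

    model_type = "example"


dict_result = read_model_type({"model_type": "example"})
object_result = read_model_type(ObjectConfig())
missing_result = read_model_type({})
```

All three calls return without raising, whereas plain attribute access would fail on the dict inputs.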

This PR can still be merged as it represents a best practice, even though the immediate bug is resolved in transformers 4.57.3+.

dolfim-ibm · 2025-12-01T07:17:27Z

@b-g-d The problem with your proposed solution is that Docling should be able to load the models from an artifacts directory which is not the HF cache dir.
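
To illustrate the constraint, one possible shape for choosing the source is sketched below. This is an assumption, not Docling's implementation: `resolve_model_source` and `artifacts_path` are hypothetical names, and whether passing a string path avoids the AttributeError on affected transformers versions is not established in this thread.

```python
from pathlib import Path
from typing import Optional

# Model name taken from this PR description.
MODEL_REPO_ID = "docling-project/CodeFormulaV2"


def resolve_model_source(artifacts_path: Optional[Path]) -> str:
    """Pick what to hand to from_pretrained() (hypothetical helper)."""
    if artifacts_path is not None:
        # An explicit artifacts directory takes precedence over the HF cache.
        return str(artifacts_path)
    # No artifacts directory configured: fall back to the model name.
    return MODEL_REPO_ID
```

With this shape, a configured artifacts directory is still honored, and the model name is used only as a fallback.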

b-g-d force-pushed the fix/formula-enrichment-attribute-error branch from b3bbd02 to a89cec1 Compare November 25, 2025 22:42
