support language_model_only #112

Merged
Jintao-Huang merged 9 commits into modelscope:main from Jintao-Huang:support_language_model_only on Jun 4, 2026

Conversation

@Jintao-Huang

Collaborator

No description provided.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a language_model_only configuration option to allow running multimodal models in a language-model-only mode, updating the model initialization, forward pass, and bridge conversion logic to bypass visual components when enabled. It also bumps the transformers dependency version range. The review feedback points out a critical issue in mm_gpt_model.py where bypassing _patch_word_embeddings when language_model_only is active will break essential embedding-level operations like context parallel splitting and sequence parallel reduce-scatter. Since _patch_word_embeddings already handles a missing visual component gracefully, it should be retained.
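The first review's point can be illustrated with a minimal sketch. Everything here is hypothetical: `SketchConfig`, `MMGPTModelSketch`, and the step labels are stand-ins, not mcore_bridge's actual classes. Only the names `language_model_only` and `_patch_word_embeddings` come from the review. The idea is that the flag drops the visual tower, while the embedding patch still runs unconditionally, because (per the review) it also carries the context-parallel split and the sequence-parallel reduce-scatter and already tolerates a missing visual component.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchConfig:
    # Hypothetical config stand-in; only the flag name comes from the PR.
    language_model_only: bool = False


@dataclass
class MMGPTModelSketch:
    """Hypothetical stand-in for a multimodal GPT model (not mcore_bridge code)."""

    config: SketchConfig
    visual: Optional[object] = None
    embedding_steps: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # language_model_only: never keep the visual tower.
        if self.config.language_model_only:
            self.visual = None
        # Keep this call unconditional, as the review recommends.
        self._patch_word_embeddings()

    def _patch_word_embeddings(self) -> None:
        # A missing visual tower is tolerated: only the merge step is skipped.
        if self.visual is not None:
            self.embedding_steps.append("merge_multimodal_embeddings")
        # Embedding-level parallelism must be wired up either way.
        self.embedding_steps.append("context_parallel_split")
        self.embedding_steps.append("sequence_parallel_reduce_scatter")


lm_only = MMGPTModelSketch(SketchConfig(language_model_only=True), visual=object())
print(lm_only.visual, lm_only.embedding_steps)
```

Skipping the patch entirely in language-model-only mode would also skip the two parallelism steps, which is the breakage the review flags.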

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread on src/mcore_bridge/model/mm_gpt_model.py (outdated)
@Jintao-Huang

Collaborator Author

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a language_model_only configuration option to allow running multimodal models in a language-model-only mode, bypassing visual and audio processing. The changes span the configuration, bridge, and model files, conditionally skipping multimodal initialization and embedding logic. However, a critical logic bug was identified in mm_gpt_model.py: checking only self.visual is not None leads either to the wrong method being called or to an AttributeError, depending on whether self.visual is initialized. A code suggestion is provided to branch on self.config.language_model_only when self.visual is present.
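The branching the review asks for can be sketched as follows. The classes are hypothetical; `get_inputs_embeds_language_model` is the method name mentioned in the final review, but its body here (a plain table lookup) is an assumption. What matters is the order of the checks: take the text-only path when `visual` is absent or `language_model_only` is set, and call into the visual module only otherwise. Checking `self.visual is not None` alone picks the wrong path when the flag is set but the module exists, and calling a method on `self.visual` when it is `None` raises AttributeError.

```python
from typing import Dict, List, Optional, Tuple

Embeds = Tuple[str, List[float]]  # (path taken, toy embeddings)


class VisualSketch:
    """Hypothetical visual module; tags its output with the path taken."""

    def get_inputs_embeds(self, input_ids: List[int], table: Dict[int, float]) -> Embeds:
        return "visual", [table[i] for i in input_ids]


class ModelSketch:
    def __init__(self, language_model_only: bool, visual: Optional[VisualSketch]):
        self.language_model_only = language_model_only
        self.visual = visual
        # Toy embedding table: token id -> scalar "embedding".
        self.table = {0: 0.0, 1: 0.5, 2: 1.5}

    def get_inputs_embeds_language_model(self, input_ids: List[int]) -> Embeds:
        # Text-only path: plain lookup that never touches self.visual.
        return "language_model", [self.table[i] for i in input_ids]

    def get_inputs_embeds(self, input_ids: List[int]) -> Embeds:
        # Defensive None check first, then honor the flag even if visual exists.
        if self.visual is None or self.language_model_only:
            return self.get_inputs_embeds_language_model(input_ids)
        return self.visual.get_inputs_embeds(input_ids, self.table)


print(ModelSketch(True, VisualSketch()).get_inputs_embeds([1, 2]))
```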


Comment thread on src/mcore_bridge/model/mm_gpt_model.py (outdated)
@Jintao-Huang

Collaborator Author

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a language_model_only configuration option to allow running multimodal GPT models in a language-only mode, bypassing visual module processing and mapping. The changes span configuration setup, bridge conversion logic, and model forward passes, specifically adding support for Gemma4. Feedback on the changes highlights two critical issues: first, a bug in gemma4.py where unmasked input_ids are returned instead of the masked llm_input_ids, which could cause incorrect behavior in the language model; second, a missing defensive None check on self.visual in mm_gpt_model.py that could lead to an AttributeError if the visual component is not initialized.
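The gemma4.py issue is easiest to see in a small sketch. It assumes (the review does not spell this out) that "masking" means replacing multimodal placeholder token ids with a pad id before the text-embedding lookup. The token ids, `PAD_TOKEN_ID`, and `prepare_llm_input_ids` are hypothetical. The reported bug corresponds to computing `llm_input_ids` and then returning the original `input_ids`, so the language model still sees the unmasked ids.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical ids for illustration only; not Gemma 4's real vocabulary.
PAD_TOKEN_ID = 0
MULTIMODAL_TOKEN_IDS = frozenset({900, 901})


def prepare_llm_input_ids(
    input_ids: List[int],
    multimodal_ids: FrozenSet[int] = MULTIMODAL_TOKEN_IDS,
    pad_id: int = PAD_TOKEN_ID,
) -> Tuple[List[int], List[bool]]:
    """Mask multimodal placeholders so only text ids reach the LM embedding."""
    is_multimodal = [tok in multimodal_ids for tok in input_ids]
    llm_input_ids = [pad_id if mm else tok for tok, mm in zip(input_ids, is_multimodal)]
    # Return the masked ids; returning input_ids here would be the reported bug.
    return llm_input_ids, is_multimodal


print(prepare_llm_input_ids([5, 900, 7, 901]))
```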


Comment thread on src/mcore_bridge/model/mm_gpts/gemma4.py (outdated)
Comment thread on src/mcore_bridge/model/mm_gpt_model.py (outdated)
@Jintao-Huang

Collaborator Author

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a language_model_only configuration option to the multimodal GPT model bridge, allowing multimodal models (such as Gemma 4) to run in a language-model-only mode. This is achieved by bypassing visual/multimodal processing steps during initialization and forward passes, and by introducing dedicated language-model-only embedding retrieval methods. Feedback on the changes suggests explicitly moving self.embed_scale to the device of inputs_embeds in get_inputs_embeds_language_model to prevent potential device-mismatch errors in multi-GPU or parallel environments.


Comment thread on src/mcore_bridge/model/mm_gpts/gemma4.py
Jintao-Huang merged commit 47e1630 into modelscope:main on Jun 4, 2026
1 check passed

2 participants