Skip to content

sd: simplify koboldcpp model replacement#2038

Draft
wbruna wants to merge 1 commit intoLostRuins:concedo_experimentalfrom
wbruna:kcpp_sd_model_replacement
Draft

sd: simplify koboldcpp model replacement#2038
wbruna wants to merge 1 commit intoLostRuins:concedo_experimentalfrom
wbruna:kcpp_sd_model_replacement

Conversation

@wbruna
Copy link

@wbruna wbruna commented Mar 16, 2026

Keeps the switching logic as a single block to make future merges easier.

Recent upstream changes conflict with the current TAE auto-selection code, so I tried to change it into a simpler and more self-contained approach.

Models which use LLMs seem to work OK, but I couldn't test Wan and Qwen. Also, I've noticed this chunk has identical code for any value of clipg_path_fixed, but I'm not completely sure about what it should do instead:

            if(clipl_path_fixed=="" && clipg_path_fixed=="")
            {
                clipl_path_fixed = t5_path_fixed;
                t5_path_fixed = "";
            }
            else if(clipl_path_fixed=="" && clipg_path_fixed!="")
            {
                clipl_path_fixed = t5_path_fixed;
                t5_path_fixed = "";
            }

Anyway, if you could describe what each case should do, I'd replace it directly with the final fields (llm_path_fixed and clip_vision_fixed) to make it clearer.

Keeps the switching logic as a single block to make future
merges easier.
@wbruna
Copy link
Author

wbruna commented Mar 16, 2026

Anyway, those VAE changes caused an issue with Flux, so I won't be merging them too soon.

@LostRuins
Copy link
Owner

Anyway, if you could describe what each case should do, I'd replace it directly with the final fields (llm_path_fixed and clip_vision_fixed) to make it clearer.

Background: Originally it started with StableDiffusion3 which requires a diffusion model, a VAE, a T5xxl model, a clip_l and clip_g model. But then as more architectures got added, each model type required some different combination or augmentation from the above.

The main idea is to allow the user to load each type of image model components in an intuitive manner, by intelligently guessing what auxiliary files they have brought along and loading them correctly.

So for example if they loaded Z-image turbo, well technically it only needs the Diffuser and the Qwen3-4B as a text encoder. But they might place the text encoder into --t5xxl instead, which would be re-routed correctly to --clip1 (aka --llm upstream) if the arch is z-image.

I think the code you see is now quite janky because its been iteratively modified over time while trying not to break upstream syncs. But it basically spells out what to do if a t5 field was filled and neither of, each of, or both of the fields were populated.

Looking back now I think the two cases can indeed be simplified into a single case:
i.e. if the user has picked a qwen based diffuser, provided a t5, and didn't provide a clip1 field, swap the t5 into the clip1 field which gets used as the llm.

Agree its rather chaotic and messy now. Feel free to refactor to a more intuitive style that works with upstream code, i'll see what you might have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants