sd: simplify koboldcpp model replacement by wbruna · Pull Request #2038 · LostRuins/koboldcpp

wbruna · 2026-03-16T02:04:45Z

Keeps the switching logic as a single block to make future merges easier.

Recent upstream changes conflict with the current TAE auto-selection code, so I tried to change it into a simpler and more self-contained approach.

Models which use LLMs seem to work OK, but I couldn't test Wan and Qwen. Also, I've noticed this chunk has identical code for any value of clipg_path_fixed, but I'm not completely sure about what it should do instead:

            if(clipl_path_fixed=="" && clipg_path_fixed=="")
            {
                clipl_path_fixed = t5_path_fixed;
                t5_path_fixed = "";
            }
            else if(clipl_path_fixed=="" && clipg_path_fixed!="")
            {
                clipl_path_fixed = t5_path_fixed;
                t5_path_fixed = "";
            }

Anyway, if you could describe what each case should do, I'd replace it directly with the final fields (llm_path_fixed and clip_vision_fixed) to make it clearer.

Keeps the switching logic as a single block to make future merges easier.

wbruna · 2026-03-16T09:50:19Z

Anyway, those VAE changes caused an issue with Flux, so I won't be merging them too soon.

LostRuins · 2026-03-16T10:36:03Z

Anyway, if you could describe what each case should do, I'd replace it directly with the final fields (llm_path_fixed and clip_vision_fixed) to make it clearer.

Background: Originally it started with StableDiffusion3 which requires a diffusion model, a VAE, a T5xxl model, a clip_l and clip_g model. But then as more architectures got added, each model type required some different combination or augmentation from the above.

The main idea is to allow the user to load each type of image model components in an intuitive manner, by intelligently guessing what auxiliary files they have brought along and loading them correctly.

So for example if they loaded Z-image turbo, well technically it only needs the Diffuser and the Qwen3-4B as a text encoder. But they might place the text encoder into --t5xxl instead, which would be re-routed correctly to --clip1 (aka --llm upstream) if the arch is z-image.

I think the code you see is now quite janky because its been iteratively modified over time while trying not to break upstream syncs. But it basically spells out what to do if a t5 field was filled and neither of, each of, or both of the fields were populated.

Looking back now I think the two cases can indeed be simplified into a single case:
i.e. if the user has picked a qwen based diffuser, provided a t5, and didn't provide a clip1 field, swap the t5 into the clip1 field which gets used as the llm.

Agree its rather chaotic and messy now. Feel free to refactor to a more intuitive style that works with upstream code, i'll see what you might have.

sd: simplify koboldcpp model replacement

a5f49a8

Keeps the switching logic as a single block to make future merges easier.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sd: simplify koboldcpp model replacement#2038

sd: simplify koboldcpp model replacement#2038
wbruna wants to merge 1 commit intoLostRuins:concedo_experimentalfrom
wbruna:kcpp_sd_model_replacement

wbruna commented Mar 16, 2026

Uh oh!

wbruna commented Mar 16, 2026

Uh oh!

LostRuins commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

wbruna commented Mar 16, 2026

Uh oh!

wbruna commented Mar 16, 2026

Uh oh!

LostRuins commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants