
.Net: Bug: kernel.InvokeAsync returns empty result for thinking-enabled models (Qwen3.5) via SK's Ollama connector #14078

@ralmutaweh

Description


Describe the bug
When invoking a prompt function via kernel.InvokeAsync(function, ...) with the Ollama connector and a thinking-enabled model (in my case Qwen3.5), the result is an empty string.

The model does generate a correct response, but it lands in Ollama's thinking field rather than in message.content, and OllamaPromptExecutionSettings provides no way to set think=false to prevent this. As a result, the FunctionResult is empty.

To Reproduce
Steps to reproduce the behavior:

  1. Configure a kernel with the Ollama connector pointing to a thinking-enabled model (e.g. qwen3.5:9b)
  2. Create a prompt function: var function = kernel.CreateFunctionFromPrompt(promptTemplate);
  3. Invoke it with no execution settings: var response = await kernel.InvokeAsync(function, new KernelArguments { ["extractedText"] = extractedText });
  4. Call response.ToString(): the result is an empty string, even though the function completes successfully and the model takes the full generation time (~70s in our case).

Expected behavior
response.ToString() should return the model's generated content. Either OllamaPromptExecutionSettings should expose a Think property (mapping to Ollama's top-level think request field) so users can set think=false, or ChatMessageContent should surface the thinking field's content, as a fallback or as a separate property, so that it isn't silently dropped.
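For the fallback option, a minimal sketch of the intended behavior, assuming a non-streaming Ollama /api/chat response whose message object carries both content and thinking fields. ExtractAnswer is a hypothetical helper (not part of Semantic Kernel or OllamaSharp), and the sample payload is made up for illustration:

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper (not an SK or OllamaSharp API): returns message.content
// from a non-streaming Ollama /api/chat response, and falls back to
// message.thinking when content is missing or blank.
static string ExtractAnswer(string responseJson)
{
    JsonNode? message = JsonNode.Parse(responseJson)?["message"];
    string content = message?["content"]?.GetValue<string>() ?? "";
    if (!string.IsNullOrWhiteSpace(content))
    {
        return content;
    }

    // Content is blank: surface the thinking text instead of dropping it.
    return message?["thinking"]?.GetValue<string>() ?? "";
}

// Made-up payload shaped like the failing case: empty content, text in thinking.
const string sample =
    @"{""message"":{""role"":""assistant"",""content"":"""",""thinking"":""(model output)""},""done"":true}";

Console.WriteLine(ExtractAnswer(sample)); // (model output)
```

Falling back silently is a design choice; exposing the thinking text as a separate property instead would keep the two kinds of output distinguishable for callers.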

Platform

  • Language: C#
  • Source: NuGet package Microsoft.SemanticKernel.Connectors.Ollama 1.77.0-alpha (latest available)
  • AI model: Ollama: qwen3.5:9b (local)
  • IDE: Visual Studio Code
  • OS: macOS (client), Windows (Ollama host)

Additional context
The root cause was confirmed by inspecting OllamaPromptExecutionSettings: it exposes NumPredict, Temperature, TopK, TopP, and Stop, but no Think property, and ExtensionData does not map to Ollama's top-level think request field.
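To illustrate where the field lives, here is a sketch of a raw /api/chat request body built with System.Text.Json. It assumes Ollama's documented layout, in which sampling settings such as num_predict and temperature go inside the options object while think and stream are top-level fields. BuildChatRequestBody and the option values are hypothetical:

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper: builds a non-streaming Ollama /api/chat request body.
// Sampling settings (num_predict, temperature, top_k, top_p, stop) sit inside
// "options"; "think" and "stream" are top-level fields.
static string BuildChatRequestBody(string model, string prompt, bool think)
{
    var body = new JsonObject
    {
        ["model"] = model,
        ["messages"] = new JsonArray
        {
            new JsonObject { ["role"] = "user", ["content"] = prompt }
        },
        ["stream"] = false,
        ["think"] = think, // top-level, not inside "options"
        ["options"] = new JsonObject
        {
            ["temperature"] = 0.2, // illustrative values only
            ["num_predict"] = 512
        }
    };
    return body.ToJsonString();
}

Console.WriteLine(BuildChatRequestBody("qwen3.5:9b", "Summarize the text.", think: false));
```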

Log output from the broken path (kernel.InvokeAsync, no execution settings):
[screenshot not reproduced]

A working fix was obtained by bypassing kernel.InvokeAsync and calling OllamaApiClient.ChatAsync directly with Think = false and Stream = false:

[screenshot of the direct ChatAsync call not reproduced]

This returned the following result in 23.4s:

[screenshot not reproduced]

Otherwise, workarounds for this issue rely on accessing raw response payloads or custom handlers, which appears to be the norm for reasoning models generally rather than something specific to Ollama (see #13889 and related discussion).
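As an example of the custom-handler route, the core transformation can be sketched as a function that rewrites an outgoing /api/chat request body so that the top-level think field is false. InjectThinkFalse is hypothetical. In practice it would be called from the SendAsync override of a DelegatingHandler subclass: read request.Content with ReadAsStringAsync, then replace it with new StringContent(patched, Encoding.UTF8, "application/json"). Whether the connector overload you use accepts an HttpClient built on such a handler is an assumption to verify against your package version:

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper for a custom HTTP handler: parses an outgoing Ollama
// /api/chat request body, forces the top-level "think" field to false
// (adding it if absent), and leaves all other fields as they were.
static string InjectThinkFalse(string requestJson)
{
    // Throws if the body is not a JSON object.
    JsonObject body = JsonNode.Parse(requestJson)!.AsObject();
    body["think"] = false;
    return body.ToJsonString();
}

// A body without any "think" field, as an unpatched client might send it.
const string outgoing =
    @"{""model"":""qwen3.5:9b"",""messages"":[{""role"":""user"",""content"":""Hi""}],""stream"":false}";

Console.WriteLine(InjectThinkFalse(outgoing));
```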

Metadata


Assignees

No one assigned

    Labels

    .NET (Issue or Pull requests regarding .NET code), bug (Something isn't working), triage

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
