.Net: Bug: kernel.InvokeAsync returns empty result for thinking-enabled models (Qwen3.5) via SK's Ollama connector

**Describe the bug**
When invoking a prompt function via _**kernel.InvokeAsync(function, ...)**_ with the Ollama connector and a thinking-enabled model, in my case Qwen3.5, the result is an empty string.

The model generates a correct response, but it lands in Ollama's thinking stream rather than message.content, and **OllamaPromptExecutionSettings** provides no way to set **think=false** to prevent this. **FunctionResult** therefore returns empty.

**To Reproduce**
Steps to reproduce the behavior:
1. Configure a kernel with the Ollama connector pointing to a thinking-enabled model (e.g. qwen3.5:9b).
2. Create a prompt function: **var function = kernel.CreateFunctionFromPrompt(promptTemplate);**
3. Invoke it with no execution settings: **var response = await kernel.InvokeAsync(function, new KernelArguments { ["extractedText"] = extractedText });**
4. Call **response.ToString()**. The result is an empty string, even though the function completes successfully and the model takes the full generation time (~70s in our case).
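
For convenience, the steps above can be put together roughly as follows. This is a minimal sketch, not the exact application code: the endpoint URL, the prompt template, and the sample input are placeholders I made up for illustration, and it assumes the model has already been pulled on the Ollama host.

```csharp
// Requires the Microsoft.SemanticKernel and
// Microsoft.SemanticKernel.Connectors.Ollama NuGet packages.
#pragma warning disable SKEXP0070 // the Ollama connector is marked experimental

using System;
using Microsoft.SemanticKernel;

// Assumption: Ollama is reachable at this endpoint (adjust for a remote host).
var kernel = Kernel.CreateBuilder()
    .AddOllamaChatCompletion(
        modelId: "qwen3.5:9b",
        endpoint: new Uri("http://localhost:11434"))
    .Build();

// Placeholder template and input; the real prompt is not part of this report.
const string promptTemplate = "Summarize the following text:\n{{$extractedText}}";
const string extractedText = "Semantic Kernel is an SDK for integrating LLMs into applications.";

var function = kernel.CreateFunctionFromPrompt(promptTemplate);

// No execution settings are passed, as in the steps above.
var response = await kernel.InvokeAsync(
    function,
    new KernelArguments { ["extractedText"] = extractedText });

// Print the length so an empty result is easy to spot.
var text = response.ToString();
Console.WriteLine($"Result length: {text.Length}");
```

In the setup described in this report, the returned string is empty even though the call completes without an exception.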

**Expected behavior**
**response.ToString()** should return the model's generated content. Either **OllamaPromptExecutionSettings** should expose a Think property (mapping to Ollama's top-level think request field) so users can set **think=false**, or **ChatMessageContent** should surface the thinking field content as a fallback or as a separate property so it isn't silently dropped.

**Platform**
- Language: C#
- Source: NuGet package Microsoft.SemanticKernel.Connectors.Ollama 1.77.0-alpha (latest available)
- AI model: Ollama: qwen3.5:9b (local)
- IDE: Visual Studio Code
- OS: macOS (client), Windows (Ollama host)

**Additional context**
I confirmed the root cause by inspecting OllamaPromptExecutionSettings: it exposes NumPredict, Temperature, TopK, TopP, and Stop but no **Think** property, and ExtensionData does not map to Ollama's top-level think request field.
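
To check where the text ends up independently of Semantic Kernel, the Ollama HTTP API can be called directly. The following is a rough diagnostic sketch, assuming Ollama's `/api/chat` endpoint on the default local port; the prompt is a placeholder, and the field names (`think`, `stream`, `message.content`, `message.thinking`) are the ones from Ollama's chat API.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;

// Assumption: default local endpoint; adjust for a remote Ollama host.
using var http = new HttpClient
{
    BaseAddress = new Uri("http://localhost:11434"),
    Timeout = TimeSpan.FromMinutes(5) // thinking models can take a while
};

var payload = new
{
    model = "qwen3.5:9b",
    messages = new[] { new { role = "user", content = "Reply with one short sentence." } },
    think = false,  // top-level request field; remove it to compare with the default
    stream = false  // ask for a single JSON response instead of chunks
};

using var httpResponse = await http.PostAsJsonAsync("/api/chat", payload);
httpResponse.EnsureSuccessStatusCode();

using var doc = JsonDocument.Parse(await httpResponse.Content.ReadAsStringAsync());
var message = doc.RootElement.GetProperty("message");

// Either field may be absent depending on the model and the think setting.
string content = message.TryGetProperty("content", out var c) ? c.GetString() ?? "" : "";
string thinking = message.TryGetProperty("thinking", out var t) ? t.GetString() ?? "" : "";

Console.WriteLine($"content length:  {content.Length}");
Console.WriteLine($"thinking length: {thinking.Length}");
```

Running it once with and once without the `think` field shows which of the two message fields carries the generated text for a given model.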

Log output from broken path (kernel.InvokeAsync, no execution settings):
<img width="703" height="81" alt="Image" src="https://github.com/user-attachments/assets/a863b22f-da27-418d-9f12-86ea614a3295" />


As a workaround, bypassing _**kernel.InvokeAsync**_ and calling OllamaApiClient.ChatAsync directly with Think = false, Stream = false works:

<img width="707" height="387" alt="Image" src="https://github.com/user-attachments/assets/c5b02491-75ce-4f5d-ac10-15a09ed31291" />
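
Since a screenshot can't be copied, here is an approximate text version of that workaround. It is a sketch rather than the exact code in the image: it assumes a recent OllamaSharp version whose `ChatRequest` exposes a `Think` property, and the endpoint and prompt are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using OllamaSharp;
using OllamaSharp.Models.Chat;

// Assumption: endpoint and prompt are placeholders, not the values from this report.
var client = new OllamaApiClient(new Uri("http://localhost:11434"));

var request = new ChatRequest
{
    Model = "qwen3.5:9b",
    Messages = new List<Message>
    {
        new Message(ChatRole.User, "Summarize in one sentence: Semantic Kernel is an SDK for LLM apps.")
    },
    Think = false,  // disable thinking so the answer is written to message.content
    Stream = false  // request a single, complete response
};

// ChatAsync returns an async stream; with Stream = false it should yield one item.
var answer = new StringBuilder();
await foreach (var chunk in client.ChatAsync(request))
{
    answer.Append(chunk?.Message?.Content);
}

Console.WriteLine(answer.ToString());
```

This skips prompt templating, filters, and the rest of the kernel pipeline, so it is only a stopgap until the connector can pass the setting through.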

Resulting in the following (in 23.4s): 

<img width="707" height="148" alt="Image" src="https://github.com/user-attachments/assets/ae258bd8-363a-4ed6-9b63-a819b3bff125" />


Finally, the available workarounds rely on accessing raw response payloads or custom handlers, which appears to be the norm for reasoning models in general and is not specific to Ollama; see #13889 and related discussion.


.Net: Bug: kernel.InvokeAsync returns empty result for thinking-enabled models (Qwen3.5) via SK's Ollama connector #14078
