Environment details
- Programming language: Python 3.10.19
- OS: Linux (WSL2, Ubuntu)
- Package version: google-genai 1.68.0
- API: Gemini Developer API (API key)
What happened
I'm building a TTS audio evaluation pipeline that sends WAV audio clips to Gemini and asks it to compare/rate them. I noticed that `gemini-3.1-flash-lite-preview` always returns `thoughts_token_count: 0` when audio is included in the request, even with `thinking_level="medium"` or `"high"`.

Text-only requests on the same model work fine — thinking tokens are generated as expected.

The same audio requests on `gemini-3-flash-preview` also work fine — 100% thinking activation.

So the issue seems specific to the combination of Flash Lite + audio input.
Steps to reproduce
- Send a request to `gemini-3.1-flash-lite-preview` with inline audio bytes and `ThinkingConfig(thinking_level="medium")`
- Check `response.usage_metadata.thoughts_token_count`
- It returns 0. The same request without audio, or the same request on `gemini-3-flash-preview`, returns non-zero thinking tokens.
Minimal script:

```python
import os

from google import genai
from google.genai.types import GenerateContentConfig, Part, ThinkingConfig

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

with open("sample.wav", "rb") as f:
    audio_bytes = f.read()  # any short WAV

# This produces 0 thinking tokens
response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    contents=[
        Part.from_text(text="Rate this audio quality from 1-10."),
        Part.from_bytes(data=audio_bytes, mime_type="audio/wav"),
    ],
    config=GenerateContentConfig(
        temperature=1.0,
        thinking_config=ThinkingConfig(thinking_level="medium"),
    ),
)

print(response.usage_metadata.thoughts_token_count)  # 0

# Swap model to gemini-3-flash-preview → non-zero thinking tokens
```
No errors are raised. The parameter is silently accepted but has no effect.
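Because nothing fails loudly, the only way to catch this in a pipeline is to inspect the usage metadata after each call. A small guard I use (a sketch with my own helper name, not library behavior; `thoughts_token_count` can also be `None` when the field is absent, so both cases are treated as "no thinking"):

```python
def thinking_fired(usage_metadata) -> bool:
    """Return True if the response actually spent thinking tokens.

    Treats both None (field absent) and 0 (thinking skipped) as "no thinking".
    """
    count = getattr(usage_metadata, "thoughts_token_count", None)
    return bool(count) and count > 0
```

In the pipeline this gates whether a comparison result is trusted, e.g. `if not thinking_fired(response.usage_metadata): retry_or_flag()`.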
What I tested
I ran a more thorough test to isolate the issue: 4 input types × 2 models × 25 calls = 200 calls, all with `thinking_level="medium"`.

Thinking activation (calls with `thoughts_token_count > 0`):
| Input | gemini-3-flash-preview | gemini-3.1-flash-lite-preview |
| --- | --- | --- |
| Text only | 25/25 (100%) | 25/25 (100%) |
| Text + 1 WAV | 25/25 (100%) | 2/25 (8%) |
| Text + 2 WAVs | 25/25 (100%) | 0/25 (0%) |
| Text + 3 WAVs | 25/25 (100%) | 0/25 (0%) |
Flash Lite thinks normally on text-only, but drops to near-zero once any audio is in the request.
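For reference, the activation numbers above were tallied with logic equivalent to the sketch below (the request loop is elided since it just repeats the minimal script with different `contents`; the function name and tuple shape are mine):

```python
from collections import Counter


def activation_counts(results):
    """Tally thinking activation per (model, input_type) condition.

    `results` is an iterable of (model, input_type, thoughts_token_count)
    tuples collected from response.usage_metadata after each call.
    Returns {(model, input_type): (fired, total)}.
    """
    fired = Counter()
    total = Counter()
    for model, input_type, thoughts in results:
        key = (model, input_type)
        total[key] += 1
        if thoughts and thoughts > 0:  # None and 0 both count as "did not fire"
            fired[key] += 1
    return {key: (fired[key], total[key]) for key in total}
```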
I also ran 600 calls on Flash Lite alone across all 4 thinking levels (with 3 audio inputs):
| thinking_level | Fired / 150 |
| --- | --- |
| minimal | 0 (0%) |
| low | 108 (72%) |
| medium | 0 (0%) |
| high | 22 (14.7%) |
The ordering doesn't make sense — `low` triggers thinking far more than `medium` or `high`.

I also tried adding explicit instructions in the prompt like "Think step by step", "Listen to each audio carefully and thoroughly compare them before answering", and requesting a `reasoning` field in the output. None of these made a difference — `thoughts_token_count` stayed at 0 with audio input on Flash Lite.
Why it matters
In my use case (comparing TTS audio samples), responses without thinking show extreme position bias — the model just picks whichever audio was presented second/middle without actually comparing them. With thinking enabled (on `gemini-3-flash-preview`), the responses are far more meaningful. So this isn't just a cosmetic token count issue; it directly affects output quality for audio tasks.
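For context, "position bias" here is measured by shuffling which sample occupies each presentation slot across trials and tallying how often each slot wins; with genuine comparison the win rates should track sample quality, not slot. A sketch of the tally (function name is mine):

```python
from collections import Counter


def position_win_rates(winning_positions):
    """Fraction of trials won by each presentation slot (0-indexed).

    `winning_positions` lists, per trial, the slot index of the sample
    the model picked. A slot dominating regardless of which sample sits
    in it indicates position bias rather than real comparison.
    """
    counts = Counter(winning_positions)
    n = len(winning_positions)
    return {pos: counts[pos] / n for pos in sorted(counts)}
```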
My understanding
- This looks like a product-side issue (model behavior), not a client library bug
- The model card and thinking docs both list Flash Lite as supporting all four thinking levels with multimodal input