Model answers its own questions without waiting for user input (Classroom/Q&A use case) #139

@Rickvw0604

Description of the bug:

I'm building a Spanish tutoring app using the Gemini Live API (gemini-2.5-flash-native-audio-preview-09-2025) with native audio. The app has a "classroom mode" where the AI tutor asks the student a question and waits for their spoken response.

The problem is that the model frequently answers its own questions in the same turn, without waiting for the user to speak. It generates something like:

"How do you say 'I give him the book' in Spanish?" [should stop here and wait]
"Le doy el libro. Correct! Now the next question..." [model answered itself]

The user never gets a chance to respond. Even worse, when the user DOES respond to an earlier question, the model sometimes says "That's correct!" even when the user said "I don't know", because it is praising its own hallucinated answer, not the user's actual response. This only happens when function calling is involved. I also have a free-speech mode where users just chat with the model and the only function call comes at the end, to signal the end of the conversation. In that mode everything works fine.

What I've tried:

  1. Prompt engineering (many variations)
  • "After asking a question: STOP IMMEDIATELY. Do not continue speaking."
  • "ONE question per turn only. Never ask multiple questions."
  • "NEVER say 'Correcto' or 'Perfecto' unless you have ACTUALLY HEARD the student's response."
  • "You are in a LIVE AUDIO conversation. The student speaks through a microphone. You MUST wait for their actual voice response."
  • "If you find yourself saying 'Correcto' without hearing a response, STOP - you are hallucinating."

None of these work reliably. The model does not seem to understand that it is in a real-time voice context.

  2. Adjusting VAD settings

realtimeInputConfig: {
  automaticActivityDetection: {
    silenceDurationMs: 2000,
  }
}

I tried various silence durations. The model generates so fast that there is no pause between asking a question and answering it.

  3. Client-side interruption on question mark detection

I tried detecting when the output transcription contains a ?, then immediately sending a sendClientContent message to interrupt the model (per the docs: "A message here will interrupt any current model generation"). I also stopped audio playback.

This just completely broke everything.
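For reference, the detection half of this attempt can be sketched as below. This is a minimal illustration, not my exact code: QuestionBoundaryDetector is a hypothetical helper (not part of any Gemini SDK), and it assumes the output transcription arrives as plain text chunks. It only locates the first "?" in a turn. It does not send anything to the API, since the interruption step is the part that broke.

```typescript
// Hypothetical helper (not part of any Gemini SDK): accumulates output
// transcription chunks and reports the first completed question in a turn.
class QuestionBoundaryDetector {
  private buffer = "";
  private fired = false;

  /**
   * Feed one transcription chunk.
   * Returns the text up to and including the first "?" the first time one
   * is seen in this turn, otherwise null.
   * The Spanish opening mark "¿" is a different character and is ignored.
   */
  push(chunk: string): string | null {
    if (this.fired) return null; // only report once per turn
    this.buffer += chunk;
    const idx = this.buffer.indexOf("?");
    if (idx === -1) return null;
    this.fired = true;
    return this.buffer.slice(0, idx + 1);
  }

  /** Call when the model's turn ends so the next turn starts clean. */
  reset(): void {
    this.buffer = "";
    this.fired = false;
  }
}
```

What to do when push() returns a question (mute playback, drop the remaining audio, and so on) is app-specific and is where my attempt fell apart.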

  4. Using turnComplete: false with tool responses

When the model calls my trackConceptCovered() function, I send a tool response. I tried various combinations of behavior: "NON_BLOCKING" and scheduling: "SILENT" to prevent the model from regenerating/repeating after receiving the tool response. This helped with a repetition issue I had, but made the turn-skipping problem worse.
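Roughly, the tool response is built like the sketch below. The helper name buildToolResponse and the local types are hypothetical. The message shape (toolResponse.functionResponses with id, name, response) and the placement of scheduling inside response follow my reading of the Live API docs on asynchronous function calling, so treat both as assumptions to check against the current API reference.

```typescript
// Scheduling values as described for NON_BLOCKING function calls
// (assumed from the Live API docs; verify against the current reference).
type Scheduling = "INTERRUPT" | "WHEN_IDLE" | "SILENT";

// Minimal local view of a function call received from the model.
interface ToolCall {
  id: string;
  name: string;
}

interface ToolResponseMessage {
  toolResponse: {
    functionResponses: Array<{
      id: string;
      name: string;
      response: { result: string; scheduling: Scheduling };
    }>;
  };
}

// Hypothetical helper: builds the JSON message sent over the WebSocket
// after handling a call such as trackConceptCovered().
function buildToolResponse(call: ToolCall, scheduling: Scheduling): ToolResponseMessage {
  return {
    toolResponse: {
      functionResponses: [
        {
          id: call.id, // must echo the id of the call being answered
          name: call.name,
          // Assumption: scheduling sits inside `response`, as in the docs example.
          response: { result: "ok", scheduling },
        },
      ],
    },
  };
}
```

With scheduling set to "SILENT" the repetition went away, but the turn-skipping got worse, as described above.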

Environment:

  • Model: gemini-2.5-flash-native-audio-preview-09-2025
  • Using WebSocket client-to-server approach
  • React + TypeScript frontend
  • Audio: Real-time PCM streaming with automatic VAD

Questions:

  1. Is there a way to force the model to stop generation after asking a question?
  2. Is there a turn-taking mode or parameter I'm missing that's designed for Q&A scenarios?
  3. Is this a known limitation of the native audio model?

Any help would be appreciated! Happy to provide more code details if needed.

Actual vs expected behavior:

Expected behavior:

The model should ask ONE question, then stop generating and wait for the user's microphone input. It should only respond after actually receiving audio from the user.

Actual behavior:

The model asks a question, then immediately generates an answer to its own question and moves on to the next question - all in a single turn. The user's actual input is ignored or incorrectly evaluated.

Any other information you'd like to share?

No response
