
Audio: support app-audio publishing without claiming a device#985

Open
sgu-bithuman wants to merge 1 commit into livekit:main from bithuman-product:bithuman/manual-mode-app-audio

Conversation

@sgu-bithuman

Summary

AudioManager.shared.mixer.capture(appAudio:) is the documented entry point for feeding app-supplied PCM into the publish path (used by MacOSScreenCapturer, BroadcastScreenCapturer, and any custom audio source). In manual rendering mode it drops every buffer, because the input chain (appNode → appMixerNode → mainMixerNode) is never wired up: the WebRTC ADM only invokes engineWillConnectInput when connecting a real input device, and manual rendering by definition has no device.
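For orientation, here is the call shape as a publisher sees it. This is a minimal sketch; the 48 kHz mono Float32 format and the 480-frame buffer are illustrative assumptions, not requirements of the API:

```swift
import AVFoundation
import LiveKit

// Manual rendering mode: no device is claimed, so the WebRTC ADM never
// calls engineWillConnectInput and (before this PR) the buffers below
// were silently dropped.
try AudioManager.shared.setManualRenderingMode(true)

// Illustrative format and buffer size; any PCM the mixer accepts would do.
let format = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                           sampleRate: 48000,
                           channels: 1,
                           interleaved: false)!
let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: 480)!
buffer.frameLength = 480
// ... fill buffer.floatChannelData![0] with your samples ...

// The documented entry point for app-supplied PCM.
AudioManager.shared.mixer.capture(appAudio: buffer)
```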

This blocks several scenarios that AudioManager.setManualRenderingMode(_:) otherwise enables:

  • Server-side avatars / agents that only republish app-supplied audio (no mic). Multi-process deployments on a single Mac can't share the audio device, so the existing startLocalRecording() path tops out at one audio publisher per machine. See testManualRenderingModePublishAudio in LiveKitAudioTests — that test demonstrates the publish flow but only one process can hold the device.
  • Screen-share with system audio but no microphone.
  • Custom audio sources (e.g. file playback) that don't have a real input device.

Changes

  • Add MixerEngineObserver.wireAppAudioPath() — public, idempotent. Wires appNode → appMixerNode → mainMixerNode against engine.mainMixerNode using the same Float32 player-node format derivation as the src=nil branch in engineWillConnectInput; a sketch follows below.
  • capture(appAudio:) lazy-wires the path on the first call when it sees a manual-rendering engine. Existing call sites get the fix automatically; no API churn for callers.
  • engineWillRelease clears isInputConnected and mainMixerNode so a recreated engine re-runs the wiring.

Purely additive — no public API removed, no signature changes. Engine-running and isInputConnected guards in capture(appAudio:) are preserved.
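
A sketch of the wiring described in the first bullet above; the node names and the Float32 format derivation come from this description, while everything else is a reconstruction, not the actual diff:

```swift
import AVFoundation

// Assumed shape of MixerEngineObserver.wireAppAudioPath().
func wireAppAudioPath(engine: AVAudioEngine,
                      appNode: AVAudioPlayerNode,
                      appMixerNode: AVAudioMixerNode) {
    // Idempotency: skip if the nodes are already attached to an engine.
    // (The real implementation reportedly tracks this via isInputConnected.)
    guard appMixerNode.engine == nil else { return }

    // Same Float32 player-node format derivation as the src=nil branch of
    // engineWillConnectInput: take rate/channels from the main mixer output.
    let out = engine.mainMixerNode.outputFormat(forBus: 0)
    guard let playerFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                           sampleRate: out.sampleRate,
                                           channels: out.channelCount,
                                           interleaved: false) else { return }

    engine.attach(appNode)
    engine.attach(appMixerNode)
    engine.connect(appNode, to: appMixerNode, format: playerFormat)
    engine.connect(appMixerNode, to: engine.mainMixerNode, format: playerFormat)
}
```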

Test plan

  • Builds cleanly: swift build --target LiveKit
  • Verified end-to-end with eight server-side processes publishing audio concurrently on one Mac. PCM byte-stream → mixer.capture(appAudio:) → published track. Each process: ~0.019 avg / ~0.25 peak RMS, ~36% non-zero samples vs ~2% noise floor on the previous code path.
  • Existing LocalAudioTrack mic flow unchanged (the new code only fires when engine.isInManualRenderingMode).
  • CI / LiveKitAudioTests pass (would benefit from a new test mirroring testManualRenderingModePublishAudio but with mixer.capture(appAudio:) as the source, replacing the file player). Happy to add this if useful; a rough sketch follows this list.
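
A rough sketch of that suggested test, assuming the same TestEnvironment/RoomTestingOptions helpers as the existing test and swapping the file player for a generated tone. The test name, 440 Hz tone, and iteration count are all illustrative:

```swift
@Test func manualRenderingModePublishAppAudio() async throws {
    try AudioManager.shared.setManualRenderingMode(true)
    #expect(AudioManager.shared.isManualRenderingMode)

    // 48 kHz mono Float32, 480-frame (10 ms) buffers; illustrative choices.
    let format = try #require(AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                            sampleRate: 48000,
                                            channels: 1,
                                            interleaved: false))

    try await TestEnvironment.withRoom(RoomTestingOptions(canPublish: true)) { room in
        try await room.localParticipant.setMicrophone(enabled: true)

        var phase: Float = 0
        let step = 2 * Float.pi * 440 / 48000 // 440 Hz tone
        for _ in 0 ..< 1000 { // ~10 s of audio
            let buffer = try #require(AVAudioPCMBuffer(pcmFormat: format, frameCapacity: 480))
            buffer.frameLength = 480
            let samples = buffer.floatChannelData![0]
            for i in 0 ..< Int(buffer.frameLength) {
                samples[i] = 0.5 * sin(phase)
                phase += step
            }
            // No device, no file player: app-supplied PCM only.
            AudioManager.shared.mixer.capture(appAudio: buffer)
            try await Task.sleep(nanoseconds: 10_000_000) // pace at ~real time
        }
    }
}
```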

🤖 Generated with Claude Code

Add `MixerEngineObserver.wireAppAudioPath()` so callers can connect the
`appNode → appMixerNode → mainMixerNode` graph in manual rendering
mode, where `engineWillConnectInput` is never invoked by the WebRTC
ADM (no real device) and `mixer.capture(appAudio:)` would otherwise
drop every buffer. `capture(appAudio:)` also lazy-wires the path on
its first call, so existing call sites (server-side avatars,
`MacOSScreenCapturer`, `BroadcastScreenCapturer`, custom audio
sources) get the fix automatically.

This unblocks several no-device scenarios that work in `AudioManager`'s
manual rendering mode but previously had no path to actually publish:

- Server-side avatars that only republish app-supplied audio (no mic).
  Multi-process deployments on a single Mac can't share the audio
  device, so the existing `startLocalRecording()` path tops out at one
  audio publisher per machine.
- Screen-share with system audio but no microphone.
- Custom audio sources (e.g. file playback) without a real input.

`engineWillRelease` now also clears `isInputConnected` and
`mainMixerNode` so a recreated engine re-runs the wiring.

Verified end-to-end with eight server-side processes publishing audio
concurrently on the same Mac. The lazy auto-wire matches the existing
`engineWillConnectInput` behavior — Float32 player-node format derived
from the engine's mainMixer output format. Idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@pblazej requested review from hiroshihorie and pblazej on May 5, 2026
@pblazej (Contributor) commented May 5, 2026

@sgu-bithuman could you add some description of the scenarios that are not covered by the current implementation?

E.g. https://github.com/webrtc-sdk/webrtc/blob/597a7f4fb2c252c8d3c1d8d8144bfb730fbacaf2/modules/audio_device/audio_engine_device.mm#L1626-L1650

```swift
@Test func manualRenderingModePublishAudio() async throws {
    // Sample audio
    let url = try #require(URL(string: "https://github.com/rafaelreis-hotmart/Audio-Sample-files/raw/refs/heads/master/sample.wav"))
    print("Downloading sample audio from \(url)...")
    let (downloadedLocalUrl, _) = try await URLSession.shared.downloadBackport(from: url)

    // Move the file to a new temporary location with a more descriptive name, if desired
    let tempLocalUrl = FileManager.default.temporaryDirectory.appendingPathComponent(UUID().uuidString).appendingPathExtension("wav")
    try FileManager.default.moveItem(at: downloadedLocalUrl, to: tempLocalUrl)
    print("Original file: \(tempLocalUrl)")

    let audioFile = try AVAudioFile(forReading: tempLocalUrl)
    let audioFileFormat = audioFile.processingFormat // AVAudioFormat object
    print("Sample Rate: \(audioFileFormat.sampleRate)")
    print("Channel Count: \(audioFileFormat.channelCount)")
    print("Common Format: \(audioFileFormat.commonFormat)")
    print("Interleaved: \(audioFileFormat.isInterleaved)")

    // Set manual rendering mode...
    try AudioManager.shared.setManualRenderingMode(true)

    // Check if manual rendering mode is set...
    let isManualRenderingMode = AudioManager.shared.isManualRenderingMode
    print("manualRenderingMode: \(isManualRenderingMode)")
    #expect(isManualRenderingMode)

    let readBuffer = try #require(AVAudioPCMBuffer(pcmFormat: audioFileFormat, frameCapacity: 480))

    try await TestEnvironment.withRoom(RoomTestingOptions(canPublish: true)) { room in
        let ns5 = UInt64(20 * 1_000_000_000)
        try await Task.sleep(nanoseconds: ns5)
        try await room.localParticipant.setMicrophone(enabled: true)
        repeat {
            do {
                try audioFile.read(into: readBuffer, frameCount: 480)
                // read(into:) does not throw at EOF; it returns an empty
                // buffer, so break here to avoid looping forever.
                guard readBuffer.frameLength > 0 else { break }
                print("Read buffer frame capacity: \(readBuffer.frameLength)")
                AudioManager.shared.mixer.capture(appAudio: readBuffer)
            } catch {
                print("Read buffer failed with error: \(error)")
                break
            }
        } while true
        let ns = UInt64(10 * 1_000_000_000)
        try await Task.sleep(nanoseconds: ns)
    }
}
```

Based on an initial review, it should work "as is".

