Support for large RPC messages using data streams by 1egoman · Pull Request #977 · livekit/client-sdk-swift

1egoman · 2026-04-28T21:09:25Z

See this pull request for more info: livekit/client-sdk-js#1832

@pblazej Sent you a Slack message about this with some of the context!

github-actions · 2026-04-28T21:09:36Z

⚠️ This PR does not contain any files in the .changes directory.

pblazej · 2026-04-29T05:44:24Z

The general direction looks good; this needs a deeper review for possible breaking changes, etc.

1egoman · 2026-04-30T20:06:34Z

@pblazej @hiroshihorie Two things:

1. I might need some help getting the CI to pass for this. The test failures differ on each run, and on some targets the tests pass fully while on others they don't (and the same tests should be running everywhere...). Are the tests flaky?
2. I have run this implementation through the RPC testing app I built, which exercises a given RPC v2 implementation through all the test cases, and for the cases that are easy to test, it passes fully! 🎉 So I think this is in a good state to be reviewed at this point.

pblazej · 2026-05-04T09:28:24Z

+    ///
+    /// Defaults to ``ClientProtocol/v1``, which enables RPC v2 (data-stream-based payloads
+    /// with no 15 KB size limit). Generally, it's not recommended to change this.
+    public let clientProtocol: ClientProtocol


Hm, just thinking out loud: should we even expose this? (The same applies to protocol, given the "Generally, it's not recommended to change this." note.)

1egoman · 2026-05-04

Yeah, I had the same thought. Originally clientProtocol wasn't in here but I moved it in here as a secondary step to be consistent with protocol.

Some more thinking out loud: IMO, they both should be in the same place ideally since conceptually they communicate the same type of state. Would removing protocol from here be a breaking change? If not, then these could both become constants (which was what clientProtocol was originally before I moved it in here).

One (maybe) advantage to exposing it could be that you could configure a client in an older clientProtocol version for end-to-end tests to test unusual scenarios. That's a pretty weak argument, though, and if it's going to actively confuse users then it's definitely not a good idea.

If there's no clear best path forward, I'm also open to making protocol/clientProtocol work inconsistently in the short term and making a follow-up task for you or somebody else to clean it up.

pblazej

Generally a fan of the split into RpcClientManager/RpcServerManager + clientProtocol negotiation — wire format matches the JS reference (rpc-with-compression-data-streams) byte-for-byte 🎉. There's a P0 build break on Xcode 16.2 (trailing comma) plus a few concurrency bugs around the caller-side pending state (race + potential double-resume on CheckedContinuation); also a 15 KB cap accidentally being applied on the v2 response path. Inline comments below cover those plus a bunch of test-side polish — should be a much tighter v2 once those are addressed.
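To make the double-resume concern concrete: a CheckedContinuation traps if it is resumed twice, so the response path and the timeout path need a single gate that lets only the first one through. The sketch below is not the PR's implementation; `PendingCalls` and `SketchRpcError` are hypothetical names, and a plain closure stands in for the continuation so the example stays synchronous. The idea is to register the entry before the request is published and to remove it under a lock before completing it.

```swift
import Foundation

/// Hypothetical error type for this sketch.
enum SketchRpcError: Error, Equatable {
    case responseTimeout
}

/// Tracks in-flight requests and guarantees each completion runs at most once.
final class PendingCalls {
    typealias Completion = (Result<String, SketchRpcError>) -> Void

    private let lock = NSLock()
    private var completions: [String: Completion] = [:]

    /// Register BEFORE publishing the request, so that a very fast response
    /// always finds its entry.
    func register(id: String, completion: @escaping Completion) {
        lock.lock()
        defer { lock.unlock() }
        completions[id] = completion
    }

    /// Removes the entry under the lock, then invokes it outside the lock.
    /// Returns false if the request was already resolved or never registered.
    @discardableResult
    func resolve(id: String, with result: Result<String, SketchRpcError>) -> Bool {
        lock.lock()
        let completion = completions.removeValue(forKey: id)
        lock.unlock()
        guard let completion else { return false }
        completion(result)
        return true
    }

    var pendingCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return completions.count
    }
}

let pending = PendingCalls()
var delivered: [Result<String, SketchRpcError>] = []

pending.register(id: "req-1") { delivered.append($0) }
let first = pending.resolve(id: "req-1", with: .success("pong"))            // response wins
let second = pending.resolve(id: "req-1", with: .failure(.responseTimeout)) // timeout is a no-op
```

In a real caller the stored closure would resume the continuation; because the entry is removed before it is invoked, whichever of the two paths arrives second finds nothing to resume.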

pblazej · 2026-05-04T10:24:16Z

 }

+extension Room {
+    private static let reservedStreamTopics: Set<String> = [


P1: Could we make lk. a global reserved prefix instead of a hand-maintained set? Server convention already documents lk. as LiveKit's namespace, and a single topic.hasPrefix("lk.") check (applied on register and unregister 👇) would cover any future internal topic without piecemeal additions:

    static func checkReserved(topic: String) throws {
        guard !topic.hasPrefix("lk.") else {
            throw LiveKitError(.invalidParameter, message: "Stream topic prefix 'lk.' is reserved for internal SDK use")
        }
    }

Worth applying to unregisterByteStreamHandler/unregisterTextStreamHandler as well — currently a user can call unregisterTextStreamHandler(for: "lk.rpc_request") and silently disable our internal RPC v2 dispatch.

It is technically a breaking change for users who registered on lk.* topics, but they were against the convention anyway — single CHANGELOG line.

1egoman · 2026-05-04

Sounds good - fixed in 930e636.

1egoman · 2026-05-07

@pblazej I mentioned this to @lukasIO and he wasn't a fan.

LLM summary of lukas's perspective from our chat:

Breaking change cost is high: that alone is a strong reason not to do it.
Dependencies complicate a hard block: other SDKs that build on the core SDK (e.g., the agents SDK) legitimately register lk.* listeners, so a blanket prohibition doesn't cleanly work.
A suppression mechanism is the wrong fix: letting some callers opt out of the warning/block "because they know what they're doing" doesn't sit right with him.
Prefer a structural solution: if the SDK registers its own lk.* listeners at room construction time, any later user attempt to register the same name would naturally fail through the existing duplicate-registration path, without needing a special rule.
Low priority absent real reports: he asks whether anyone has actually hit this, implying the change isn't worth the cost without evidence of the problem.
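As a rough illustration of the "structural solution" point in the summary above: if internal topics are claimed when the registry is created, an ordinary duplicate check rejects later registrations on those topics without any lk.-specific rule. This is only a sketch; `TopicHandlers` and `RegistrationError` are hypothetical and do not mirror the SDK's actual registration API. Note that it does not, by itself, address the unregister case raised earlier in the thread.

```swift
/// Hypothetical error and registry types for this sketch; not the SDK's real API.
enum RegistrationError: Error, Equatable {
    case handlerAlreadyRegistered(topic: String)
}

final class TopicHandlers {
    typealias Handler = (String) -> Void

    private var handlers: [String: Handler] = [:]

    /// Stand-in for "room construction": the SDK claims its own topics first.
    init(internalTopics: [String]) {
        for topic in internalTopics {
            handlers[topic] = { _ in /* internal dispatch would go here */ }
        }
    }

    /// The ordinary duplicate check; no lk.-specific rule is involved.
    func register(topic: String, handler: @escaping Handler) throws {
        guard handlers[topic] == nil else {
            throw RegistrationError.handlerAlreadyRegistered(topic: topic)
        }
        handlers[topic] = handler
    }
}

let handlers = TopicHandlers(internalTopics: ["lk.rpc_request"])

var rejectedTopic: String?
do {
    try handlers.register(topic: "lk.rpc_request") { _ in }
} catch RegistrationError.handlerAlreadyRegistered(let topic) {
    rejectedTopic = topic
} catch {
    // Other errors are not expected in this sketch.
}

// A topic the SDK has not claimed still registers normally.
let customRegistered = (try? handlers.register(topic: "lk.custom") { _ in }) != nil
```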

I don't really have a strong perspective here. @pblazej, if this is important to you, there are some ways to do this that could still permit some of the agents SDK usage patterns, like warnings rather than errors. Feel free to weigh in more here if desired.

lukasIO · 2026-05-07

the LLM's summary sounds a lot harsher than my standing on this feels 👀

And yeah, I think it's mostly important to consider the agents-sdk registering lk topics.
Potentially this could also be other plugins of course that would do this.

pblazej · 2026-05-11

my take on that is: if we're going to break that, we should break it once for all lk.* topics rather than re-breaking for each small feature 😄

It's early Mon morning, but I think LLM is exaggerating that as well 👍

1egoman · 2026-05-11

@pblazej The thing that I am still a bit conflicted on is that while I think it would be good to reserve the lk.* prefix for internal client sdk use, the agents sdk uses this prefix already - and since the agents sdk is indistinguishable to a client sdk from an end user's application, there's not really a good way to leave the interface open that other clients couldn't also take advantage of.

So I think we'd either need some way for a client on registration to be able to indicate it's actually a "first party livekit client" or something (brought this up to lukas last week, he also was not a fan of this), or the filtering has to either be by topic or just outright removed.

Another possible idea, for data streams at least: add logic so that internal SDK features register a data stream handler through a separate mechanism, and when that mechanism is used, the data is "tee"d and sent to both the external handler and the internal SDK feature. That way an external consumer can add or remove a handler and, no matter what happens, they can't break internal SDK features.
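The "tee" idea in the last comment could look roughly like the following. This is a sketch under stated assumptions, not a proposal for the real API: `StreamHandlerRegistry` and its methods are hypothetical, and a String payload stands in for a stream reader, which sidesteps the question of how a real stream would be buffered or duplicated for two consumers.

```swift
/// Hypothetical handler registry for this sketch; not the SDK's real API.
struct StreamHandlerRegistry {
    typealias Handler = (_ payload: String, _ sender: String) -> Void

    private var internalHandlers: [String: Handler] = [:]
    private var externalHandlers: [String: Handler] = [:]

    /// Used only by SDK features (for example RPC v2 on "lk.rpc_request").
    mutating func registerInternal(topic: String, handler: @escaping Handler) {
        internalHandlers[topic] = handler
    }

    /// Public registration: may share a topic with an internal handler.
    mutating func register(topic: String, handler: @escaping Handler) {
        externalHandlers[topic] = handler
    }

    /// Public unregistration only ever touches the external table.
    mutating func unregister(topic: String) {
        externalHandlers[topic] = nil
    }

    /// Tee: deliver the incoming data to both tables.
    func dispatch(topic: String, payload: String, sender: String) {
        internalHandlers[topic]?(payload, sender)
        externalHandlers[topic]?(payload, sender)
    }
}

var registry = StreamHandlerRegistry()
var internalSeen: [String] = []
var externalSeen: [String] = []

registry.registerInternal(topic: "lk.rpc_request") { payload, _ in internalSeen.append(payload) }
registry.register(topic: "lk.rpc_request") { payload, _ in externalSeen.append(payload) }
registry.dispatch(topic: "lk.rpc_request", payload: "first", sender: "alice")

registry.unregister(topic: "lk.rpc_request") // the user removes their handler
registry.dispatch(topic: "lk.rpc_request", payload: "second", sender: "alice")
```

Because the two tables are separate, removing the external handler leaves internal dispatch untouched, which is the property the comment is after.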

pblazej · 2026-05-04T10:24:17Z

+            let destination = Participant.Identity(from: "v2-destination")
+            try await RpcTestSupport.installV2Remote(in: room, identity: destination)
+
+            // Do nothing in response — let the connection timeout (7s max round-trip) fire


P3: Nit: comment says "let the connection timeout (7s max round-trip) fire", but with responseTimeout: 0.05 the outer withThrowingTimeout fires first — the 7 s ack-timeout path isn't actually exercised here. Worth either fixing the comment or splitting into two tests (one for the outer timeout mapping, one for the actual ack-timeout).

1egoman · 2026-05-07

I removed the misleading comment here: 6eb7c9b

Note that there already is another test exercising the 15s default (see 53285d3) and unless mocking global sleeps is fairly easily doable, I'm not sure that an additional test makes sense.

pblazej · 2026-05-04T10:36:17Z

Btw: the skills from AGENTS.md may help here

…onnect

Otherwise, it is possible for messages to be dropped, since there is a window of time during Room construction where the RPC client/server managers aren't yet fully initialized, and a connect could occur during this period.

1. reservedPrefixRejectedOnRegisterAndUnregister: asserts that both register{Text,Byte}StreamHandler and unregister{Text,Byte}StreamHandler throw LiveKitError on any lk.* topic, closing the previously-flagged gap where users could silently disable internal RPC v2 dispatch by unregistering lk.rpc_request.
2. performRpcCleansUpOnCancellation: starts a performRpc that hangs (no mock response), waits for pending state to register (pendingCount == 1), cancels the awaiting Task, asserts the call throws and that pending state is cleared (pendingCount == 0).
3. performRpcConcurrentRequestsHaveDistinctIds: runs 5 concurrent performRpc calls to the same destination, asserts each gets a distinct requestId (Set(observedIds).count == 5) and that no pending state leaks afterward.
4. v2ResponseStreamFromWrongSenderIsIgnored: installs a v2 remote, performs an RPC, then injects a v2 response stream from a different identity (v2-imposter). Asserts the spoofed response is ignored and the call ultimately times out with connectionTimeout rather than resolving with the spoofed payload.
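The behavior that the fourth test checks can be pictured as a sender check on the caller's pending table. The sketch below is illustrative only; `PendingResponses` is a hypothetical type and plain strings stand in for participant identities. A response from the wrong identity is rejected and the request stays pending, so it can still time out.

```swift
/// Hypothetical caller-side bookkeeping for this sketch; not the SDK's real types.
struct PendingResponses {
    /// requestId -> identity the request was sent to
    private var expectedSender: [String: String] = [:]

    mutating func track(requestId: String, destination: String) {
        expectedSender[requestId] = destination
    }

    /// Accepts a response only if it comes from the identity the request targeted.
    /// A mismatched sender leaves the request pending so it can still time out.
    mutating func accept(requestId: String, from sender: String) -> Bool {
        guard expectedSender[requestId] == sender else { return false }
        expectedSender[requestId] = nil
        return true
    }

    var pendingCount: Int { expectedSender.count }
}

var responses = PendingResponses()
responses.track(requestId: "req-1", destination: "v2-destination")

let spoofed = responses.accept(requestId: "req-1", from: "v2-imposter")
let stillPending = responses.pendingCount
let genuine = responses.accept(requestId: "req-1", from: "v2-destination")
```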

pblazej · 2026-05-05T10:14:15Z

Solid second pass — P0/P1 all closed with deterministic tests, the AsyncCompleter rewrite came out cleaner than I sketched (the publish-failure cleanup branch is a nice touch I didn't suggest) 🎉.

A handful of polish items still outstanding:

P2:

AsyncFlag / CapturedStreamHeader still in RpcTests.swift — the confirmation() swap would let both helpers go away.
Task.sleep still sprinkled across the new tests (~23 calls); most simulate fake network latency inside mock closures and could just be sequential awaits through the actor.
requireRoom() still throws RpcError.builtIn(.applicationError) instead of LiveKitError(.invalidState) — minor caller-visible behavior change.
stray print at RpcClientManager.swift:64.
minEffectiveTimeout = 1 (s) still differs from JS's 8 s floor — wire-compat nit only.

P3: parameterize the v2 handler error tests, try #require instead of ?? "", installV1Remote / installV2Remote rename, redundant size pre-check at publishRequest, log \(error) in dispatchToHandler, ClientProtocol.description debug form, ObjC parity decision on the @objc annotation.

Bigger thinking-out-loud item — unit vs e2e: the v2 suite has grown a lot of mock-side plumbing (MockDataChannelPair, manual _state.mutate, direct rpcClient.handleIncoming… calls) that mostly fakes what withRooms([...]) gives us for free against livekit-server --dev. Could we convert the happy-path v2 tests to withRooms like the v1 performRpc test + RpcObjCTests.m, and keep the mock variant only for cases that genuinely need internal __test_* hooks?

pblazej · 2026-05-12T11:45:11Z

@1egoman I don't see anything "unresolved" atm, let me take a final read and resolve the conflicts 🌮

BTW any manual (x-platform) tests worth performing now?

1egoman · 2026-05-12T13:14:10Z

BTW any manual (x-platform) tests worth performing now?

@pblazej A few platform combinations that come to mind: 1) old client-sdk-js (i.e., the current npm release) <-> new swift, 2) new client-sdk-js (i.e., livekit/client-sdk-js#1832) <-> new swift. Going through all the happy-path cases in the spec document is worthwhile at a minimum, I think.

feat: commit initial rpc v2 code

071fd40

1egoman requested a review from pblazej April 28, 2026 21:09

1egoman added 2 commits April 29, 2026 15:58

fix: ensure client protocol is advertised in both single and dual peer connection cases

e441323

refactor: move CLIENT_PROTOCOL_* constants into ClientProtocol enum

dc224f2

1egoman marked this pull request as ready for review April 30, 2026 20:06

1egoman requested a review from hiroshihorie April 30, 2026 20:08

1egoman and others added 2 commits April 30, 2026 16:53

refactor: push more of the rpc logic down into the RpcServerManager / RpcClientManager layers

d4e4c8d

Merge branch 'main' into rpc-v2-data-streams

cf0cbb8

pblazej reviewed May 4, 2026

Comment thread Sources/LiveKit/Protos/livekit_models.pb.swift

pblazej reviewed May 4, 2026

pblazej requested changes May 4, 2026

1egoman added 9 commits May 4, 2026 15:11

feat: update protocol submodule + build protos

a1034df

fix: remove trailing comma

1ce45a7

feat: address race condition with ack being registered after rpc request is send, not before

24909cb

Because of this, a very fast response could come back before the ack was registered leaving a garbage ack gumming up the works.

feat: add tests for timeout / response double firing case

68d17d9

Ensure that the timeout and the completion can't accidentally fire the same callback multiple times.

feat: ensure that 15kb limit only applies to receiving v1 rpc request payloads, NOT v2 rpc requests

bf39ab1

feat: add official hard limit on users using lk.-prefixed data streams

930e636

feat: clean up rpc requests when a participant disconnects

ea9dde7

1egoman and others added 4 commits May 5, 2026 16:58

feat: replace custom AsyncFlag with swift confirmation

684a410

fix: remove excess sleeps in tests

8b19c9c

fix: remove stray print

8fb93ee

fix: add assert of exact RpcError code

5b604a4

1egoman added 10 commits May 6, 2026 15:51

fix: address incorrect errors when room is nil

1cdfc07

refactor: simplify do/catch in test

8c752b7

fix: update timeout semantics of RPC to match web

53285d3

fix: adjust test assertion to check exact equality of response/largePayload

2475356

refactor: installV1Remote/installV2Remote -> installRemote

1ecec2b

fix: add error to log

5301dd4

fix: remove @objc from ClientProtocol

e6a554a

fix: convert ?? "" -> try to catch errors earlier in tests

d84eb0b

fix: adjust ClientProtocol description to be more debug friendly

00c4eaf

fix: combine v2HandlerUnhandledErrorReturnsPacket / v2HandlerRpcErrorPassthroughViaPacket into a single parameterized test

bcf1d38

1egoman requested a review from xianshijing-lk as a code owner May 7, 2026 14:54

1egoman added 4 commits May 7, 2026 11:02

fix: remove misleading comment

6eb7c9b

fix: remove duplicate max rpc payload length check

d1e25b4

fix: ensure raw client protocol value sent in signalling path

813583a

feat: add initial pass at porting existing rpc tests to use e2e testing infrastructure

932b071
