feat(client): round-robin outbound connections across source IPs #370

Closed
viraatc wants to merge 1 commit into main from feat/endpoint-client-source-ips

@viraatc viraatc commented Jun 22, 2026

Collaborator

Summary

Streaming inference benchmarks hold one TCP connection per in-flight request for the request's whole duration. On a single client host, concurrency is therefore capped by the OS ephemeral-port range (Linux default 32768–60999, i.e. 28,232 ports). Past that ceiling, new connections fail with EADDRNOTAVAIL; because the client waits for a port to free up instead of reporting the failure, the delay is absorbed into TTFT rather than surfaced as an error. The result is an easy-to-miss throughput wall at high concurrency.
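On Linux the range comes from the `net.ipv4.ip_local_port_range` sysctl (readable at `/proc/sys/net/ipv4/ip_local_port_range`). A minimal sketch of turning it into a per-host port budget, assuming the usual two-integer, whitespace-separated format; the helper names are hypothetical and not part of this PR:

```python
from pathlib import Path

# Standard Linux location of the ephemeral-port range sysctl.
PORT_RANGE_PATH = Path("/proc/sys/net/ipv4/ip_local_port_range")


def ephemeral_port_count(range_text: str) -> int:
    """Number of ports in an inclusive 'low high' range such as '32768\t60999'."""
    low, high = (int(field) for field in range_text.split())
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {range_text!r}")
    return high - low + 1  # both endpoints are usable, hence the +1


def read_host_port_budget(path: Path = PORT_RANGE_PATH) -> int:
    """Read the live sysctl; raises OSError on hosts without this file."""
    return ephemeral_port_count(path.read_text())


if __name__ == "__main__":
    print(ephemeral_port_count("32768\t60999"))  # 28232 (Linux default range)
```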

A TCP connection only needs a unique 4-tuple (src_ip, src_port, dst_ip, dst_port), so the same source port can be reused under a different source IP. Binding outbound connections across N local source IPs therefore gives N independent ephemeral-port spaces to the same destination, multiplying the usable connection budget by ~N on a single host. Hosts with multiple NICs already have several usable source IPs; on Linux, the entire 127.0.0.0/8 loopback block is usable as source addresses for local destinations without extra configuration.
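The multiplication can be illustrated with a toy model in which each source IP draws from its own window of W ports toward a given destination, similar in spirit to the emulated-window check in the Verification section. This is a simplified simulation under that assumption, not the kernel's real allocator; `ToyPortAllocator`, `max_concurrent`, and the destination address are hypothetical:

```python
import errno


class ToyPortAllocator:
    """Toy model: a connection only needs a unique (src_ip, src_port, dst_ip, dst_port)."""

    def __init__(self, low: int, high: int) -> None:
        self.ports = range(low, high + 1)  # inclusive ephemeral window
        self.in_use: set[tuple[str, int, str, int]] = set()

    def connect(self, src_ip: str, dst: tuple[str, int]) -> int:
        """Claim the lowest free source port for (src_ip, dst) or raise EADDRNOTAVAIL."""
        for port in self.ports:
            four_tuple = (src_ip, port, *dst)
            if four_tuple not in self.in_use:
                self.in_use.add(four_tuple)
                return port
        raise OSError(errno.EADDRNOTAVAIL, "no free ephemeral port for this source IP")


def max_concurrent(window: int, source_ips: list[str]) -> int:
    """Count connections to one destination until every source IP is exhausted."""
    alloc = ToyPortAllocator(40000, 40000 + window - 1)
    dst = ("10.0.0.100", 8000)  # hypothetical single destination
    opened = 0
    for ip in source_ips:
        while True:
            try:
                alloc.connect(ip, dst)
            except OSError as exc:
                if exc.errno != errno.EADDRNOTAVAIL:
                    raise
                break  # this IP's window is full; move on to the next IP
            opened += 1
    return opened


if __name__ == "__main__":
    print(max_concurrent(100, ["127.0.0.1"]))                          # 100
    print(max_concurrent(100, [f"127.0.0.{i}" for i in range(1, 6)]))  # 500
```

In the model, the same source port (40000 here) is handed out once per source IP, which is exactly why the total scales with the number of IPs.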

Changes

  • config.py: new source_ips: list[str] option (default empty = current OS-default behavior). The ephemeral-port budget clamp is scaled by the source-IP count, so max_connections=-1 (auto) resolves to available_ports × len(source_ips), and an explicit --max-connections is validated against the scaled budget. Blank entries are dropped so a stray "" can't bind to INADDR_ANY.
  • http.py: ConnectionPool round-robins local_addr across source_ips on each new connection. An empty list is normalized to None, so the default path keeps zero overhead and unchanged behavior.
  • worker.py: threads source_ips from HTTPClientConfig into each per-worker pool.
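The config-side rules (blank-entry cleanup and budget scaling) can be sketched without pydantic. This is a stdlib-only illustration, not the PR's `HTTPClientConfig`: the class `ClientBudgetSketch`, its `available_ports` field, and raising `ValueError` for an over-budget explicit value are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ClientBudgetSketch:
    """Hypothetical stand-in for the client config: source IPs plus max_connections."""

    available_ports: int          # size of one ephemeral-port window
    max_connections: int = -1     # -1 means "auto"
    source_ips: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Drop blank entries so a stray "" never reaches bind() as INADDR_ANY.
        self.source_ips = [ip.strip() for ip in self.source_ips if ip and ip.strip()]
        # Empty list keeps the OS default: one port space, multiplier of 1.
        budget = self.available_ports * max(1, len(self.source_ips))
        if self.max_connections == -1:
            self.max_connections = budget  # auto: use the whole scaled budget
        elif self.max_connections > budget:
            raise ValueError(
                f"max_connections={self.max_connections} exceeds the "
                f"ephemeral-port budget of {budget}"
            )


if __name__ == "__main__":
    cfg = ClientBudgetSketch(available_ports=28232, source_ips=["10.0.0.1", " ", "10.0.0.2"])
    print(cfg.source_ips, cfg.max_connections)  # ['10.0.0.1', '10.0.0.2'] 56464
```

Note that duplicate entries would still inflate `len(source_ips)` in this sketch, which is the issue raised in the review below.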

Usage

--source-ips 10.0.0.1 --source-ips 10.0.0.2 --source-ips 10.0.0.3

Round-robins outbound connections across the three IPs, ~tripling the single-host connection ceiling. Empty (default) preserves today's behavior exactly.
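The round-robin binding itself can be sketched with `itertools.cycle` and the `local_addr` argument that `asyncio.open_connection()` forwards to `loop.create_connection()`. `SourceIPRotator` and `open_bound_connection` are hypothetical helpers, not the PR's `ConnectionPool`:

```python
import asyncio
import itertools
from typing import Iterator, Optional


class SourceIPRotator:
    """Hands out local_addr values round-robin; None means 'let the OS choose'."""

    def __init__(self, source_ips: Optional[list[str]]) -> None:
        # Normalize an empty list to None so the default path does no extra work.
        self._cycle: Optional[Iterator[str]] = (
            itertools.cycle(source_ips) if source_ips else None
        )

    def next_local_addr(self) -> Optional[tuple[str, int]]:
        if self._cycle is None:
            return None
        return (next(self._cycle), 0)  # port 0: the kernel picks the ephemeral port


async def open_bound_connection(host: str, port: int, rotator: SourceIPRotator):
    """Open one TCP connection, bound to the next source IP if any are configured."""
    return await asyncio.open_connection(
        host, port, local_addr=rotator.next_local_addr()
    )


if __name__ == "__main__":
    rotator = SourceIPRotator(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    print([rotator.next_local_addr() for _ in range(4)])
    # [('10.0.0.1', 0), ('10.0.0.2', 0), ('10.0.0.3', 0), ('10.0.0.1', 0)]
```

Binding with port 0 pins only the source IP and leaves port selection to the kernel, so each IP keeps drawing from its own ephemeral-port space.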

Verification

  • Unit tests (8 new): round-robin local_addr assignment + empty/None normalization (test_http.py); budget scaling, explicit-budget validation, and blank-IP cleaning (test_http_client_config.py).
  • OS-level proof: with an emulated small ephemeral window, a single source IP established exactly W connections to one destination; 5 source IPs established exactly 5W (5.00× measured). Confirmed the same client port coexists across source IPs, and that exhausting a single IP's window raises EADDRNOTAVAIL — i.e. spreading across source IPs removes the single-IP ceiling.

Notes

  • Backward compatible: empty source_ips is byte-for-byte the prior behavior (kernel autobind, single ephemeral-port space).
  • IPs must be assigned to local interfaces (multiple NICs, or 127.0.0.0/8 aliases on Linux); otherwise bind() fails with a clear error.
  • Generated full-config templates (config/templates/*_full.yaml) are intentionally not modified in this PR.
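One way to check up front that a configured IP is actually assigned to a local interface is a throwaway bind() to it, which fails with EADDRNOTAVAIL otherwise (on Linux, unless non-local binds are enabled via `net.ipv4.ip_nonlocal_bind`). A minimal IPv4-only sketch; `is_local_source_ip` is a hypothetical helper, not something this PR adds:

```python
import errno
import socket


def is_local_source_ip(ip: str) -> bool:
    """True if an IPv4 TCP socket can bind to ip; port 0 asks for any free port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, 0))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False  # address is not assigned to any local interface
            raise  # other failures (e.g. unparsable input) are real errors
    return True
```

Running it over the configured list before starting workers would surface a missing alias as one clear message instead of a failure on the first connection attempt.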

🤖 Generated with Claude Code

Streaming LLM benchmarks hold one TCP connection per in-flight request for the
request's whole duration, so concurrency on a single client host is capped by
the OS ephemeral-port range (default 32768-60999, ~28k ports). Past that ceiling
new connections fail with EADDRNOTAVAIL, and the wait for a port to free is
charged to TTFT rather than surfaced as an error.

TCP 4-tuple uniqueness (src_ip, src_port, dst_ip, dst_port) is per source IP, so
binding outbound connections across N local source IPs gives N independent
ephemeral-port spaces to the same destination -- multiplying the usable
connection budget by ~N on a single host (multiple NICs, or 127.0.0.0/8 aliases
on Linux).

- config: add `source_ips` (default empty = unchanged OS default source
  selection); scale the ephemeral-port budget clamp by the source-IP count.
- http: ConnectionPool round-robins `local_addr` across `source_ips` per new
  connection; empty list is normalized to None (no hot-path overhead).
- worker: thread `source_ips` from HTTPClientConfig into each per-worker pool.

Verified: 8 unit tests (round-robin binding + budget scaling), plus an OS-level
check showing 5 source IPs yield exactly 5x established connections to a single
destination, the same client port coexisting across source IPs, and
EADDRNOTAVAIL raised only when one IP's port window is exhausted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@viraatc viraatc requested a review from a team June 22, 2026 21:53
@github-actions


MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions Bot requested review from arekay-nv and nvzhihanj June 22, 2026 21:53

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces the ability to bind outbound connections to multiple client source IPs in a round-robin fashion, multiplying the ephemeral-port budget to support higher concurrency. It updates the HTTP client configuration, connection pool, and worker to support this feature, and adds corresponding unit tests. The feedback suggests deduplicating the provided source IPs to prevent overestimating the ephemeral port budget, along with updating the unit tests to verify this behavior.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +246 to +250
```python
@field_validator("source_ips")
@classmethod
def _clean_source_ips(cls, v: list[str]) -> list[str]:
    # Drop blank entries so a stray "" can't bind to INADDR_ANY (all interfaces).
    return [ip.strip() for ip in v if ip and ip.strip()]
```

Contributor

medium

If duplicate IP addresses are provided in --source-ips (e.g., ["127.0.0.1", "127.0.0.1"]), they are not filtered out. This will cause len(self.source_ips) to count duplicates, leading to an incorrect (overestimated) ephemeral port budget calculation in _resolve_defaults.

Deduplicating the list while preserving order using dict.fromkeys ensures the budget is scaled accurately.

Suggested change

```diff
 @field_validator("source_ips")
 @classmethod
 def _clean_source_ips(cls, v: list[str]) -> list[str]:
     # Drop blank entries so a stray "" can't bind to INADDR_ANY (all interfaces).
-    return [ip.strip() for ip in v if ip and ip.strip()]
+    # Also deduplicate to avoid overestimating the ephemeral port budget.
+    return list(dict.fromkeys(ip.strip() for ip in v if ip and ip.strip()))
```
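The overestimate is easy to see in isolation. A minimal comparison, assuming a 28,232-port window and the available_ports × len(source_ips) scaling described in the PR; `clean` and `scaled_budget` are hypothetical helpers:

```python
def clean(ips: list[str], dedupe: bool) -> list[str]:
    stripped = [ip.strip() for ip in ips if ip and ip.strip()]
    # dict.fromkeys keeps first-seen order, unlike set().
    return list(dict.fromkeys(stripped)) if dedupe else stripped


def scaled_budget(available_ports: int, ips: list[str]) -> int:
    return available_ports * max(1, len(ips))


raw = ["127.0.0.1", "127.0.0.1", "127.0.0.2"]
print(scaled_budget(28232, clean(raw, dedupe=False)))  # 84696: counts 127.0.0.1 twice
print(scaled_budget(28232, clean(raw, dedupe=True)))   # 56464: 2 distinct IPs
```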

Comment on lines +54 to +56
```python
def test_blank_source_ips_are_dropped(self):
    c = cfg.HTTPClientConfig(source_ips=["127.0.0.1", " ", "", "127.0.0.2"])
    assert c.source_ips == ["127.0.0.1", "127.0.0.2"]
```

Contributor

medium

Update the test to verify that duplicate source IPs are also correctly deduplicated and cleaned.

Suggested change

```diff
-def test_blank_source_ips_are_dropped(self):
-    c = cfg.HTTPClientConfig(source_ips=["127.0.0.1", " ", "", "127.0.0.2"])
+def test_blank_and_duplicate_source_ips_are_cleaned(self):
+    c = cfg.HTTPClientConfig(source_ips=["127.0.0.1", " ", "", "127.0.0.1", "127.0.0.2"])
     assert c.source_ips == ["127.0.0.1", "127.0.0.2"]
```

viraatc commented Jun 23, 2026

Collaborator Author

Superseded by #371.

Follow-up investigation showed the ephemeral-port budget is already multiplied across distinct destinations by the kernel — the limit is per (source IP, destination) pair, so each endpoint gets its own budget (validated: 1 endpoint → 999 simultaneous connections, 5 endpoints → 4975, ~995 each). So explicit source-IP binding isn't needed for the common multi-frontend case.

The real fix is just to stop the auto max_connections clamp from capping total concurrency at a single pair's budget when multiple endpoints are configured. #371 does that with no new config or CLI — the multiplier comes from the endpoint_urls already specified. Closing this in favor of #371.
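Under the per-(source IP, destination) model described here, the auto clamp can scale with the number of distinct destinations instead of source IPs. A rough sketch of that idea, not the code in #371; `distinct_destinations`, `auto_max_connections`, and the scheme-based default ports are assumptions:

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def distinct_destinations(endpoint_urls: list[str]) -> set[tuple[str, int]]:
    """Collapse URLs to unique (host, port) pairs; paths don't affect the budget."""
    pairs = set()
    for url in endpoint_urls:
        parts = urlsplit(url)
        port = parts.port or DEFAULT_PORTS[parts.scheme]
        pairs.add((parts.hostname, port))
    return pairs


def auto_max_connections(endpoint_urls: list[str], ports_per_pair: int) -> int:
    """Each (source IP, destination) pair gets its own ephemeral-port window."""
    return ports_per_pair * max(1, len(distinct_destinations(endpoint_urls)))
```

Hostnames are compared as strings here, so two names that resolve to the same IP would be double-counted; resolving to addresses first would avoid that.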

@viraatc viraatc closed this Jun 23, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 23, 2026
@viraatc viraatc deleted the feat/endpoint-client-source-ips branch June 23, 2026 02:22