feat(client): round-robin outbound connections across source IPs #370
viraatc wants to merge 1 commit into
Streaming LLM benchmarks hold one TCP connection per in-flight request for the request's whole duration, so concurrency on a single client host is capped by the OS ephemeral-port range (default 32768-60999, ~28k ports). Past that ceiling, new connections fail with EADDRNOTAVAIL, and the wait for a port to free up is charged to TTFT rather than surfaced as an error. Because TCP requires each 4-tuple (src_ip, src_port, dst_ip, dst_port) to be unique, the ephemeral-port space is effectively per source IP: binding outbound connections across N local source IPs gives N independent ephemeral-port spaces to the same destination, multiplying the usable connection budget by ~N on a single host (multiple NICs, or 127.0.0.0/8 aliases on Linux).

- config: add `source_ips` (default empty = unchanged OS default source selection); scale the ephemeral-port budget clamp by the source-IP count.
- http: ConnectionPool round-robins `local_addr` across `source_ips` per new connection; an empty list is normalized to None (no hot-path overhead).
- worker: thread `source_ips` from HTTPClientConfig into each per-worker pool.

Verified: 8 unit tests (round-robin binding + budget scaling), plus an OS-level check showing that 5 source IPs yield exactly 5x established connections to a single destination, that the same client port coexists across source IPs, and that EADDRNOTAVAIL is raised only when one IP's port window is exhausted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
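The OS-level check described above can be approximated at small scale. This is a sketch under stated assumptions, not the PR's verification script: it assumes Linux, where every address in 127.0.0.0/8 is local and needs no alias setup, and the helper `connect_from`, the `TWIN_IP` address, and the connection counts are illustrative choices. It opens 4 connections from each of 5 loopback source IPs to one listener, counts them per source IP, then reuses one client's port number from a different source IP to show the 4-tuples stay unique.

```python
import socket
from collections import Counter

# Assumes Linux: every address in 127.0.0.0/8 is local, so no alias setup.
SOURCE_IPS = [f"127.0.0.{i}" for i in range(2, 7)]  # 5 source IPs
TWIN_IP = "127.0.0.7"  # hypothetical extra source IP, unused by the round-robin
PER_IP = 4

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
server.listen(64)              # backlog covers every handshake below
dest = server.getsockname()


def connect_from(src_ip: str, src_port: int = 0) -> socket.socket:
    """Open a TCP connection to `dest` from a pinned source address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((src_ip, src_port))  # src_port 0 = any free port on src_ip
    s.connect(dest)
    return s


# Round-robin the source IP per new connection, all to one destination.
clients = [
    connect_from(SOURCE_IPS[i % len(SOURCE_IPS)])
    for i in range(len(SOURCE_IPS) * PER_IP)
]
per_source = Counter(s.getsockname()[0] for s in clients)

# Reuse the first client's port number from another source IP: the two
# 4-tuples differ only in src_ip, which is enough for the kernel to allow both.
first_ip, shared_port = clients[0].getsockname()
twin = connect_from(TWIN_IP, shared_port)
twin_addr = twin.getsockname()

print(dict(per_source))
print((first_ip, shared_port), twin_addr, "->", dest)

for s in clients + [twin, server]:
    s.close()
```

Each `bind((ip, 0))` draws a port from that specific IP's ephemeral range, which is why the per-IP counts add up independently.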
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Code Review
This pull request introduces the ability to bind outbound connections to multiple client source IPs in a round-robin fashion, multiplying the ephemeral-port budget to support higher concurrency. It updates the HTTP client configuration, connection pool, and worker to support this feature, and adds corresponding unit tests. The feedback suggests deduplicating the provided source IPs to prevent overestimating the ephemeral port budget, along with updating the unit tests to verify this behavior.
```python
@field_validator("source_ips")
@classmethod
def _clean_source_ips(cls, v: list[str]) -> list[str]:
    # Drop blank entries so a stray "" can't bind to INADDR_ANY (all interfaces).
    return [ip.strip() for ip in v if ip and ip.strip()]
```
If duplicate IP addresses are provided in `--source-ips` (e.g., `["127.0.0.1", "127.0.0.1"]`), they are not filtered out. This causes `len(self.source_ips)` to count duplicates, leading to an incorrect (overestimated) ephemeral-port budget in `_resolve_defaults`.
Deduplicating the list while preserving order with `dict.fromkeys` ensures the budget is scaled accurately.
```diff
 @field_validator("source_ips")
 @classmethod
 def _clean_source_ips(cls, v: list[str]) -> list[str]:
     # Drop blank entries so a stray "" can't bind to INADDR_ANY (all interfaces).
-    return [ip.strip() for ip in v if ip and ip.strip()]
+    # Also deduplicate to avoid overestimating the ephemeral port budget.
+    return list(dict.fromkeys(ip.strip() for ip in v if ip and ip.strip()))
```
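The effect of a duplicate on the budget can be checked with a standalone sketch. It drops the Pydantic decorator so it runs with the standard library only; `clean_source_ips`, `port_budget`, and `resolve_max_connections` are hypothetical stand-ins for `_clean_source_ips` and the budget logic in `_resolve_defaults`. Treating `-1` as "auto" follows the PR description; raising `ValueError` for an over-budget explicit value is an assumption about what "validated" means. The port range is the Linux default 32768-60999.

```python
def clean_source_ips(v: list[str]) -> list[str]:
    # Strip whitespace and drop blanks (a stray "" would bind to INADDR_ANY),
    # then dedupe: dict keys keep first-seen insertion order.
    return list(dict.fromkeys(ip.strip() for ip in v if ip and ip.strip()))


def port_budget(low: int, high: int, n_source_ips: int) -> int:
    # Inclusive range; no source IPs means one OS-chosen source address.
    return (high - low + 1) * max(n_source_ips, 1)


def resolve_max_connections(requested: int, budget: int) -> int:
    if requested == -1:  # "auto": use the whole budget
        return budget
    if requested > budget:
        raise ValueError(f"max_connections={requested} exceeds budget {budget}")
    return requested


raw = ["127.0.0.1", " ", "", "127.0.0.1", " 127.0.0.2 "]
naive = [ip.strip() for ip in raw if ip and ip.strip()]  # blanks dropped only
cleaned = clean_source_ips(raw)

print(naive)    # ['127.0.0.1', '127.0.0.1', '127.0.0.2']
print(cleaned)  # ['127.0.0.1', '127.0.0.2']
# Three entries would claim 84696 ports, but only two port spaces exist.
print(port_budget(32768, 60999, len(naive)))    # 84696
print(port_budget(32768, 60999, len(cleaned)))  # 56464
print(resolve_max_connections(-1, port_budget(32768, 60999, len(cleaned))))  # 56464
```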
```python
def test_blank_source_ips_are_dropped(self):
    c = cfg.HTTPClientConfig(source_ips=["127.0.0.1", " ", "", "127.0.0.2"])
    assert c.source_ips == ["127.0.0.1", "127.0.0.2"]
```
Update the test to verify that duplicate source IPs are also correctly deduplicated and cleaned.
```diff
-def test_blank_source_ips_are_dropped(self):
-    c = cfg.HTTPClientConfig(source_ips=["127.0.0.1", " ", "", "127.0.0.2"])
+def test_blank_and_duplicate_source_ips_are_cleaned(self):
+    c = cfg.HTTPClientConfig(source_ips=["127.0.0.1", " ", "", "127.0.0.1", "127.0.0.2"])
     assert c.source_ips == ["127.0.0.1", "127.0.0.2"]
```
Superseded by #371. Follow-up investigation showed the ephemeral-port budget is already multiplied across distinct destinations by the kernel; the limit is per … The real fix is just to stop the auto …
Summary
Streaming inference benchmarks hold one TCP connection per in-flight request for the request's whole duration. On a single client host, concurrency is therefore capped by the OS ephemeral-port range (Linux default 32768–60999, ~28k ports). Past that ceiling, new connections fail with `EADDRNOTAVAIL`, and because the client waits for a port to free up, the delay is absorbed into TTFT rather than surfaced as an error: an easy-to-miss throughput wall at high concurrency.

Because TCP requires each 4-tuple `(src_ip, src_port, dst_ip, dst_port)` to be unique, the ephemeral-port space is per source IP. Binding outbound connections across N local source IPs therefore gives N independent ephemeral-port spaces to the same destination, multiplying the usable connection budget by ~N on a single host. Hosts with multiple NICs already have several usable source IPs; on Linux, the entire `127.0.0.0/8` loopback block is usable without configuration when the server runs on the same host.

Changes
- `config.py`: new `source_ips: list[str]` option (default empty = current OS-default behavior). The ephemeral-port budget clamp is scaled by the source-IP count, so `max_connections=-1` (auto) resolves to `available_ports × len(source_ips)`, and an explicit `--max-connections` is validated against the scaled budget. Blank entries are dropped so a stray `""` can't bind to `INADDR_ANY`.
- `http.py`: `ConnectionPool` round-robins `local_addr` across `source_ips` on each new connection. An empty list is normalized to `None`, so the default path keeps zero overhead and unchanged behavior.
- `worker.py`: threads `source_ips` from `HTTPClientConfig` into each per-worker pool.

Usage
Round-robins outbound connections across the three IPs, ~tripling the single-host connection ceiling. Leaving `source_ips` empty (the default) preserves today's behavior exactly.
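Per-connection round-robin over three source IPs can be sketched with `itertools.cycle`. `SourceAddrRotator` is a hypothetical helper, not the PR's `ConnectionPool`, and the `10.0.0.x` addresses are placeholders (no sockets are opened). It mirrors two described behaviors: each new connection takes the next IP, and an empty list is normalized to `None` so the default path does no extra work. Port `0` in the returned tuple asks the kernel for any free ephemeral port on that IP.

```python
import itertools
from typing import Iterator, Optional


class SourceAddrRotator:
    """Hypothetical helper: hands out local_addr tuples round-robin."""

    def __init__(self, source_ips: Optional[list[str]]) -> None:
        # Empty or None -> no cycle at all, so the default path stays trivial.
        self._cycle: Optional[Iterator[str]] = (
            itertools.cycle(source_ips) if source_ips else None
        )

    def next_local_addr(self) -> Optional[tuple[str, int]]:
        if self._cycle is None:
            return None  # let the OS pick the source IP and port (autobind)
        # Port 0: any free ephemeral port on the chosen source IP.
        return (next(self._cycle), 0)


rot = SourceAddrRotator(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([rot.next_local_addr() for _ in range(4)])
# [('10.0.0.1', 0), ('10.0.0.2', 0), ('10.0.0.3', 0), ('10.0.0.1', 0)]
print(SourceAddrRotator([]).next_local_addr())  # None
```

In an asyncio client, such a tuple could be passed as the `local_addr` argument of `loop.create_connection()`, where `None` means no explicit bind. The sketch assumes each per-worker pool is driven from a single event loop, so it does no locking.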
Verification
- … `local_addr` assignment + empty/`None` normalization (`test_http.py`); budget scaling, explicit-budget validation, and blank-IP cleaning (`test_http_client_config.py`).
- … `EADDRNOTAVAIL`, i.e. spreading across source IPs removes the single-IP ceiling.

Notes
- … `source_ips` is byte-for-byte the prior behavior (kernel autobind, single ephemeral-port space).
- … `127.0.0.0/8` aliases on Linux); otherwise `bind()` fails with a clear error.
- … `config/templates/*_full.yaml`) are intentionally not modified in this PR.

🤖 Generated with Claude Code