[https://nvbugs/5911304][fix] Add URL validation for media input loading #13680
yibinl-nvidia wants to merge 2 commits into NVIDIA:main from
Conversation
2d95873 to 3001dbf (Compare)
/bot run
…l fetches

User-supplied image_url/video_url/audio_url values were fetched via aiohttp without scheme/IP validation, response size limits, or redirect-chain validation. This allowed SSRF to private/loopback/cloud-metadata addresses, DoS via oversized responses, and SSRF via redirect.

Add _validate_public_url() and _safe_aiohttp_get() helpers that:
- reject non-http(s) schemes and URLs resolving to non-global addresses
- validate every redirect hop before following
- cap response size at 200 MB and redirect chain at 5 hops

Apply to async_load_image, async_load_video, and async_load_audio (the serving API surface). Sync paths and the file/data branches are unchanged.

Addresses NVBugs 5911304.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
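For orientation, here is a rough sketch of how a helper like `_safe_aiohttp_get` could be wired into one of the async loaders named above. The helper name and module path are taken from this PR's description and file list; the loader body below is illustrative only and simplified, not the PR's actual implementation:

```python
# Illustrative sketch only; signature and branches simplified.
from io import BytesIO

import aiohttp
from PIL import Image

# Helper added by this PR (import path assumed from the review's file list).
from tensorrt_llm.inputs.utils import _safe_aiohttp_get


async def async_load_image(image_url: str,
                           session: aiohttp.ClientSession) -> Image.Image:
    if image_url.startswith(("http://", "https://")):
        # All remote fetches go through the SSRF/DoS-hardened helper, which
        # validates the URL, follows redirects manually, and caps response size.
        raw = await _safe_aiohttp_get(image_url, session)
        return Image.open(BytesIO(raw)).convert("RGB")
    # The file:// and data: branches are unchanged by the PR and omitted here.
    raise ValueError(f"Unsupported URL scheme for this sketch: {image_url!r}")
```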
3001dbf to 1b36856 (Compare)
/bot run
PR_Github #46459 [ run ] triggered by Bot. Commit:

PR_Github #46459 [ run ] completed with state
📝 Walkthrough

Introduces SSRF and DoS protections for multimodal HTTP(S) fetching by implementing URL validation that rejects non-HTTP(S) schemes, requires resolvable hostnames, and blocks non-global IPs. Replaces direct downloads with a safe HTTP client that validates URLs at each hop, disables redirects by default, manually follows a limited set of redirect status codes, enforces a maximum response size, and caps redirect loops. Adds comprehensive test coverage and CI integration.
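To make that concrete, here is a minimal sketch of a public-URL check with the behavior the walkthrough describes (http/https only, resolvable hostname, globally routable address). It is an illustration written for this summary, not the PR's actual `_validate_public_url`:

```python
# Minimal sketch, assuming the behavior described above; not the PR's code.
import ipaddress
import socket
from urllib.parse import urlparse


def _validate_public_url(url: str) -> None:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    try:
        # Resolve every address the hostname maps to (IPv4 and IPv6).
        infos = socket.getaddrinfo(parsed.hostname, parsed.port or 80)
    except socket.gaierror as e:
        raise ValueError(f"Cannot resolve hostname {parsed.hostname!r}") from e
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # is_global is False for loopback, private, link-local (including the
        # 169.254.169.254 cloud metadata address), and reserved ranges.
        if not ip.is_global:
            raise ValueError(f"URL resolves to non-global address {ip}")
```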
Sequence Diagram

sequenceDiagram
actor Client
participant Validator as URL Validator
participant Resolver as DNS/IP Resolver
participant HTTPClient as HTTP Client
participant Response as Response Handler
Client->>Validator: Request URL (validate_public_url)
Validator->>Validator: Check scheme (http/https only)
Validator->>Validator: Extract hostname
alt Invalid scheme or missing hostname
Validator-->>Client: Raise error
end
Validator->>Resolver: Resolve hostname to IP
alt Cannot resolve
Resolver-->>Validator: Raise error
Validator-->>Client: Raise error
end
Validator->>Validator: Check IP is public/global
alt Private, loopback, or IMDS IP
Validator-->>Client: Raise error
end
Validator-->>Client: URL valid ✓
Client->>HTTPClient: _safe_aiohttp_get (redirects disabled)
HTTPClient->>HTTPClient: Validate resolved URL
alt URL invalid
HTTPClient-->>Client: Raise error
end
HTTPClient->>Response: Fetch with no redirects
alt Redirect response (301-308)
Response->>Validator: Validate redirect target
alt Target invalid
Validator-->>Client: Raise error
end
alt Redirect count exceeds max
Response-->>Client: Raise "Too many redirects"
end
Response->>HTTPClient: Follow redirect (manual)
end
Response->>Response: Stream response body
alt Response exceeds max size
Response-->>Client: Raise error
end
Response-->>Client: Return response bytes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed

❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (2)
tests/unittest/inputs/test_url_validation.py (1)
1-15: ⚡ Quick win: Add the required NVIDIA file header.
This new test module is missing the standard NVIDIA copyright/license header. As per coding guidelines, "All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/inputs/test_url_validation.py` around lines 1 - 15, Add the standard NVIDIA copyright/license header to the top of this new test module (tests/unittest/inputs/test_url_validation.py); include the year of latest modification and the full Apache 2.0 notice required for all source files, placing it above the module docstring so the file begins with the NVIDIA header followed by the existing docstring and imports; no code changes to _MAX_RESPONSE_BYTES, _safe_aiohttp_get, or _validate_public_url are needed—only add the required header text.

tensorrt_llm/inputs/utils.py (1)
1-13: ⚡ Quick win: Add the required NVIDIA file header.
This modified Python file still lacks the repository-required copyright/license header at the top. As per coding guidelines, "All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/inputs/utils.py` around lines 1 - 13, Add the required NVIDIA copyright/license header at the very top of the Python source (above the first import statement "import asyncio") so the file contains the repo-standard NVIDIA header with the latest modification year and Apache-2.0 notice; insert the exact header text used across the repo (copy from another source file) and ensure it precedes the module-level imports and any module-level docstring or code in this file (symbols to look for: the file's top-level imports such as "import asyncio", "import base64", etc.).
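For reference, NVIDIA Python sources typically open with an SPDX-style Apache-2.0 header along the lines of the snippet below; the exact wording and year should be copied from an existing file in the repository rather than from this illustration:

```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```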
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/inputs/utils.py`:
- Around line 148-160: Add explicit connect and read timeouts to the HTTP
request to prevent stalled or trickling responses from exhausting the session:
when calling session.get(...) (the request around the async with
session.get(current, allow_redirects=False) block) pass an aiohttp timeout
(ClientTimeout) that sets reasonable sock_connect and sock_read (and optionally
total) values, and ensure those timeouts apply to redirects handling; keep the
existing response size guard around response.content.iter_chunked and raise as
before if exceeded.
- Around line 143-148: The SSRF validation in _safe_aiohttp_get is vulnerable to
DNS rebinding because _validate_public_url(current) checks the hostname but
session.get(current) performs its own DNS resolution; fix by resolving the
hostname once via _validate_public_url (or a new resolver helper) to obtain the
concrete IP/port and then ensure the aiohttp request uses that same IP for the
TCP connection (e.g., create a custom aiohttp.TCPConnector/Resolver that pins
the resolved IP or call session.get() with the resolved IP and set the Host
header to the original hostname). Update _safe_aiohttp_get to pass the pinned IP
into the connector/request so DNS cannot be re-resolved, and keep redirect
validation logic (including calls to _validate_public_url and _MAX_REDIRECTS)
intact.
In `@tests/unittest/inputs/test_url_validation.py`:
- Around line 88-93: The test double _FakeSession.get currently ignores kwargs
so regressions where _safe_aiohttp_get stops passing allow_redirects=False won’t
be caught; update _FakeSession.get to assert that kwargs contains
allow_redirects set to False (or raise/record an error if missing/True) before
returning the next response so the unit test enforces the security contract and
fails if _safe_aiohttp_get no longer passes allow_redirects=False.
---
Nitpick comments:
In `@tensorrt_llm/inputs/utils.py`:
- Around line 1-13: Add the required NVIDIA copyright/license header at the very
top of the Python source (above the first import statement "import asyncio") so
the file contains the repo-standard NVIDIA header with the latest modification
year and Apache-2.0 notice; insert the exact header text used across the repo
(copy from another source file) and ensure it precedes the module-level imports
and any module-level docstring or code in this file (symbols to look for: the
file's top-level imports such as "import asyncio", "import base64", etc.).
In `@tests/unittest/inputs/test_url_validation.py`:
- Around line 1-15: Add the standard NVIDIA copyright/license header to the top
of this new test module (tests/unittest/inputs/test_url_validation.py); include
the year of latest modification and the full Apache 2.0 notice required for all
source files, placing it above the module docstring so the file begins with the
NVIDIA header followed by the existing docstring and imports; no code changes to
_MAX_RESPONSE_BYTES, _safe_aiohttp_get, or _validate_public_url are needed—only
add the required header text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 49c68b50-2fcd-4d82-a99c-098b8e0b275c
📒 Files selected for processing (3)
tensorrt_llm/inputs/utils.py
tests/integration/test_lists/test-db/l0_a10.yml
tests/unittest/inputs/test_url_validation.py
async def _safe_aiohttp_get(url: str, session: aiohttp.ClientSession) -> bytes:
    """Fetch *url*, validating each redirect hop and capping response size."""
    current = url
    for _ in range(_MAX_REDIRECTS + 1):
        await asyncio.to_thread(_validate_public_url, current)
        async with session.get(current, allow_redirects=False) as response:
🧩 Analysis chain
🏁 Script executed:
cat -n tensorrt_llm/inputs/utils.py | head -200

Repository: NVIDIA/TensorRT-LLM
Length of output: 9041
🏁 Script executed:
python3 << 'EOF'
# Research aiohttp's DNS handling behavior
import aiohttp
import inspect
# Check what connector aiohttp uses by default
session = aiohttp.ClientSession()
print("Default connector:", session.connector)
print("Connector type:", type(session.connector))
# Check if there's DNS caching
try:
print("\nConnector attributes:")
print(dir(session.connector))
except Exception as e:
print(f"Error: {e}")
EOF

Repository: NVIDIA/TensorRT-LLM
Length of output: 354
🏁 Script executed:
# Check if there are any DNS-related configurations or mitigations in the codebase
rg -i "dns|resolver|connector|resolve" tensorrt_llm/inputs/utils.pyRepository: NVIDIA/TensorRT-LLM
Length of output: 702
🏁 Script executed:
# Search for aiohttp usage patterns in the codebase
rg -A 5 -B 5 "aiohttp.ClientSession\|session.get\|session.post" tensorrt_llm/inputs/utils.py | head -100

Repository: NVIDIA/TensorRT-LLM
Length of output: 45
🏁 Script executed:
# Search for aiohttp documentation or comments about DNS handling
rg -i "resolver|connector" tensorrt_llm/ | grep -E "\.py:" | head -20Repository: NVIDIA/TensorRT-LLM
Length of output: 2032
🏁 Script executed:
# Check if there are any custom connectors or resolver configurations
fd -e py -x grep -l "TCPConnector\|resolver\|Resolver" {} \; | head -10

Repository: NVIDIA/TensorRT-LLM
Length of output: 537
🏁 Script executed:
# Look at aiohttp's connector behavior - check what version is used
grep -r "aiohttp" tensorrt_llm/requirements.txt pyproject.toml setup.py 2>/dev/null | head -20Repository: NVIDIA/TensorRT-LLM
Length of output: 45
🏁 Script executed:
# Check if there's a requirements file with aiohttp version
find . -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" | head -5 | xargs grep -h "aiohttp" 2>/dev/null | head -10Repository: NVIDIA/TensorRT-LLM
Length of output: 45
🏁 Script executed:
# Search for any DNS rebinding or SSRF-related security discussions in the repo
rg -i "dns.rebind|rebind|ssrf" tensorrt_llm/ --type py | head -20Repository: NVIDIA/TensorRT-LLM
Length of output: 647
🏁 Script executed:
# Look at the full context around the _safe_aiohttp_get function to see if there are other mitigations
sed -n '140,165p' tensorrt_llm/inputs/utils.py

Repository: NVIDIA/TensorRT-LLM
Length of output: 1165
🏁 Script executed:
# Check if there's any DNS resolver configuration for the aiohttp session creation
rg -B 10 -A 10 "_get_aiohttp_session" tensorrt_llm/inputs/utils.py

Repository: NVIDIA/TensorRT-LLM
Length of output: 3308
🏁 Script executed:
# Research if aiohttp has built-in DNS caching that would prevent rebinding
python3 << 'EOF'
# Check aiohttp documentation for DNS behavior
# By default, aiohttp.TCPConnector uses asyncio's getaddrinfo with no explicit DNS caching
# This means each connection attempt will perform a new DNS lookup
# The vulnerability window:
# 1. Line 147: socket.getaddrinfo() validates the hostname
# 2. Between lines 147-148: DNS TTL expires or cache is cleared
# 3. Line 148: aiohttp.session.get() performs another getaddrinfo() which gets a different IP
# This is a classic DNS rebinding attack vector:
# Attacker controls dns.example.com
# Step 1 (validation): dns.example.com → 8.8.8.8 (public IP) ✓ passes validation
# Step 2 (connection): dns.example.com → 127.0.0.1 (private IP) ✗ bypasses validation
print("DNS Rebinding Vulnerability Analysis:")
print("=" * 50)
print("The vulnerability is technically valid because:")
print("1. socket.getaddrinfo() and aiohttp's resolver are separate")
print("2. No DNS result pinning between validation and connection")
print("3. Attacker can control DNS responses with low TTL")
print("4. TOCTOU (Time-of-check to time-of-use) vulnerability")
EOF

Repository: NVIDIA/TensorRT-LLM
Length of output: 425
🏁 Script executed:
# Check if there are any mitigations or discussions about this in comments or issues
rg -B 5 -A 5 "NVBugs 5911304" tensorrt_llm/inputs/utils.py

Repository: NVIDIA/TensorRT-LLM
Length of output: 344
🏁 Script executed:
# Verify the actual implementation to confirm the vulnerability is real
sed -n '122,162p' tensorrt_llm/inputs/utils.py

Repository: NVIDIA/TensorRT-LLM
Length of output: 1899
DNS rebinding allows bypassing the SSRF validation.
Line 147 validates the hostname resolves to a public IP, but line 148's session.get() performs a separate DNS resolution that could return a different address. An attacker controlling DNS can return a public address during validation and a private/loopback address during connect, defeating the SSRF protection. The validated IP must be pinned to the actual socket connection via a custom connector, or the request must use the resolved address directly instead of the hostname.
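One way to implement the pinning described above, sketched here as an illustration rather than as the fix adopted in this PR, is a custom aiohttp resolver that only ever returns the already-validated address:

```python
# Illustrative sketch of IP pinning; not the code in this PR.
import socket

import aiohttp
from aiohttp.abc import AbstractResolver


class _PinnedResolver(AbstractResolver):
    """Resolve one hostname to one pre-validated IP, and nothing else."""

    def __init__(self, hostname: str, pinned_ip: str):
        self._hostname = hostname
        self._pinned_ip = pinned_ip

    async def resolve(self, host, port=0, family=socket.AF_INET):
        if host != self._hostname:
            raise OSError(f"refusing to resolve unexpected host {host!r}")
        return [{
            "hostname": host,
            "host": self._pinned_ip,  # the address that passed validation
            "port": port,
            "family": family,
            "proto": 0,
            "flags": socket.AI_NUMERICHOST,
        }]

    async def close(self) -> None:
        pass


# Usage sketch: build the session with a connector that cannot re-resolve.
# connector = aiohttp.TCPConnector(resolver=_PinnedResolver(host, validated_ip))
# async with aiohttp.ClientSession(connector=connector) as session:
#     ...
```

Pinning through the resolver, rather than rewriting the URL to use the raw IP, keeps TLS certificate validation and SNI tied to the original hostname.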
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tensorrt_llm/inputs/utils.py` around lines 143 - 148, The SSRF validation in
_safe_aiohttp_get is vulnerable to DNS rebinding because
_validate_public_url(current) checks the hostname but session.get(current)
performs its own DNS resolution; fix by resolving the hostname once via
_validate_public_url (or a new resolver helper) to obtain the concrete IP/port
and then ensure the aiohttp request uses that same IP for the TCP connection
(e.g., create a custom aiohttp.TCPConnector/Resolver that pins the resolved IP
or call session.get() with the resolved IP and set the Host header to the
original hostname). Update _safe_aiohttp_get to pass the pinned IP into the
connector/request so DNS cannot be re-resolved, and keep redirect validation
logic (including calls to _validate_public_url and _MAX_REDIRECTS) intact.
        async with session.get(current, allow_redirects=False) as response:
            if response.status in _REDIRECT_STATUSES:
                current = urljoin(current, response.headers.get("Location", ""))
                continue
            response.raise_for_status()
            buf = BytesIO()
            total = 0
            async for chunk in response.content.iter_chunked(1 << 20):
                total += len(chunk)
                if total > _MAX_RESPONSE_BYTES:
                    raise RuntimeError("Response exceeds maximum allowed size")
                buf.write(chunk)
            return buf.getvalue()
🧩 Analysis chain
🏁 Script executed:
fd -t f "utils.py" | grep inputs

Repository: NVIDIA/TensorRT-LLM
Length of output: 92
🏁 Script executed:
cat tensorrt_llm/inputs/utils.py

Repository: NVIDIA/TensorRT-LLM
Length of output: 45677
Add timeout parameters to prevent DoS from stalled or trickling remote servers.
The code caps response bytes but lacks connect/read timeouts on the aiohttp session. A hostile server can indefinitely stall the handshake or trickle chunks to keep the shared session occupied, making the DoS mitigation incomplete.
Suggested fix
+# Keep slow or stalled remote media fetches from hanging forever.
+_AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(
+    total=30,
+    connect=10,
+    sock_connect=10,
+    sock_read=10,
+)
+
 async def _safe_aiohttp_get(url: str, session: aiohttp.ClientSession) -> bytes:
     """Fetch *url*, validating each redirect hop and capping response size."""
     current = url
     for _ in range(_MAX_REDIRECTS + 1):
         await asyncio.to_thread(_validate_public_url, current)
-        async with session.get(current, allow_redirects=False) as response:
+        async with session.get(
+                current,
+                allow_redirects=False,
+                timeout=_AIOHTTP_TIMEOUT) as response:
             if response.status in _REDIRECT_STATUSES:
                 current = urljoin(current, response.headers.get("Location", ""))
                 continue

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tensorrt_llm/inputs/utils.py` around lines 148 - 160, Add explicit connect
and read timeouts to the HTTP request to prevent stalled or trickling responses
from exhausting the session: when calling session.get(...) (the request around
the async with session.get(current, allow_redirects=False) block) pass an
aiohttp timeout (ClientTimeout) that sets reasonable sock_connect and sock_read
(and optionally total) values, and ensure those timeouts apply to redirects
handling; keep the existing response size guard around
response.content.iter_chunked and raise as before if exceeded.
class _FakeSession:

    def __init__(self, responses):
        self._responses = list(responses)

    def get(self, url, **kwargs):
        return self._responses.pop(0)
🧩 Analysis chain
🏁 Script executed:
# First, find the file
find . -name "test_url_validation.py" -type fRepository: NVIDIA/TensorRT-LLM
Length of output: 110
🏁 Script executed:
# Read the file to understand the context
cat -n tests/unittest/inputs/test_url_validation.py | head -120

Repository: NVIDIA/TensorRT-LLM
Length of output: 4831
🏁 Script executed:
# Search for _safe_aiohttp_get to understand how it's being tested
rg -n "_safe_aiohttp_get" tests/unittest/inputs/test_url_validation.py -A 5 -B 2Repository: NVIDIA/TensorRT-LLM
Length of output: 2184
🏁 Script executed:
# Find the utils file containing _safe_aiohttp_get
find . -name "utils.py" -path "*/inputs/*" -type fRepository: NVIDIA/TensorRT-LLM
Length of output: 94
🏁 Script executed:
# Read the implementation
cat -n tensorrt_llm/inputs/utils.py | grep -A 30 "def _safe_aiohttp_get"

Repository: NVIDIA/TensorRT-LLM
Length of output: 1524
Add assertion to test double enforcing allow_redirects=False.
_FakeSession.get() currently ignores kwargs, so tests won't catch a regression if _safe_aiohttp_get() stops passing allow_redirects=False. This parameter is critical to the security contract: it ensures per-hop URL validation instead of allowing aiohttp to auto-follow redirects unsafely, defeating SSRF protections.
Tighten the test double
 class _FakeSession:
     def __init__(self, responses):
         self._responses = list(responses)

     def get(self, url, **kwargs):
+        assert kwargs.get("allow_redirects") is False
         return self._responses.pop(0)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
class _FakeSession:
    def __init__(self, responses):
        self._responses = list(responses)

    def get(self, url, **kwargs):
        assert kwargs.get("allow_redirects") is False
        return self._responses.pop(0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unittest/inputs/test_url_validation.py` around lines 88 - 93, The test
double _FakeSession.get currently ignores kwargs so regressions where
_safe_aiohttp_get stops passing allow_redirects=False won’t be caught; update
_FakeSession.get to assert that kwargs contains allow_redirects set to False (or
raise/record an error if missing/True) before returning the next response so the
unit test enforces the security contract and fails if _safe_aiohttp_get no
longer passes allow_redirects=False.
Summary by CodeRabbit
Release Notes
New Features
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.