-
Notifications
You must be signed in to change notification settings - Fork 617
[SECURITY] Add input validation and CSP headers #1834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
24397c6
35f6eaf
d9c5770
1a46788
423c2e9
9a62502
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,31 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||
| from django.http import HttpRequest, HttpResponse | ||||||||||||||||||||||||||||||||||||||||||||||||
| from django.utils.deprecation import MiddlewareMixin | ||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
| class ContentSecurityPolicyMiddleware(MiddlewareMixin): | ||||||||||||||||||||||||||||||||||||||||||||||||
| """Middleware to add Content-Security-Policy header to all responses. | ||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Since this is a JSON API backend, the policy is restrictive by default: | ||||||||||||||||||||||||||||||||||||||||||||||||
| only 'self' is allowed for all directives, and no inline scripts or styles | ||||||||||||||||||||||||||||||||||||||||||||||||
| are permitted. This prevents any injected content from being executed if a | ||||||||||||||||||||||||||||||||||||||||||||||||
| response is ever rendered in a browser context. | ||||||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
| def process_response( | ||||||||||||||||||||||||||||||||||||||||||||||||
| self, request: HttpRequest, response: HttpResponse | ||||||||||||||||||||||||||||||||||||||||||||||||
| ) -> HttpResponse: | ||||||||||||||||||||||||||||||||||||||||||||||||
| response.setdefault( | ||||||||||||||||||||||||||||||||||||||||||||||||
| "Content-Security-Policy", | ||||||||||||||||||||||||||||||||||||||||||||||||
| ( | ||||||||||||||||||||||||||||||||||||||||||||||||
| "default-src 'self'; " | ||||||||||||||||||||||||||||||||||||||||||||||||
| "script-src 'self'; " | ||||||||||||||||||||||||||||||||||||||||||||||||
| "style-src 'self'; " | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+6
to
+22
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. CSP blocks inline scripts required by the login page. The docstring states this is a "JSON API backend" with no inline scripts permitted, but the backend serves HTML pages with inline JavaScript. Context snippet 3 shows The current 🐛 Proposed fix: Add 'unsafe-inline' for script-src and style-src def process_response(
self, request: HttpRequest, response: HttpResponse
) -> HttpResponse:
response.setdefault(
"Content-Security-Policy",
(
"default-src 'self'; "
- "script-src 'self'; "
- "style-src 'self'; "
+ "script-src 'self' 'unsafe-inline'; "
+ "style-src 'self' 'unsafe-inline'; "
"img-src 'self'; "
"font-src 'self'; "
"connect-src 'self'; "
"frame-ancestors 'none'; "
"base-uri 'self'; "
"form-action 'self'"
),
)
return responseAlternatively, refactor the login template to use an external JS file or add a nonce-based CSP for stronger security. Also update the docstring to remove the misleading "JSON API backend" claim. The AI summary states 🧰 Tools🪛 Ruff (0.15.4)[warning] 15-15: Unused method argument: (ARG002) 🤖 Prompt for AI Agents |
||||||||||||||||||||||||||||||||||||||||||||||||
| "img-src 'self'; " | ||||||||||||||||||||||||||||||||||||||||||||||||
| "font-src 'self'; " | ||||||||||||||||||||||||||||||||||||||||||||||||
| "connect-src 'self'; " | ||||||||||||||||||||||||||||||||||||||||||||||||
| "frame-ancestors 'none'; " | ||||||||||||||||||||||||||||||||||||||||||||||||
| "base-uri 'self'; " | ||||||||||||||||||||||||||||||||||||||||||||||||
| "form-action 'self'" | ||||||||||||||||||||||||||||||||||||||||||||||||
| ), | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+19
to
+29
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Best practice (and the OWASP CSP cheat sheet recommendation) is to explicitly set
Suggested change
Prompt To Fix With AIThis is a comment left during a code review.
Path: backend/middleware/content_security_policy.py
Line: 19-29
Comment:
**`object-src` not explicitly restricted in backend CSP**
`object-src` controls whether `<object>`, `<embed>`, and `<applet>` elements (Flash/Java plugins) can load content. It falls back to `default-src 'self'` here rather than being explicitly denied. The frontend nginx.conf correctly sets `object-src 'none'`, but the backend middleware omits it.
Best practice (and the OWASP CSP cheat sheet recommendation) is to explicitly set `object-src 'none'` on all policies since plugin-based content is almost never intentional in a JSON API context:
```suggestion
(
"default-src 'self'; "
"script-src 'self'; "
"style-src 'self'; "
"img-src 'self'; "
"font-src 'self'; "
"connect-src 'self'; "
"object-src 'none'; "
"frame-ancestors 'none'; "
"base-uri 'self'; "
"form-action 'self'"
),
```
How can I resolve this? If you propose a fix, please make it concise. |
||||||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||||||
| return response | ||||||||||||||||||||||||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -8,6 +8,7 @@ | |||||||||||||
| from rest_framework import serializers | ||||||||||||||
| from rest_framework.exceptions import ValidationError | ||||||||||||||
| from utils.FileValidator import FileValidator | ||||||||||||||
| from utils.input_sanitizer import validate_name_field, validate_no_html_tags | ||||||||||||||
| from utils.serializer.integrity_error_mixin import IntegrityErrorMixin | ||||||||||||||
|
|
||||||||||||||
| from backend.serializers import AuditSerializer | ||||||||||||||
|
|
@@ -51,6 +52,12 @@ class Meta: | |||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| def validate_tool_name(self, value: str) -> str: | ||||||||||||||
| return validate_name_field(value, field_name="Tool name") | ||||||||||||||
|
|
||||||||||||||
| def validate_description(self, value: str) -> str: | ||||||||||||||
| return validate_no_html_tags(value, field_name="Description") | ||||||||||||||
|
Comment on lines
+58
to
+59
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Same issue as
Suggested change
Prompt To Fix With AIThis is a comment left during a code review.
Path: backend/prompt_studio/prompt_studio_core_v2/serializers.py
Line: 58-59
Comment:
**Missing `None` guard — potential `TypeError` on nullable `description`**
Same issue as `workflow_v2/serializers.py:53` — `validate_description` does not guard against `None` before calling `validate_no_html_tags`. If the `description` field accepts `null` values, DRF will invoke this method with `None`, and `HTML_TAG_PATTERN.search(None)` will raise a `TypeError`.
```suggestion
def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")
```
How can I resolve this? If you propose a fix, please make it concise. |
||||||||||||||
|
|
||||||||||||||
| def validate_summarize_llm_adapter(self, value): | ||||||||||||||
| """Validate that the adapter type is LLM and is accessible to the user.""" | ||||||||||||||
| if value is None: | ||||||||||||||
|
|
||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| import re | ||
|
|
||
| from rest_framework.serializers import ValidationError | ||
|
|
||
| # Pattern to detect HTML/script tags | ||
| HTML_TAG_PATTERN = re.compile(r"<[^>]*>") | ||
| # Pattern to detect javascript: protocol | ||
| JS_PROTOCOL_PATTERN = re.compile(r"javascript\s*:", re.IGNORECASE) | ||
| # Pattern to detect event handlers (onclick, onerror, etc.) | ||
| EVENT_HANDLER_PATTERN = re.compile(r"(?:^|\s)on\w+\s*=", re.IGNORECASE) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The pattern In practice, event handlers need to be inside an HTML tag to be executable (e.g., Consider anchoring the EVENT_HANDLER_PATTERN = re.compile(r"\bon\w+\s*=", re.IGNORECASE)The word boundary Prompt To Fix With AIThis is a comment left during a code review.
Path: backend/utils/input_sanitizer.py
Line: 10
Comment:
**`EVENT_HANDLER_PATTERN` misses event handlers preceded by non-whitespace delimiters**
The pattern `(?:^|\s)on\w+\s*=` only matches `on<handler>=` when it appears at the start of the string or is preceded by whitespace (`\s`). Non-whitespace delimiters like `;`, `&`, or `"` won't match. For example:
```
"foo;onclick=bar" # NOT caught
"foo&onerror=alert(1)" # NOT caught
```
In practice, event handlers need to be inside an HTML tag to be executable (e.g., `<img onerror=...>`), and HTML tags are already blocked by `HTML_TAG_PATTERN`. However, since this pattern is also tested in isolation as a defense-in-depth check, the gap could create a false sense of complete coverage.
Consider anchoring the `on` prefix more broadly:
```python
EVENT_HANDLER_PATTERN = re.compile(r"\bon\w+\s*=", re.IGNORECASE)
```
The word boundary `\b` would still require the `on` to start at a word boundary (after non-word characters like `;`, `&`, space, or start of string), but would not require whitespace specifically.
How can I resolve this? If you propose a fix, please make it concise. |
||
|
|
||
|
|
||
| def validate_no_html_tags(value: str, field_name: str = "This field") -> str: | ||
| """Reject values containing HTML/script tags.""" | ||
| if HTML_TAG_PATTERN.search(value): | ||
| raise ValidationError(f"{field_name} must not contain HTML or script tags.") | ||
| if JS_PROTOCOL_PATTERN.search(value): | ||
| raise ValidationError(f"{field_name} must not contain JavaScript protocols.") | ||
| if EVENT_HANDLER_PATTERN.search(value): | ||
| raise ValidationError(f"{field_name} must not contain event handler attributes.") | ||
| return value | ||
|
|
||
|
|
||
| def validate_name_field(value: str, field_name: str = "This field") -> str: | ||
| """Validate name/identifier fields - no HTML tags, strip whitespace.""" | ||
| value = value.strip() | ||
| if not value: | ||
| raise ValidationError(f"{field_name} must not be empty.") | ||
| return validate_no_html_tags(value, field_name) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| import pytest | ||
| from rest_framework.serializers import ValidationError | ||
|
|
||
| from utils.input_sanitizer import validate_name_field, validate_no_html_tags | ||
|
|
||
|
|
||
| class TestValidateNoHtmlTags: | ||
| def test_clean_input_passes(self): | ||
| assert validate_no_html_tags("Hello World") == "Hello World" | ||
|
|
||
| def test_allows_normal_special_chars(self): | ||
| assert ( | ||
| validate_no_html_tags("My workflow (v2), test - final") | ||
| == "My workflow (v2), test - final" | ||
| ) | ||
|
|
||
| def test_allows_numbers_and_punctuation(self): | ||
| assert validate_no_html_tags("Test 123 & more!") == "Test 123 & more!" | ||
|
|
||
| def test_rejects_script_tag(self): | ||
| with pytest.raises(ValidationError, match="must not contain HTML or script tags"): | ||
| validate_no_html_tags("<script>alert(1)</script>") | ||
|
|
||
| def test_rejects_img_tag(self): | ||
| with pytest.raises(ValidationError, match="must not contain HTML or script tags"): | ||
| validate_no_html_tags('<img src=x onerror=alert(1)>') | ||
|
|
||
| def test_rejects_div_tag(self): | ||
| with pytest.raises(ValidationError, match="must not contain HTML or script tags"): | ||
| validate_no_html_tags("<div>content</div>") | ||
|
|
||
| def test_rejects_self_closing_tag(self): | ||
| with pytest.raises(ValidationError, match="must not contain HTML or script tags"): | ||
| validate_no_html_tags("<br/>") | ||
|
|
||
| def test_rejects_javascript_protocol(self): | ||
| with pytest.raises(ValidationError, match="must not contain JavaScript protocols"): | ||
| validate_no_html_tags("javascript:alert(1)") | ||
|
|
||
| def test_rejects_javascript_protocol_with_spaces(self): | ||
| with pytest.raises(ValidationError, match="must not contain JavaScript protocols"): | ||
| validate_no_html_tags("javascript :alert(1)") | ||
|
|
||
| def test_rejects_javascript_protocol_case_insensitive(self): | ||
| with pytest.raises(ValidationError, match="must not contain JavaScript protocols"): | ||
| validate_no_html_tags("JAVASCRIPT:alert(1)") | ||
|
|
||
| def test_rejects_event_handler(self): | ||
| with pytest.raises( | ||
| ValidationError, match="must not contain event handler attributes" | ||
| ): | ||
| validate_no_html_tags("onclick=alert(1)") | ||
|
|
||
| def test_rejects_event_handler_with_spaces(self): | ||
| with pytest.raises( | ||
| ValidationError, match="must not contain event handler attributes" | ||
| ): | ||
| validate_no_html_tags("onerror =alert(1)") | ||
|
|
||
| def test_rejects_event_handler_case_insensitive(self): | ||
| with pytest.raises( | ||
| ValidationError, match="must not contain event handler attributes" | ||
| ): | ||
| validate_no_html_tags("ONLOAD=alert(1)") | ||
|
|
||
| def test_custom_field_name_in_error(self): | ||
| with pytest.raises(ValidationError, match="Workflow name"): | ||
| validate_no_html_tags("<script>", field_name="Workflow name") | ||
|
|
||
|
|
||
| class TestValidateNameField: | ||
| def test_clean_name_passes(self): | ||
| assert validate_name_field("My Workflow") == "My Workflow" | ||
|
|
||
| def test_strips_whitespace(self): | ||
| assert validate_name_field(" hello ") == "hello" | ||
|
|
||
| def test_rejects_empty_after_strip(self): | ||
| with pytest.raises(ValidationError, match="must not be empty"): | ||
| validate_name_field(" ") | ||
|
|
||
| def test_rejects_html_tags(self): | ||
| with pytest.raises(ValidationError, match="must not contain HTML or script tags"): | ||
| validate_name_field("<script>alert(1)</script>") | ||
|
|
||
| def test_allows_hyphens_and_underscores(self): | ||
| assert validate_name_field("my-workflow_v2") == "my-workflow_v2" | ||
|
|
||
| def test_allows_periods(self): | ||
| assert validate_name_field("config.v2") == "config.v2" | ||
|
|
||
| def test_allows_parentheses_and_commas(self): | ||
| assert validate_name_field("Test (v2), final") == "Test (v2), final" | ||
|
|
||
| def test_custom_field_name_in_error(self): | ||
| with pytest.raises(ValidationError, match="Tool name"): | ||
| validate_name_field(" ", field_name="Tool name") |
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -14,6 +14,7 @@ | |||||||||||||
| ) | ||||||||||||||
| from tool_instance_v2.serializers import ToolInstanceSerializer | ||||||||||||||
| from tool_instance_v2.tool_instance_helper import ToolInstanceHelper | ||||||||||||||
| from utils.input_sanitizer import validate_name_field, validate_no_html_tags | ||||||||||||||
| from utils.serializer.integrity_error_mixin import IntegrityErrorMixin | ||||||||||||||
|
|
||||||||||||||
| from backend.constants import RequestKey | ||||||||||||||
|
|
@@ -46,6 +47,12 @@ class Meta: | |||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| def validate_workflow_name(self, value: str) -> str: | ||||||||||||||
| return validate_name_field(value, field_name="Workflow name") | ||||||||||||||
|
|
||||||||||||||
| def validate_description(self, value: str) -> str: | ||||||||||||||
| return validate_no_html_tags(value, field_name="Description") | ||||||||||||||
|
Comment on lines
+53
to
+54
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In DRF, when a field has The exact same def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")This protection is missing in both
Suggested change
Prompt To Fix With AIThis is a comment left during a code review.
Path: backend/workflow_manager/workflow_v2/serializers.py
Line: 53-54
Comment:
**Missing `None` guard — potential `TypeError` on nullable `description`**
In DRF, when a field has `allow_null=True`, `validate_<field>` is still called with `None` after the field returns its value. Calling `validate_no_html_tags(None, ...)` will propagate `None` into `HTML_TAG_PATTERN.search(None)`, which raises `TypeError: expected string or bytes-like object`.
The exact same `description` field is handled safely in `api_v2/serializers.py`:
```python
def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")
```
This protection is missing in both `WorkflowSerializer.validate_description` (here) and `CustomToolSerializer.validate_description` in `prompt_studio/prompt_studio_core_v2/serializers.py:58`.
```suggestion
def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")
```
How can I resolve this? If you propose a fix, please make it concise. |
||||||||||||||
|
|
||||||||||||||
| def to_representation(self, instance: Workflow) -> dict[str, str]: | ||||||||||||||
| representation: dict[str, str] = super().to_representation(instance) | ||||||||||||||
| representation[WorkflowKey.WF_NAME] = instance.workflow_name | ||||||||||||||
|
|
||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -47,6 +47,27 @@ http { | |||||
| add_header X-Frame-Options "SAMEORIGIN" always; | ||||||
| add_header Referrer-Policy "strict-origin-when-cross-origin" always; | ||||||
|
|
||||||
| # Content Security Policy | ||||||
| # - 'unsafe-inline' in script-src: required for runtime-config.js injected at container start | ||||||
| # - 'unsafe-inline' in style-src: required by Ant Design CSS-in-JS | ||||||
| # - cdn.jsdelivr.net: Monaco Editor loads from this CDN | ||||||
| # - unpkg.com: PDF.js worker | ||||||
| # - PostHog, GTM, reCAPTCHA, Stripe, Product Fruits: third-party services | ||||||
| add_header Content-Security-Policy | ||||||
| "default-src 'self'; " | ||||||
| "script-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net https://unpkg.com https://eu.i.posthog.com https://eu-assets.i.posthog.com https://www.googletagmanager.com https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/ https://js.stripe.com https://app.productfruits.com; " | ||||||
| "style-src 'self' 'unsafe-inline'; " | ||||||
| "img-src 'self' data: blob: https:; " | ||||||
| "font-src 'self' data:; " | ||||||
| "connect-src 'self' ws: wss: https://eu.i.posthog.com https://eu-assets.i.posthog.com https://www.google-analytics.com https://api.stripe.com https://app.productfruits.com; " | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Consider tightening to just
Suggested change
Prompt To Fix With AIThis is a comment left during a code review.
Path: frontend/nginx.conf
Line: 62
Comment:
**`ws:` in `connect-src` allows unencrypted WebSocket to any origin**
`ws:` is a scheme-only wildcard that permits WebSocket connections to **any host** over plaintext. In production, you almost certainly only need secure WebSockets (`wss:`). Leaving `ws:` here means a script running in the browser could open `ws://any-attacker-host/exfil` and it would not be blocked by CSP.
Consider tightening to just `wss:` (or even scoping it to specific origins like `wss://your-api-host`) to remove the plaintext connection escape hatch:
```suggestion
"connect-src 'self' wss: https://eu.i.posthog.com https://eu-assets.i.posthog.com https://www.google-analytics.com https://api.stripe.com https://app.productfruits.com; "
```
How can I resolve this? If you propose a fix, please make it concise. |
||||||
| "frame-src 'self' https://www.google.com/recaptcha/ https://recaptcha.google.com https://js.stripe.com https://hooks.stripe.com; " | ||||||
| "worker-src 'self' blob: https://unpkg.com https://cdn.jsdelivr.net; " | ||||||
| "object-src 'none'; " | ||||||
| "base-uri 'self'; " | ||||||
| "form-action 'self' https://checkout.stripe.com; " | ||||||
| "frame-ancestors 'self'" | ||||||
| always; | ||||||
|
Comment on lines
+56
to
+69
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @hari-kuriakose I hope this covers everything we use. @vishnuszipstack please check. I believe you had some report last time |
||||||
|
|
||||||
| # Disable TRACE and TRACK methods | ||||||
| if ($request_method ~ ^(TRACE|TRACK)$) { | ||||||
| return 405; | ||||||
|
|
||||||
Uh oh!
There was an error while loading. Please reload this page.