feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming) #13606

Open
janiussyafiq wants to merge 2 commits into apache:master from janiussyafiq:feat/ai-lakera-guard-pr2

@janiussyafiq janiussyafiq commented Jun 25, 2026

Contributor

Description

PR-2 of ai-lakera-guard, following the input-scanning MVP (#13570). Adds response (output) scanning for non-streaming and streaming (SSE) traffic. Back-compatible: direction still defaults to input.

  • Schema: direction extended to input/output/both; adds response_failure_message.
  • Response path (lua_body_filter, the same dispatch ai-aliyun-content-moderation uses):
    • Non-streaming: scans ctx.var.llm_response_text; a flagged response is replaced with a provider-compatible deny.
    • Streaming: buffers the SSE response, scans the assembled completion once, then releases it verbatim (clean) or replaces it with a deny SSE terminated by [DONE] (flagged). Because the stream's 200/text/event-stream headers are already committed when buffering begins, a streamed block is delivered as the deny body; deny_code does not apply to streams.
  • Docs (en + zh) and tests added.
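
For readers unfamiliar with the streamed deny path, here is a minimal sketch of what "a deny SSE terminated by [DONE]" could look like. It is not the PR's code: `json_escape` and `build_deny_sse` are hypothetical names, and the OpenAI-style chunk shape is an assumption about what "provider-compatible" means here.

```lua
-- Hypothetical sketch, not the plugin's implementation.
-- Escapes the characters that would break a JSON string literal.
local function json_escape(s)
    return (s:gsub('[%c"\\]', function(c)
        if c == '"' then return '\\"' end
        if c == '\\' then return '\\\\' end
        return string.format("\\u%04x", c:byte())
    end))
end

-- Builds one content chunk carrying the deny message, followed by the
-- [DONE] sentinel. The chunk layout is assumed (OpenAI-style), not taken
-- from the PR.
local function build_deny_sse(message)
    local chunk = '{"choices":[{"index":0,"delta":{"content":"'
        .. json_escape(message) .. '"},"finish_reason":"stop"}]}'
    return "data: " .. chunk .. "\n\n" .. "data: [DONE]\n\n"
end
```

Since the 200 status is already on the wire, the deny message travels inside the event payload rather than in the status code, which is why deny_code cannot apply.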

Which issue(s) this PR fixes:

Part of #13291.

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible

dosubot (bot) added the size:XL (this PR changes 500-999 lines, ignoring generated files) and enhancement (new feature or request) labels Jun 25, 2026

-- End of stream: ai-proxy has assembled the full completion text.
local text = ctx.var.llm_response_text
if not text or text == "" then

Member

This path silently fails open, which is surprising for a guard that defaults to fail-closed everywhere else. The subtlety is that ctx.var.llm_response_text is published only on a usage SSE event (ai-providers/base.lua sets it inside if parsed.usage then), not on a plain [DONE]. ai-proxy injects stream_options.include_usage=true, but lots of OpenAI-compatible / self-hosted providers ignore it and stream content followed by [DONE] with no usage chunk. In that case buffer still holds the real (possibly flagged) completion, but text is nil, so return nil, concat(buffer) releases it to the client unscanned — and with no log line saying scanning was skipped. The non-streaming branch (line 230) has the same shape; lower impact there since nothing is withheld.

Every test passes because the fixtures always carry a usage chunk, so this gap is invisible in CI. Could this honor fail_open (block by default) when there's buffered content but no assembled text, instead of releasing — or at least emit a warn so the skip is observable?
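
One possible shape for that decision, as a rough sketch only: `resolve_unscanned` and the `log_warn` callback are hypothetical names, and the three return values are illustrative labels rather than anything the plugin defines. Only `fail_open` comes from the existing config.

```lua
-- Hypothetical helper: decide what to do at end-of-stream when the
-- assembled text may be missing. Returns "scan", "release" or "block".
local function resolve_unscanned(buffer, text, conf, log_warn)
    if text and text ~= "" then
        return "scan"        -- normal path: assembled text is available
    end
    if table.concat(buffer) == "" then
        return "release"     -- nothing was buffered, nothing to protect
    end
    -- Buffered content exists but was never assembled: make the skip visible.
    log_warn("ai-lakera-guard: buffered response has no assembled text; "
             .. "scan skipped")
    if conf.fail_open then
        return "release"     -- operator opted into fail-open
    end
    return "block"           -- default: fail closed
end
```

A fixture that streams content followed by a bare [DONE] with no usage chunk would exercise the "block" branch and make the gap visible in CI.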

-- client, so we buffer every chunk (withholding it with an empty body) and
-- scan the assembled completion once at end-of-stream. This trades
-- incremental delivery for true blocking.
if ctx.var.request_type == "ai_stream" then

Member

In alert (shadow) mode this still buffers the whole stream and withholds every chunk until end-of-stream, then releases it all at once. The docs frame alert as a non-intrusive, log-only pass-through, but on streaming routes the client loses token-by-token delivery and instead receives the full body in one shot once scanning finishes. So someone who sets direction: output, action: alert precisely to observe Lakera verdicts without affecting traffic does change the observable latency/streaming behavior.

Would it make sense to skip the buffering when action == "alert" (let chunks flow through live and scan a copy at the end), or at least document the streaming caveat for shadow mode?

end
buffer[#buffer + 1] = body or ""

if not ctx.var.llm_request_done then

Member

The buffer is only ever released when llm_request_done is observed. The note in the docs covers a dropped connection, but ai-proxy's runaway safeguards (max_stream_duration_ms / max_response_bytes) also set llm_request_done = true and then return without dispatching another body_filter pass — so the buffered content is stranded and never released. That hits clean responses too, not just flagged ones: the client ends up with only the :\n\n heartbeats and no [DONE], even though the response was fine and merely tripped a size/duration cap.

Might be worth flushing (and scanning) whatever is buffered on that abort path, or at least widening the doc caveat beyond "dropped connection" to include the gateway-side safeguards.

-- client, so we buffer every chunk (withholding it with an empty body) and
-- scan the assembled completion once at end-of-stream. This trades
-- incremental delivery for true blocking.
if ctx.var.request_type == "ai_stream" then

Member

Small optimization for shadow mode: when action: alert, this branch still buffers the entire stream and withholds every chunk (the :\n\n heartbeats) until end-of-stream, then releases it all at once — so shadow mode pays the full latency/TTFT cost even though it never blocks anything. action is known up front, so alert mode could skip buffering entirely: let each chunk pass through live (return nothing), and just scan ctx.var.llm_response_text and log once at llm_request_done. That keeps shadow mode zero-impact on the stream, which is rather the point of running it. The withhold-and-buffer path is only needed for action: block.
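
Roughly what I have in mind, as a simplified sketch rather than a drop-in patch. `on_stream_chunk`, `state` and the `scan` callback are hypothetical, and the return convention is reduced to "nil means leave the chunk unchanged, a string means replace the chunk body", which is not the exact lua_body_filter contract.

```lua
-- Simplified sketch; names and return convention are assumptions.
local KEEPALIVE = ":\n\n"

local function on_stream_chunk(conf, state, body, done, scan)
    if conf.action == "alert" then
        -- Shadow mode: never touch the stream, scan once at the end.
        if done then
            scan(state.text)
        end
        return nil
    end

    -- Block mode: withhold each chunk behind a keep-alive until the end.
    state.buffer[#state.buffer + 1] = body or ""
    if not done then
        return KEEPALIVE
    end
    if scan(state.text) then
        return state.deny_body          -- flagged: replace everything
    end
    return table.concat(state.buffer)   -- clean: release verbatim
end
```

In this shape the alert branch never appends to the buffer, so shadow mode also avoids the memory cost of holding the whole response.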


-- Streaming: lua_body_filter is invoked once per upstream chunk. We cannot
-- scan a partial completion and we must not let flagged tokens reach the
-- client, so we buffer every chunk (withholding it with an empty body) and

Member

Two minor things while you're here:

  • This comment says the chunk is withheld "with an empty body", but it's actually replaced with a :\n\n SSE keep-alive (line 251). Worth fixing the wording to match — and maybe noting why a keep-alive rather than "": an empty string trips nginx's "nothing to flush", and returning nil would let the original chunk leak to the client.
  • local messages = { { role = "assistant", content = text } } followed by the moderate(..., "response", conf.response_failure_message) call is duplicated verbatim between this streaming branch and the non-streaming ai_chat branch above. A small moderate_response(ctx, conf, text) local would collapse both into one.
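
A sketch of the helper from the second point. To keep the snippet self-contained, `moderate` is passed in as a parameter, and since its leading arguments are elided above, the `(ctx, conf, messages)` order below is a placeholder and not the plugin's confirmed signature.

```lua
-- Hypothetical shared helper for the streaming and ai_chat branches.
-- The first three arguments passed to `moderate` are placeholders.
local function moderate_response(moderate, ctx, conf, text)
    local messages = { { role = "assistant", content = text } }
    return moderate(ctx, conf, messages, "response",
                    conf.response_failure_message)
end
```

Both branches would then reduce to a single call with the assembled text, so any later change to how the response message is built lands in one place.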

