feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming)#13606
feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming)#13606janiussyafiq wants to merge 2 commits into
Conversation
…t and both directions; update documentation and tests
…ce test coverage for output direction
|
|
||
| -- End of stream: ai-proxy has assembled the full completion text. | ||
| local text = ctx.var.llm_response_text | ||
| if not text or text == "" then |
There was a problem hiding this comment.
This path silently fails open, which is surprising for a guard that defaults to fail-closed everywhere else. The subtlety is that ctx.var.llm_response_text is published only on a usage SSE event (ai-providers/base.lua sets it inside if parsed.usage then), not on a plain [DONE]. ai-proxy injects stream_options.include_usage=true, but lots of OpenAI-compatible / self-hosted providers ignore it and stream content followed by [DONE] with no usage chunk. In that case buffer still holds the real (possibly flagged) completion, but text is nil, so return nil, concat(buffer) releases it to the client unscanned — and with no log line saying scanning was skipped. The non-streaming branch (line 230) has the same shape; lower impact there since nothing is withheld.
Every test passes because the fixtures always carry a usage chunk, so this gap is invisible in CI. Could this honor fail_open (block by default) when there's buffered content but no assembled text, instead of releasing — or at least emit a warn so the skip is observable?
| -- client, so we buffer every chunk (withholding it with an empty body) and | ||
| -- scan the assembled completion once at end-of-stream. This trades | ||
| -- incremental delivery for true blocking. | ||
| if ctx.var.request_type == "ai_stream" then |
There was a problem hiding this comment.
In alert (shadow) mode this still buffers the whole stream and withholds every chunk until end-of-stream, then releases it all at once. The docs frame alert as a non-intrusive, log-only pass-through, but on streaming routes the client loses token-by-token delivery and instead receives the full body in one shot once scanning finishes. So someone who sets direction: output, action: alert precisely to observe Lakera verdicts without affecting traffic does change the observable latency/streaming behavior.
Would it make sense to skip the buffering when action == "alert" (let chunks flow through live and scan a copy at the end), or at least document the streaming caveat for shadow mode?
| end | ||
| buffer[#buffer + 1] = body or "" | ||
|
|
||
| if not ctx.var.llm_request_done then |
There was a problem hiding this comment.
The buffer is only ever released when llm_request_done is observed. The note in the docs covers a dropped connection, but ai-proxy's runaway safeguards (max_stream_duration_ms / max_response_bytes) also set llm_request_done = true and then return without dispatching another body_filter pass — so the buffered content is stranded and never released. That hits clean responses too, not just flagged ones: the client ends up with only the :\n\n heartbeats and no [DONE], even though the response was fine and merely tripped a size/duration cap.
Might be worth flushing (and scanning) whatever is buffered on that abort path, or at least widening the doc caveat beyond "dropped connection" to include the gateway-side safeguards.
| -- client, so we buffer every chunk (withholding it with an empty body) and | ||
| -- scan the assembled completion once at end-of-stream. This trades | ||
| -- incremental delivery for true blocking. | ||
| if ctx.var.request_type == "ai_stream" then |
There was a problem hiding this comment.
Small optimization for shadow mode: when action: alert, this branch still buffers the entire stream and withholds every chunk (the :\n\n heartbeats) until end-of-stream, then releases it all at once — so shadow mode pays the full latency/TTFT cost even though it never blocks anything. action is known up front, so alert mode could skip buffering entirely: let each chunk pass through live (return nothing), and just scan ctx.var.llm_response_text and log once at llm_request_done. That keeps shadow mode zero-impact on the stream, which is rather the point of running it. The withhold-and-buffer path is only needed for action: block.
|
|
||
| -- Streaming: lua_body_filter is invoked once per upstream chunk. We cannot | ||
| -- scan a partial completion and we must not let flagged tokens reach the | ||
| -- client, so we buffer every chunk (withholding it with an empty body) and |
There was a problem hiding this comment.
Two minor things while you're here:
- This comment says the chunk is withheld "with an empty body", but it's actually replaced with a
:\n\nSSE keep-alive (line 251). Worth fixing the wording to match — and maybe noting why a keep-alive rather than"": an empty string trips nginx's "nothing to flush", and returningnilwould let the original chunk leak to the client. local messages = { { role = "assistant", content = text } }followed by themoderate(..., "response", conf.response_failure_message)call is duplicated verbatim between this streaming branch and the non-streamingai_chatbranch above. A smallmoderate_response(ctx, conf, text)local would collapse both into one.
Description
PR-2 of
ai-lakera-guard, following the input-scanning MVP (#13570). Adds response (output) scanning for non-streaming and streaming (SSE) traffic. Back-compatible:directionstill defaults toinput.directionextended toinput/output/both; addsresponse_failure_message.lua_body_filter, the same dispatchai-aliyun-content-moderationuses):ctx.var.llm_response_text; a flagged response is replaced with a provider-compatible deny.[DONE](flagged). Because the stream's200/text/event-streamheaders are already committed when buffering begins, a streamed block is delivered as the deny body —deny_codedoes not apply to streams.Which issue(s) this PR fixes:
Part of #13291.
Checklist