feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming) by janiussyafiq · Pull Request #13606 · apache/apisix

janiussyafiq · 2026-06-25T03:56:13Z

Description

PR-2 of ai-lakera-guard, following the input-scanning MVP (#13570). Adds response (output) scanning for non-streaming and streaming (SSE) traffic. Back-compatible: direction still defaults to input.

Schema: direction extended to input/output/both; adds response_failure_message.
Response path (lua_body_filter, the same dispatch ai-aliyun-content-moderation uses):
- Non-streaming: scans ctx.var.llm_response_text; a flagged response is replaced with a provider-compatible deny.
- Streaming: buffers the SSE response, scans the assembled completion once, then releases it verbatim (clean) or replaces it with a deny SSE terminated by [DONE] (flagged). Because the stream's 200/text/event-stream headers are already committed when buffering begins, a streamed block is delivered as the deny body — deny_code does not apply to streams.
Docs (en + zh) and tests added.

Which issue(s) this PR fixes:

Part of #13291.

Checklist

I have explained the need for this PR and the problem it solves
I have explained the changes or the new features added to this PR
I have added tests corresponding to this change
I have updated the documentation to reflect this change
I have verified that this change is backward compatible

…t and both directions; update documentation and tests

…ce test coverage for output direction

nic-6443 · 2026-06-25T05:45:16Z

+
+        -- End of stream: ai-proxy has assembled the full completion text.
+        local text = ctx.var.llm_response_text
+        if not text or text == "" then


This path silently fails open, which is surprising for a guard that defaults to fail-closed everywhere else. The subtlety is that ctx.var.llm_response_text is published only on a usage SSE event (ai-providers/base.lua sets it inside if parsed.usage then), not on a plain [DONE]. ai-proxy injects stream_options.include_usage=true, but lots of OpenAI-compatible / self-hosted providers ignore it and stream content followed by [DONE] with no usage chunk. In that case buffer still holds the real (possibly flagged) completion, but text is nil, so return nil, concat(buffer) releases it to the client unscanned — and with no log line saying scanning was skipped. The non-streaming branch (line 230) has the same shape; lower impact there since nothing is withheld.

Every test passes because the fixtures always carry a usage chunk, so this gap is invisible in CI. Could this honor fail_open (block by default) when there's buffered content but no assembled text, instead of releasing — or at least emit a warn so the skip is observable?

nic-6443 · 2026-06-25T05:45:18Z

+    -- client, so we buffer every chunk (withholding it with an empty body) and
+    -- scan the assembled completion once at end-of-stream. This trades
+    -- incremental delivery for true blocking.
+    if ctx.var.request_type == "ai_stream" then


In alert (shadow) mode this still buffers the whole stream and withholds every chunk until end-of-stream, then releases it all at once. The docs frame alert as a non-intrusive, log-only pass-through, but on streaming routes the client loses token-by-token delivery and instead receives the full body in one shot once scanning finishes. So someone who sets direction: output, action: alert precisely to observe Lakera verdicts without affecting traffic does change the observable latency/streaming behavior.

Would it make sense to skip the buffering when action == "alert" (let chunks flow through live and scan a copy at the end), or at least document the streaming caveat for shadow mode?

nic-6443 · 2026-06-25T05:45:21Z

+        end
+        buffer[#buffer + 1] = body or ""
+
+        if not ctx.var.llm_request_done then


The buffer is only ever released when llm_request_done is observed. The note in the docs covers a dropped connection, but ai-proxy's runaway safeguards (max_stream_duration_ms / max_response_bytes) also set llm_request_done = true and then return without dispatching another body_filter pass — so the buffered content is stranded and never released. That hits clean responses too, not just flagged ones: the client ends up with only the :\n\n heartbeats and no [DONE], even though the response was fine and merely tripped a size/duration cap.

Might be worth flushing (and scanning) whatever is buffered on that abort path, or at least widening the doc caveat beyond "dropped connection" to include the gateway-side safeguards.

nic-6443 · 2026-06-25T07:22:51Z

+    -- client, so we buffer every chunk (withholding it with an empty body) and
+    -- scan the assembled completion once at end-of-stream. This trades
+    -- incremental delivery for true blocking.
+    if ctx.var.request_type == "ai_stream" then


Small optimization for shadow mode: when action: alert, this branch still buffers the entire stream and withholds every chunk (the :\n\n heartbeats) until end-of-stream, then releases it all at once — so shadow mode pays the full latency/TTFT cost even though it never blocks anything. action is known up front, so alert mode could skip buffering entirely: let each chunk pass through live (return nothing), and just scan ctx.var.llm_response_text and log once at llm_request_done. That keeps shadow mode zero-impact on the stream, which is rather the point of running it. The withhold-and-buffer path is only needed for action: block.

nic-6443 · 2026-06-25T07:22:53Z

+
+    -- Streaming: lua_body_filter is invoked once per upstream chunk. We cannot
+    -- scan a partial completion and we must not let flagged tokens reach the
+    -- client, so we buffer every chunk (withholding it with an empty body) and


Two minor things while you're here:

This comment says the chunk is withheld "with an empty body", but it's actually replaced with a :\n\n SSE keep-alive (line 251). Worth fixing the wording to match — and maybe noting why a keep-alive rather than "": an empty string trips nginx's "nothing to flush", and returning nil would let the original chunk leak to the client.

local messages = { { role = "assistant", content = text } } followed by the moderate(..., "response", conf.response_failure_message) call is duplicated verbatim between this streaming branch and the non-streaming ai_chat branch above. A small moderate_response(ctx, conf, text) local would collapse both into one.

janiussyafiq added 2 commits June 24, 2026 17:03

feat(ai-lakera-guard): enhance scanning capabilities to support outpu…

4b535f9

…t and both directions; update documentation and tests

feat(ai-lakera-guard): implement multi-chunk streaming mock and enhan…

caf8500

…ce test coverage for output direction

dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Jun 25, 2026

nic-6443 reviewed Jun 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming)#13606

feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming)#13606
janiussyafiq wants to merge 2 commits into
apache:masterfrom
janiussyafiq:feat/ai-lakera-guard-pr2

janiussyafiq commented Jun 25, 2026 •

edited

Loading

Uh oh!

nic-6443 Jun 25, 2026

Uh oh!

nic-6443 Jun 25, 2026

Uh oh!

nic-6443 Jun 25, 2026

Uh oh!

nic-6443 Jun 25, 2026

Uh oh!

nic-6443 Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

janiussyafiq commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Which issue(s) this PR fixes:

Checklist

Uh oh!

nic-6443 Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

nic-6443 Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

nic-6443 Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

nic-6443 Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

nic-6443 Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

janiussyafiq commented Jun 25, 2026 •

edited

Loading