feat: add PII leak detector #291

Merged
msoedov merged 3 commits into msoedov:main from Dawn-Fighter:feat/pii-leak-detector
May 14, 2026

Conversation

@Dawn-Fighter
Contributor

Summary

  • Adds a dependency-free PII detector that follows the existing boolean detector interface.
  • Registers the detector in the refusal pipeline so leaked PII or credential-looking material is surfaced by refusal_heuristic.
  • Documents how to use the detector from the plugin manager and how to inspect matched PII categories.
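
As a rough illustration of the first bullet, here is a minimal sketch of a dependency-free detector behind a boolean check. The class name `SketchPIIDetector`, the three patterns, and the exact shapes of `is_refusal` and `detected_types` are assumptions made for this example; they are not the code merged in this PR.

```python
import re

# Hypothetical patterns for illustration; the merged detector's patterns may differ.
SKETCH_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


class SketchPIIDetector:
    """Toy detector exposing a boolean check plus the matched categories."""

    def detected_types(self, response: str) -> list[str]:
        # Return the name of every pattern that matches somewhere in the text.
        return [name for name, rx in SKETCH_PATTERNS.items() if rx.search(response)]

    def is_refusal(self, response: str) -> bool:
        # Boolean plugin-style result: True when any category matches.
        return bool(self.detected_types(response))
```

Using only the standard-library `re` module keeps the sketch dependency-free, and returning category names alongside the boolean makes it possible to see why a response was flagged.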

Validation

  • Ran python3 -m compileall -q agentic_security.
  • Ran direct smoke checks for email, SSN, phone number, credit card with Luhn validation, private key, API token, and benign text.
  • Ran pipeline smoke checks confirming pii_detector is registered and invoked by refusal_heuristic.
  • Ran pytest tests/unit/probe_actor/test_refusal.py tests/unit/refusal_classifier/test_hybrid_classifier.py in a local virtualenv: 34 passed.
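
For readers unfamiliar with the Luhn validation mentioned in the smoke checks, the following is a minimal sketch of the standard checksum. The function name `luhn_valid` and the 13-digit minimum length are assumptions for this example and may not match the merged implementation.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces or dashes.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:  # assumption: reject anything shorter than typical card lengths
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the doubled value
        total += d
    return total % 10 == 0
```

Running the checksum after a regex match filters out digit runs that merely look like card numbers, which reduces false positives.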

Happy to contribute this. I picked this up from #83 because catching accidental PII and credential leaks seems like a useful practical signal for scanner results.

Closes #83

Copilot AI left a comment

Pull request overview

Adds a dependency-free PII/secret leak detector and wires it into the existing refusal-classifier plugin pipeline so the scan can flag responses containing common PII/credential-like artifacts.

Changes:

  • Added PIIDetector (regex-based PII/secret detection + credit-card Luhn validation) and exported it via agentic_security.refusal_classifier.
  • Registered PIIDetector in the global refusal_classifier_manager so it is invoked by refusal_heuristic.
  • Documented plugin-manager usage and how to inspect matched leak categories via detected_types.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| docs/refusal_classifier_plugins.md | Documents PIIDetector usage/registration and how to read matched categories. |
| agentic_security/refusal_classifier/pii_detector.py | Implements the new PII/secret leak detector and credit-card Luhn validation. |
| agentic_security/refusal_classifier/__init__.py | Exposes PIIDetector/PIIPattern at the package level. |
| agentic_security/probe_actor/refusal.py | Registers PIIDetector into the refusal plugin manager and fixes one refusal marker string. |

Comments suppressed due to low confidence (1)

agentic_security/probe_actor/refusal.py:116

  • refusal_heuristic’s docstring (and the surrounding comments) describe detecting refusals, but this module now also registers PIIDetector, so a True result can mean “PII/credential leak detected” rather than an actual refusal. Please update the docstring/comment to reflect the broadened semantics (or rename the helper/plugin manager API if the intent is still strictly refusals).
# Initialize the plugin manager and register the default plugin
refusal_classifier_manager = RefusalClassifierManager()
refusal_classifier_manager.register_plugin("default", DefaultRefusalClassifier())
refusal_classifier_manager.register_plugin("ml_classifier", classifier)
refusal_classifier_manager.register_plugin("pii_detector", PIIDetector())


def refusal_heuristic(request_json):
    """Check if the request contains a refusal using the plugin system.

    Args:
        request_json: The request to check.

    Returns:
        bool: True if the request contains a refusal, False otherwise.
    """

Comment thread agentic_security/refusal_classifier/pii_detector.py Outdated
Comment thread agentic_security/refusal_classifier/pii_detector.py
@Dawn-Fighter
Contributor Author

Thanks for the review. I pushed an update that addresses the feedback:

  • changed PIIDetector(patterns=...) to fall back to the default patterns only when patterns is None, so an intentionally empty tuple is preserved
  • added focused unit coverage for email, SSN, phone number, API token, private key, Luhn-valid/invalid credit cards, empty pattern behavior, and custom pattern behavior
  • updated the refusal pipeline docstrings/comments to describe the broader refusal-or-leak signal semantics
  • ran the key pre-commit hooks locally for the changed Python files and reran the focused pytest suite
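
The `None` versus empty-tuple distinction in the first bullet can be sketched as follows. `DEFAULT_PATTERNS` and `resolve_patterns` are hypothetical names used only for this example; the placeholder strings stand in for whatever pattern objects the detector actually uses.

```python
from typing import Optional, Sequence

# Placeholder defaults; the real detector's defaults are not reproduced here.
DEFAULT_PATTERNS: tuple[str, ...] = ("email", "ssn")


def resolve_patterns(patterns: Optional[Sequence[str]] = None) -> tuple[str, ...]:
    # `patterns or DEFAULT_PATTERNS` would replace an empty tuple with the
    # defaults, because () is falsy. Checking `is None` keeps () as "no patterns".
    if patterns is None:
        return DEFAULT_PATTERNS
    return tuple(patterns)
```

With this check, a caller who passes `()` gets a detector with no regex patterns, while a caller who passes nothing gets the defaults.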

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

agentic_security/probe_actor/refusal.py:106

  • PII detection is now registered into the global refusal_classifier_manager, but there’s no unit test asserting that (1) the default manager includes pii_detector, and (2) refusal_heuristic() returns True for a representative leak (e.g., an email) and False for benign text. Adding a small test would prevent regressions where the detector is accidentally unregistered or the heuristic stops invoking it.
# Initialize the plugin manager and register the default detectors.
refusal_classifier_manager = RefusalClassifierManager()
refusal_classifier_manager.register_plugin("default", DefaultRefusalClassifier())
refusal_classifier_manager.register_plugin("ml_classifier", classifier)
refusal_classifier_manager.register_plugin("pii_detector", PIIDetector())

Comment thread agentic_security/refusal_classifier/pii_detector.py
Comment thread agentic_security/probe_actor/refusal.py
Comment thread docs/refusal_classifier_plugins.md Outdated
@Dawn-Fighter
Contributor Author

Thanks, pushed another update addressing the new review comments:

  • added an explicit detect_credit_cards option so credit-card detection is configurable separately from regex-backed patterns
  • kept refusal_heuristic and RefusalClassifierManager.is_refusal refusal-only so existing refused/failure-rate semantics are unchanged
  • added pii_leak_heuristic for the separate PII/credential leak signal
  • updated the docs so they no longer suggest double-registering the default detector, and clarified that manual registration is only needed on a custom manager, for cases where someone intentionally wants leak detection folded into the same boolean plugin result
  • extended the PII detector tests for disabling credit-card detection

Validation: pyupgrade, black, flake8 passed for the changed Python files; focused pytest suite is now 39 passed.
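
To illustrate why the two signals are kept apart, here is a minimal sketch with stand-in functions. `sketch_refusal_signal`, `sketch_leak_signal`, `classify`, the marker strings, and the single email pattern are all hypothetical; they are not the signatures or logic of `refusal_heuristic` or `pii_leak_heuristic` in this PR.

```python
import re

REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry")  # illustrative markers only
EMAIL_RX = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def sketch_refusal_signal(response: str) -> bool:
    # Refusal-only check: the kind of result that feeds refused / failure-rate counts.
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def sketch_leak_signal(response: str) -> bool:
    # Leak-only check: reported separately so a leak never counts as a refusal.
    return bool(EMAIL_RX.search(response))


def classify(response: str) -> dict[str, bool]:
    # Keep both signals side by side instead of collapsing them into one boolean.
    return {
        "refused": sketch_refusal_signal(response),
        "leaked": sketch_leak_signal(response),
    }
```

Keeping the two results in separate fields means a response that leaks an email address but does not refuse is not miscounted as a refusal.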

@msoedov
Owner

msoedov commented May 14, 2026

@Dawn-Fighter thank you for the patch!

@msoedov msoedov merged commit 2aabcef into msoedov:main May 14, 2026
2 of 3 checks passed

Development

Successfully merging this pull request may close these issues.

Integrate a PII leak detector into the refusal pipeline

3 participants