feat(privacy-filter): add heartbeat-level privacy filtering engine #600
Greptile Summary
This PR adds a configurable regex-based privacy filtering engine (
Confidence Score: 5/5
Safe to merge; the engine is a strict no-op on unconfigured instances and all previously identified data-loss and panic defects have been resolved.
All critical edge cases (drop without field, redact without replacement or field, invalid regex, non-object intermediate path, missing cache entry fallback) have been addressed in prior iterations with accompanying tests. The only remaining observation is a minor double-compilation of each pattern after a RefreshPrivacyFilter call, which has no impact on correctness or data integrity.
No files require special attention. privacy_filter.rs has one minor optimization opportunity but no correctness issues.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant W as Watcher
participant DS as Datastore (worker)
participant PFE as PrivacyFilterEngine
participant DB as SQLite
W->>DS: Command::Heartbeat(bucket, event, pulsetime)
DS->>PFE: filter_event(bucket, event)
alt event matches a Drop rule
PFE-->>DS: None
DS->>DS: return last_heartbeat[bucket] or event
DS-->>W: Response::Event(last_or_incoming)
else event matches a Redact rule
PFE->>PFE: set_field(event.data, field, replacement)
PFE-->>DS: Some(redacted_event)
DS->>DB: ds.heartbeat(tx, bucket, redacted_event, pulsetime)
DB-->>DS: stored Event
DS-->>W: Response::Event(stored)
else no rule matches
PFE-->>DS: Some(event)
DS->>DB: ds.heartbeat(tx, bucket, event, pulsetime)
DB-->>DS: stored Event
DS-->>W: Response::Event(stored)
end
Note over DS,PFE: RefreshPrivacyFilter reloads rules from settings.privacy_filters KV key
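The three branches of the diagram can be sketched in a few lines of Rust. This is a simplified illustration, not the PR's code: the `Event` fields, the `Worker` struct, and the hard-coded "Incognito" and "Bank" checks are hypothetical stand-ins for the real event type, the datastore worker, and user-configured rules. The only behaviour taken from the diagram is that `filter_event` yields `None` for a dropped event and that the worker then answers with the last heartbeat for the bucket, or the incoming event if there is none.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real event type (fields are hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub struct Event {
    pub timestamp: u64,
    pub title: String,
}

/// Stand-in for the engine's filter step. The two substring checks are
/// hypothetical rules; real rules come from configuration.
pub fn filter_event(event: Event) -> Option<Event> {
    if event.title.contains("Incognito") {
        None // matched a Drop rule
    } else if event.title.contains("Bank") {
        // matched a Redact rule: overwrite the sensitive field
        Some(Event { title: "REDACTED".to_string(), ..event })
    } else {
        Some(event) // no rule matched
    }
}

/// Hypothetical worker holding the per-bucket heartbeat cache.
pub struct Worker {
    pub last_heartbeat: HashMap<String, Event>,
    pub stored: Vec<Event>,
}

impl Worker {
    pub fn new() -> Self {
        Worker { last_heartbeat: HashMap::new(), stored: Vec::new() }
    }

    /// Returns the event that would be sent back to the watcher.
    pub fn handle_heartbeat(&mut self, bucket: &str, event: Event) -> Event {
        match filter_event(event.clone()) {
            // Dropped: store nothing, reply with the cached heartbeat,
            // or with the incoming event when the cache is empty.
            None => self.last_heartbeat.get(bucket).cloned().unwrap_or(event),
            // Kept (possibly redacted): "store" it and update the cache.
            // The real datastore merges heartbeats; this sketch just appends.
            Some(filtered) => {
                self.stored.push(filtered.clone());
                self.last_heartbeat.insert(bucket.to_string(), filtered.clone());
                filtered
            }
        }
    }
}
```

Under these assumptions a dropped heartbeat never reaches `stored`, while the watcher still receives a well-formed event in response.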
Reviews (7): Last reviewed commit: "fix(privacy-filter): require field on Dr..."
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #600      +/-   ##
==========================================
+ Coverage   70.81%   76.01%   +5.19%
==========================================
  Files          51       61      +10
  Lines        2916     4802    +1886
==========================================
+ Hits         2065     3650    +1585
- Misses        851     1152     +301
|
… caching, drop fallback
- Start with empty PrivacyFilterEngine instead of with_defaults() so the engine is a true no-op when settings.privacy_filters is not configured. The old behavior silently dropped incognito events and redacted banking titles on all installs regardless of user config, contradicting the PR's backward-compat claim.
- Pre-compile regex patterns via OnceLock in PrivacyFilterRule. The previous code called regex::Regex::new() on every heartbeat in the hot path.
- Fix dropped-heartbeat fallback: use the incoming event as the last-resort value instead of Event::default() (zero timestamp, empty data), which could break the watcher's heartbeat-merge state machine.
- Log parse errors on RefreshPrivacyFilter instead of silently swallowing them.
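The OnceLock caching described in this commit can be illustrated with the standard library alone. This is a minimal sketch, not the PR's code: `Compiled` is a hypothetical stand-in for `regex::Regex` that only does substring matching, and `COMPILE_CALLS` exists purely to make the number of compilations observable. `Rule` is likewise an illustrative name, not the PR's `PrivacyFilterRule`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (expensive) compile step runs; illustration only.
pub static COMPILE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled regex: it matches substrings only.
pub struct Compiled {
    needle: String,
}

impl Compiled {
    fn compile(pattern: &str) -> Compiled {
        COMPILE_CALLS.fetch_add(1, Ordering::SeqCst);
        Compiled { needle: pattern.to_string() }
    }

    fn is_match(&self, text: &str) -> bool {
        text.contains(self.needle.as_str())
    }
}

/// A rule that compiles its pattern lazily, at most once.
pub struct Rule {
    pub pattern: String,
    compiled: OnceLock<Compiled>,
}

impl Rule {
    pub fn new(pattern: &str) -> Self {
        Rule { pattern: pattern.to_string(), compiled: OnceLock::new() }
    }

    /// The first call compiles the pattern; later calls reuse the cached value.
    pub fn matches(&self, text: &str) -> bool {
        self.compiled
            .get_or_init(|| Compiled::compile(&self.pattern))
            .is_match(text)
    }
}
```

Because `get_or_init` takes `&self`, the cache can be filled from the heartbeat path without making the rule mutable, which is presumably why a once-cell fits here better than an `Option` field.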
|
This is great, but for proper privacy it would happen at the watcher level. This is still good, could be complementary, but idk. |
|
Addressed the three issues from Greptile's review in a90ffa0:
1. Default rules active without opt-in: Changed
2. Regex recompiled per heartbeat: Added
3. Dropped heartbeat fallback: Replaced
Bonus:
|
Agreed — watcher-level filtering is the "proper" place for privacy (and aw-watcher-window already has it via #99). This is meant as the server-side double-filter you described in #482:
So it's complementary rather than a replacement — the same rule set, applied as a last line of defense at ingest so no watcher can accidentally bypass the user's settings. Happy to close it if you'd prefer to land watcher-side changes first and revisit the server layer later. |
|
The two bugs flagged in Greptile's second review are already addressed in a90ffa0:
1.
None => return, // intermediate segment is not an object: skip silently
A new test
2.
Err(_) => {
    // Settings key absent: clear rules so deleting the key disables filtering
    self.privacy_engine = PrivacyFilterEngine::new(vec![]);
}
CI is green across all platforms. Greptile's thread IDs for these two are unresolved in the UI but the underlying issues are patched. |
- set_field: replace .expect() with match+return so dotted-path Redact rules with a non-object intermediate segment skip silently instead of panicking the datastore worker thread
- RefreshPrivacyFilter: reset engine to empty on Err (settings key absent) so deleting settings.privacy_filters actually disables filtering
- add test_set_field_no_panic_on_non_object_intermediate
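The set_field change can be sketched as a panic-free walk along a dotted path. This is an illustrative sketch, not the PR's code: the `Value` enum is a hypothetical, minimal stand-in for `serde_json::Value`, and returning early on a missing intermediate key is an assumption. The commit message only states the behaviour for a non-object intermediate.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value; a hypothetical stand-in for serde_json::Value.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Str(String),
    Object(BTreeMap<String, Value>),
}

/// Set the field at `path` (e.g. "window.title") to `replacement`.
/// If any intermediate segment is not an object, or is missing (assumption),
/// the function returns without modifying `data` and without panicking.
pub fn set_field(data: &mut Value, path: &str, replacement: &str) {
    let mut current = data;
    let mut segments = path.split('.').peekable();
    while let Some(segment) = segments.next() {
        let map = match current {
            Value::Object(map) => map,
            _ => return, // not an object: skip silently instead of panicking
        };
        if segments.peek().is_none() {
            // Last segment: write the replacement value.
            map.insert(segment.to_string(), Value::Str(replacement.to_string()));
            return;
        }
        current = match map.get_mut(segment) {
            Some(next) => next,
            None => return, // missing intermediate segment: nothing to redact
        };
    }
}
```

With this shape, a rule targeting `app.title` on an event whose `app` is a plain string leaves the event untouched, so a misconfigured rule cannot take down the worker thread.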
|
Correction to my previous comment: the two P1 bugs from Greptile's second review were not actually fixed in a90ffa0 — I was wrong about that. They're now fixed in 8eceb52:
All 6 Greptile threads are now resolved. CI is green. |
…json
A Redact rule without a replacement field was silently accepted and would pass events through unchanged with no warning. from_json now returns an error for any Redact rule with replacement: None. Fixes Greptile P1 from third review.
|
@greptileai review |
…in from_json
- Redact rules missing `field` now fail from_json with a clear error instead of silently passing events through unchanged
- Rules with invalid regex patterns now fail from_json instead of silently compiling to None and never matching anything
- Tests: test_from_json_redact_without_field_is_error, test_from_json_invalid_regex_is_error
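The fail-fast validation added across these commits might look roughly like the sketch below. All names here (`Action`, `RawRule`, `validate`, `validate_all`) are hypothetical, and the struct is assumed to be the already-deserialized form of one rule. The regex check is omitted because it would depend on the `regex` crate; in the real code an invalid pattern is also rejected.

```rust
/// Hypothetical rule actions, mirroring the Drop/Redact rules in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Drop,
    Redact,
}

/// Hypothetical deserialized form of a single rule.
#[derive(Debug, Clone)]
pub struct RawRule {
    pub action: Action,
    pub field: Option<String>,
    pub replacement: Option<String>,
}

/// Reject rules that would otherwise be accepted and then misbehave silently.
pub fn validate(rule: &RawRule) -> Result<(), String> {
    // Both Drop and Redact need a field to match against.
    if rule.field.is_none() {
        return Err(format!("{:?} rule requires a `field`", rule.action));
    }
    // A Redact rule without a replacement would leave events unchanged.
    if rule.action == Action::Redact && rule.replacement.is_none() {
        return Err("Redact rule requires a `replacement`".to_string());
    }
    Ok(())
}

/// Validate a whole rule set, stopping at the first invalid rule.
pub fn validate_all(rules: &[RawRule]) -> Result<(), String> {
    rules.iter().try_for_each(validate)
}
```

Rejecting the whole rule set on the first error is one possible design choice; it trades leniency for the guarantee that a typo in the settings cannot quietly disable a rule.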
|
@greptileai review |
|
Fixed in 441f94d: |
…ent accidental total-bucket drops
|
@greptileai review |
|
All review threads are resolved. I attempted to squash-merge the PR with the current head SHA (
Implements configurable regex-based privacy filtering at the heartbeat ingestion level, so sensitive data is filtered before it reaches storage.
Changes
- aw-datastore/src/privacy_filter.rs
- PrivacyFilterRule: match on bucket_prefix + dotted field path + pattern
- PrivacyFilterEngine: container with filter_event / filter_events
- RefreshPrivacyFilter command to reload rules from settings at runtime
- Command::Heartbeat
- settings.privacy_filters is not set
Design
The filter is applied at the server-side heartbeat endpoint so it works for ALL watchers regardless of client-side support. Watchers that implement their own pre-filtering (like aw-watcher-window's existing title exclusion) add defense-in-depth on top.
Closes ActivityWatch/activitywatch#1 (foundational step)
Addresses #482