Harden daemon: socket permissions, env safety, telemetry, rewrite locking #1029
jwiegley wants to merge 1 commit into
Force-pushed from bbff7f0 to eb75b9b
Force-pushed from 79c693b to a0f2607
Force-pushed from eb75b9b to 897edee
Force-pushed from 897edee to b43e12d
Force-pushed from a0f2607 to 9b1e189
Force-pushed from b43e12d to 86f70d0
Force-pushed from 9b1e189 to a301b61
Force-pushed from 86f70d0 to 7eaef28
Force-pushed from a301b61 to 7a0c335
Force-pushed from 104cb2d to 499b585
Force-pushed from 6ba559c to f19c873
Force-pushed from 5b60e7c to c053d28
Force-pushed from c053d28 to fad265b
```rust
// src/git/rewrite_log.rs (as quoted in the review)
fn acquire_rewrite_lock(lock_path: &std::path::Path) -> Option<LockFile> {
    // Up to 3 attempts, sleeping 100 ms between them (~200 ms in total).
    for attempt in 0..3 {
        if let Some(lock) = LockFile::try_acquire(lock_path) {
            return Some(lock);
        }
        if attempt < 2 {
            std::thread::sleep(std::time::Duration::from_millis(100));
        }
    }
    // All attempts failed: the caller continues with no lock held.
    tracing::warn!("Failed to acquire rewrite log lock after 3 attempts, proceeding without lock");
    None
}
```
🟡 Rewrite log lock falls through on contention, defeating its purpose
The new acquire_rewrite_lock function at src/git/rewrite_log.rs:579-590 tries to acquire the lock up to 3 times, sleeping 100ms between attempts (~200ms total), and then proceeds without the lock if every attempt fails. This defeats the purpose of the lock in exactly the scenario it is designed to protect against: another process performing the read-modify-write cycle at the same time. If process A holds the lock and is writing, process B fails to acquire it 3 times and proceeds lockless; both processes can then read the same file content, and whichever writes last silently overwrites the other's event. The function should either return an Err to the caller or block with a much longer timeout, rather than silently proceeding unprotected.
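To make the lost update concrete, the sketch below replays the interleaving described above (A reads, B reads, A writes, B writes) deterministically inside a single process. It is illustrative only: `read_log`, `write_log`, and `simulate_unlocked_interleaving` are hypothetical helpers, the one-event-per-line format is an assumption, and real processes would interleave nondeterministically rather than in this fixed order.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// "Read" half of a read-modify-write cycle (hypothetical helper).
/// A missing file is treated as an empty log.
fn read_log(path: &Path) -> io::Result<String> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(contents),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(String::new()),
        Err(e) => Err(e),
    }
}

/// "Modify-write" half: append one event to an earlier snapshot and
/// overwrite the whole file with the result (hypothetical helper).
fn write_log(path: &Path, snapshot: &str, event: &str) -> io::Result<()> {
    fs::write(path, format!("{snapshot}{event}\n"))
}

/// Replays the unlocked interleaving and returns the final file contents.
pub fn simulate_unlocked_interleaving(path: &Path) -> io::Result<String> {
    fs::write(path, "event-0\n")?;
    let snapshot_a = read_log(path)?; // process A reads
    let snapshot_b = read_log(path)?; // process B reads the same content
    write_log(path, &snapshot_a, "event-a")?; // A writes event-0 + event-a
    write_log(path, &snapshot_b, "event-b")?; // B overwrites with event-0 + event-b
    fs::read_to_string(path)
}
```

The file ends up as `event-0` followed by `event-b`; A's event is gone even though both writes "succeeded", which is why falling through without the lock is unsafe.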
Prompt for agents
The function acquire_rewrite_lock in src/git/rewrite_log.rs:579-590 returns None when the lock cannot be acquired after 3 attempts. The caller at line 537 assigns it to _lock and proceeds with the read-modify-write regardless. This silently downgrades to no-lock mode precisely when another writer holds the lock, creating the race condition the lock was meant to prevent.
Two approaches to fix:
1. Return an error instead of None when the lock cannot be acquired, and propagate it to the caller. This is the safest approach but may cause failures in edge cases.
2. Increase the retry count and/or sleep duration significantly (e.g., 10 retries with 200ms sleep = 2 seconds total) so the fallback is only triggered in truly pathological cases, not during normal concurrent access.
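The budgets quoted in this review follow from the loop shape in the snippet: it sleeps only between attempts, so n attempts incur n − 1 sleeps. A minimal sketch of that arithmetic (`retry_sleep_budget` is a hypothetical helper, and it counts sleep time only, not the time spent inside each acquisition attempt):

```rust
use std::time::Duration;

/// Worst-case time spent sleeping by a retry loop shaped like
/// `acquire_rewrite_lock`: `attempts` tries incur `attempts - 1` sleeps.
/// Time spent inside each acquisition attempt is not included.
pub fn retry_sleep_budget(attempts: u32, sleep: Duration) -> Duration {
    sleep * attempts.saturating_sub(1)
}
```

For the current code, 3 attempts at 100ms gives 200ms. The "10 retries with 200ms sleep = 2 seconds" figure holds when the 10 retries come after the first try (11 attempts); a loop of 10 attempts in total sleeps 1.8s.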
The current 3 attempts with 100ms sleep (200ms total budget) is too tight for the lock to be meaningful — a concurrent writer doing file I/O on a slow filesystem could easily hold the lock for longer than 200ms.
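A rough sketch of approach 1 follows, under stated assumptions. The `LockFile` here is a hypothetical stand-in that treats "the lock file exists" as "the lock is held", using `OpenOptions::create_new`; the PR's actual `LockFile::try_acquire` returns an `Option` and may use a different mechanism (for example an advisory `flock`). The attempt count and delay are made parameters here, which is also a hypothetical signature.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::Duration;

/// Hypothetical lock guard: held while the lock file exists.
pub struct LockFile {
    path: PathBuf,
    _file: File,
}

impl LockFile {
    /// `create_new` fails with `AlreadyExists` if another writer holds the lock.
    pub fn try_acquire(path: &Path) -> io::Result<Option<LockFile>> {
        match OpenOptions::new().write(true).create_new(true).open(path) {
            Ok(file) => Ok(Some(LockFile { path: path.to_path_buf(), _file: file })),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(None),
            Err(e) => Err(e),
        }
    }
}

impl Drop for LockFile {
    fn drop(&mut self) {
        // Releasing the lock means deleting the file; errors are ignored in drop.
        let _ = fs::remove_file(&self.path);
    }
}

/// Approach 1: bounded retries, then an error instead of proceeding unlocked.
pub fn acquire_rewrite_lock(
    lock_path: &Path,
    attempts: u32,
    delay: Duration,
) -> io::Result<LockFile> {
    for attempt in 0..attempts {
        if let Some(lock) = LockFile::try_acquire(lock_path)? {
            return Ok(lock);
        }
        if attempt + 1 < attempts {
            thread::sleep(delay);
        }
    }
    Err(io::Error::new(
        io::ErrorKind::TimedOut,
        format!("rewrite log lock {} is held by another writer", lock_path.display()),
    ))
}
```

The caller would then propagate the failure, e.g. `let _lock = acquire_rewrite_lock(lock_path, 10, Duration::from_millis(200))?;`. Binding to `_lock` rather than `_` matters: `let _ = ...` drops the guard immediately, releasing the lock before the read-modify-write runs. One trade-off of the lock-file approach in this sketch is that a crashed holder leaves a stale file behind, whereas an advisory lock is released by the OS when the holder's file descriptor is closed.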
Force-pushed from fad265b to 4fec027
…king

- Set umask(077) before creating control/trace sockets to prevent TOCTOU race with subsequent chmod
- Set daemon directory permissions to 0700
- Move env var sanitization before tokio runtime build to avoid unsafe env modification from worker threads
- Track dropped telemetry envelopes and CAS records via atomic counters, expose in FamilyStatus
- Make watermark update a confirmed operation via oneshot channel
- Scope watermark pruning to the correct worktree prefix
- Add file locking for rewrite log read-modify-write cycles

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
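The directory-permission item can be sketched with the standard library alone; the `umask(077)` step itself has no `std` wrapper (the `libc` crate's `libc::umask` is the usual route) and is not shown. The sketch below is an assumption-laden illustration, not the PR's code: `prepare_daemon_dir` and the `control.sock` name are hypothetical. It restricts the daemon directory to 0700 and binds the socket inside it, so other users cannot traverse to the socket even before its own mode is tightened.

```rust
use std::fs::{self, DirBuilder};
use std::io;
use std::os::unix::fs::{DirBuilderExt, PermissionsExt};
use std::os::unix::net::UnixListener;
use std::path::Path;

/// Hypothetical helper: create (or tighten) the daemon directory to 0700,
/// then bind a Unix socket inside it.
pub fn prepare_daemon_dir(dir: &Path, socket_name: &str) -> io::Result<UnixListener> {
    // Mode applied at creation time (still filtered by the process umask).
    DirBuilder::new().recursive(true).mode(0o700).create(dir)?;
    // chmod is not affected by the umask; this also fixes a pre-existing dir.
    fs::set_permissions(dir, fs::Permissions::from_mode(0o700))?;

    let socket_path = dir.join(socket_name);
    // bind() fails if the path exists, so remove a stale socket from a prior run.
    match fs::remove_file(&socket_path) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    UnixListener::bind(&socket_path)
}
```

Locking down the parent directory complements the umask change: the umask closes the window between socket creation and chmod, while the 0700 directory denies path traversal to other users regardless of the socket file's own mode.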
Force-pushed from 4fec027 to c78bc02
