Summary
The "lock" guarding ctx connect sync against concurrent execution is a check-then-write pattern, not a real file lock. Two processes that race past the existence check (hook-triggered + manual invocation; cron + interactive run; two terminals) can both decide the lock is free, both write the lock file, both load the same LastSequence, both fetch the same entries from the hub, and both write duplicate content into .context/shared/.
Location
internal/cli/connection/core/sync/state.go:42-50:
// Acquire lock: fail if another sync is running.
if _, statErr := os.Stat(lockPath); statErr == nil {
return s, nil, os.ErrExist
}
if writeErr := io.SafeWriteFile(
lockPath, []byte(cfgHub.LockSentinel), fs.PermFile,
); writeErr != nil {
return s, nil, writeErr
}
The window between the os.Stat and the SafeWriteFile is unbounded and the pattern is racy by construction. The doc comment at line 21 (// Acquires a lock file to prevent concurrent access.) overstates what the code does.
Reproduction
# Terminal 1
ctx connect sync &
# Terminal 2 (within milliseconds)
ctx connect sync
Both processes return success; .context/shared/ contains duplicates of every entry returned by the hub since the previous LastSequence.
More reliable repro: for i in $(seq 1 20); do ctx connect sync & done. The probability of a winning race climbs sharply.
Proposed Fix
Two reasonable options; either eliminates the race.
Option A — atomic create-or-fail. Replace the stat-then-write with a single O_CREATE|O_EXCL syscall:
f, err := os.OpenFile(lockPath,
os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
if err != nil {
if errors.Is(err, fs.ErrExist) {
return s, nil, os.ErrExist
}
return s, nil, err
}
if _, writeErr := f.Write([]byte(cfgHub.LockSentinel)); writeErr != nil {
_ = f.Close()
_ = os.Remove(lockPath)
return s, nil, writeErr
}
if closeErr := f.Close(); closeErr != nil {
_ = os.Remove(lockPath)
return s, nil, closeErr
}
- Pros: portable across all filesystems Go's
os.OpenFile supports; no extra dependencies; preserves the lockfile-as-sentinel model already in use.
- Cons: a crashed process leaves a stale lock requiring manual or stat-by-age cleanup. Mitigate by writing the PID inside the lock and having loaders test
kill -0 <pid> to clean up obviously-dead locks.
Option B — advisory flock. Use syscall.Flock(fd, LOCK_EX|LOCK_NB) on the lock file. The kernel releases the lock automatically when the process exits, even on crash.
- Pros: no stale-lock cleanup needed; release-on-exit is automatic.
- Cons: not portable to Windows (
syscall.Flock is Unix-only; Windows would need LockFileEx); requires build tags or a small wrapper package.
Recommended: Option A first (matches the existing sentinel pattern, simpler), with a follow-up for stale-lock detection. Option B can replace it later if Windows support becomes a non-issue or a wrapper package is built.
Tests Required
Closing this should add regression tests so the bug doesn't recur:
TestConnectSync_RejectsConcurrentSyncs — spawn N goroutines all calling loadState(); assert exactly one succeeds and N-1 receive os.ErrExist.
TestConnectSync_ReleasesLockAfterFailure — assert the lock is removed when sync's body returns an error (so the next attempt can proceed).
TestConnectSync_LockFileLocation — assert the lock lives at <ctxDir>/hub/<FileSyncLock> (regression-pin the path).
Related
- TASKS.md line 393 (open since 2026-04-08, the home entry for this issue).
specs/hub-security-audit.md covers connection-class issues in the hub area.
- Discovered during stale-task triage; see the LEARNINGS.md entry "Closing a stale TASKS.md item often means writing the test, not the code" (2026-05-23).
Summary
The "lock" guarding
ctx connect syncagainst concurrent execution is a check-then-write pattern, not a real file lock. Two processes that race past the existence check (hook-triggered + manual invocation; cron + interactive run; two terminals) can both decide the lock is free, both write the lock file, both load the sameLastSequence, both fetch the same entries from the hub, and both write duplicate content into.context/shared/.Location
internal/cli/connection/core/sync/state.go:42-50:The window between the
os.Statand theSafeWriteFileis unbounded and the pattern is racy by construction. The doc comment at line 21 (// Acquires a lock file to prevent concurrent access.) overstates what the code does.Reproduction
Both processes return success;
.context/shared/contains duplicates of every entry returned by the hub since the previousLastSequence.More reliable repro:
for i in $(seq 1 20); do ctx connect sync & done. The probability of a winning race climbs sharply.Proposed Fix
Two reasonable options; either eliminates the race.
Option A — atomic create-or-fail. Replace the stat-then-write with a single
O_CREATE|O_EXCLsyscall:os.OpenFilesupports; no extra dependencies; preserves the lockfile-as-sentinel model already in use.kill -0 <pid>to clean up obviously-dead locks.Option B — advisory
flock. Usesyscall.Flock(fd, LOCK_EX|LOCK_NB)on the lock file. The kernel releases the lock automatically when the process exits, even on crash.syscall.Flockis Unix-only; Windows would needLockFileEx); requires build tags or a small wrapper package.Recommended: Option A first (matches the existing sentinel pattern, simpler), with a follow-up for stale-lock detection. Option B can replace it later if Windows support becomes a non-issue or a wrapper package is built.
Tests Required
Closing this should add regression tests so the bug doesn't recur:
TestConnectSync_RejectsConcurrentSyncs— spawn N goroutines all callingloadState(); assert exactly one succeeds and N-1 receiveos.ErrExist.TestConnectSync_ReleasesLockAfterFailure— assert the lock is removed when sync's body returns an error (so the next attempt can proceed).TestConnectSync_LockFileLocation— assert the lock lives at<ctxDir>/hub/<FileSyncLock>(regression-pin the path).Related
specs/hub-security-audit.mdcovers connection-class issues in the hub area.