ctx connect sync lock has a TOCTOU race; concurrent syncs can both pass the check

## Summary

The "lock" guarding `ctx connect sync` against concurrent execution is a **check-then-write** pattern, not a real file lock. Two processes that race past the existence check (hook-triggered + manual invocation; cron + interactive run; two terminals) can both decide the lock is free, both write the lock file, both load the same `LastSequence`, both fetch the same entries from the hub, and both write duplicate content into `.context/shared/`.

## Location

[`internal/cli/connection/core/sync/state.go:42-50`](https://github.com/ActiveMemory/ctx/blob/main/internal/cli/connection/core/sync/state.go#L42-L50):

```go
// Acquire lock: fail if another sync is running.
if _, statErr := os.Stat(lockPath); statErr == nil {
    return s, nil, os.ErrExist
}
if writeErr := io.SafeWriteFile(
    lockPath, []byte(cfgHub.LockSentinel), fs.PermFile,
); writeErr != nil {
    return s, nil, writeErr
}
```

The window between the `os.Stat` and the `SafeWriteFile` is unbounded and the pattern is racy by construction. The doc comment at line 21 (`// Acquires a lock file to prevent concurrent access.`) overstates what the code does.

## Reproduction

```bash
# Terminal 1
ctx connect sync &

# Terminal 2 (within milliseconds)
ctx connect sync
```

Both processes return success; `.context/shared/` contains duplicates of every entry returned by the hub since the previous `LastSequence`.

More reliable repro: `for i in $(seq 1 20); do ctx connect sync & done`. The probability of a winning race climbs sharply.

## Proposed Fix

Two reasonable options; either eliminates the race.

**Option A — atomic create-or-fail.** Replace the stat-then-write with a single `O_CREATE|O_EXCL` syscall:

```go
f, err := os.OpenFile(lockPath,
    os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
if err != nil {
    if errors.Is(err, fs.ErrExist) {
        return s, nil, os.ErrExist
    }
    return s, nil, err
}
if _, writeErr := f.Write([]byte(cfgHub.LockSentinel)); writeErr != nil {
    _ = f.Close()
    _ = os.Remove(lockPath)
    return s, nil, writeErr
}
if closeErr := f.Close(); closeErr != nil {
    _ = os.Remove(lockPath)
    return s, nil, closeErr
}
```

- Pros: portable across all filesystems Go's `os.OpenFile` supports; no extra dependencies; preserves the lockfile-as-sentinel model already in use.
- Cons: a crashed process leaves a stale lock requiring manual or stat-by-age cleanup. Mitigate by writing the PID inside the lock and having loaders test `kill -0 <pid>` to clean up obviously-dead locks.

**Option B — advisory `flock`.** Use `syscall.Flock(fd, LOCK_EX|LOCK_NB)` on the lock file. The kernel releases the lock automatically when the process exits, even on crash.

- Pros: no stale-lock cleanup needed; release-on-exit is automatic.
- Cons: not portable to Windows (`syscall.Flock` is Unix-only; Windows would need `LockFileEx`); requires build tags or a small wrapper package.

**Recommended**: Option A first (matches the existing sentinel pattern, simpler), with a follow-up for stale-lock detection. Option B can replace it later if Windows support becomes a non-issue or a wrapper package is built.

## Tests Required

Closing this should add regression tests so the bug doesn't recur:

- `TestConnectSync_RejectsConcurrentSyncs` — spawn N goroutines all calling `loadState()`; assert exactly one succeeds and N-1 receive `os.ErrExist`.
- `TestConnectSync_ReleasesLockAfterFailure` — assert the lock is removed when sync's body returns an error (so the next attempt can proceed).
- `TestConnectSync_LockFileLocation` — assert the lock lives at `<ctxDir>/hub/<FileSyncLock>` (regression-pin the path).

## Related

- TASKS.md line 393 (open since 2026-04-08, the home entry for this issue).
- `specs/hub-security-audit.md` covers connection-class issues in the hub area.
- Discovered during stale-task triage; see the LEARNINGS.md entry "Closing a stale TASKS.md item often means writing the test, not the code" (2026-05-23).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ctx connect sync lock has a TOCTOU race; concurrent syncs can both pass the check #93

Summary

Location

Reproduction

Proposed Fix

Tests Required

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

ctx connect sync lock has a TOCTOU race; concurrent syncs can both pass the check #93

Description

Summary

Location

Reproduction

Proposed Fix

Tests Required

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions