Skip to content

Fsync atomic_open temp file before rename (#69583)#69584

Open
dwoz wants to merge 1 commit into
saltstack:3006.xfrom
dwoz:dwoz/fix/issue-69583-atomicfile-fsync
Open

Fsync atomic_open temp file before rename (#69583)#69584
dwoz wants to merge 1 commit into
saltstack:3006.xfrom
dwoz:dwoz/fix/issue-69583-atomicfile-fsync

Conversation

@dwoz

@dwoz dwoz commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Make salt.utils.atomicfile.atomic_open actually crash-safe: flush and
fsync() the temp file's data and metadata before the atomic rename
that swaps it into place over the destination.

What issues does this PR fix or reference?

Fixes #69583

Previous Behavior

_AtomicWFile.close() called self._fh.close() and then atomic_rename(...)
with no fsync in between. POSIX does not require the kernel to fsync
implicitly on rename, and the fsync-on-rename heuristic implemented by ext4
and friends is not universal. A crash after the rename could expose a
written-but-unsynced file (truncated or partial) — defeating the durability
contract callers expect from an "atomic write" helper. Users have observed
truncated files in /etc as a consequence.

New Behavior

_AtomicWFile.close() now performs self._fh.flush() and
os.fsync(self._fh.fileno()) before closing the temp file handle, so the
new file's contents are durable on disk before the rename swaps it into
place. The fsync call is guarded against file-like objects that don't
expose a real fd, so behavior is unchanged for callers wrapping in-memory
streams.

Merge requirements satisfied?

  • Docs (no documented-behavior change; durability is implicit in the
    "atomic write" contract)
  • Changelog (changelog/69583.fixed.md)
  • Tests written/updated (tests/pytests/unit/utils/test_atomicfile.py::test_atomicfile_fsyncs_before_rename)

Commits signed with GPG?

Yes

@dwoz

dwoz commented Jun 28, 2026

Copy link
Copy Markdown
Contributor Author

CI red is the inherited 3006.x ZMQ cluster from 36e914a (Use stable ZMQ identity for minion + syndic daemon ret-port REQ sockets). Every failing job here is one of: test_x509_v2::test_sign_remote_certificate{,_compound_match,_certificate_managed_remote} ("ca_server did not respond"), test_state_pillar_errors::test_state_apply_continues_after_pillar_error_is_fixed, test_state_queue*, test_state_state_events, netapi/test_client::test_local*, netapi/rest_tornado/test_{minions_api_handler,root_handler}* (leaked sub-minions), ssh/test_state::test_state_show_top (leaked master_tops_test). Same signatures reproduce on vanilla 3006.x runs on unrelated docs-only PRs — prior triage at #69527 (comment). No atomicfile-induced failure observed; the diff is the 3-line fsync change plus its unit test and changelog.

salt.utils.atomicfile._AtomicWFile.close() renamed the temp file over
the destination without first flushing user-space buffers and fsyncing
the file's data and metadata. POSIX does not require the kernel to
implicitly fsync on rename, and the fsync-on-rename heuristic of ext4
and friends is not universal. As a result a crash after the rename
could expose a written-but-unsynced file (truncated or partial),
defeating the durability contract callers expect from an "atomic
write" helper.

Flush + os.fsync() the underlying file descriptor before close() so
the new file's contents are durable on disk before the rename swaps
it into place.

Fixes saltstack#69583
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test:full Run the full test suite

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant