Skip to content

[Bug]: data sync seems to be missing from _AtomicWFile.close(), causing possible data loss on crash #69583

Description

@sandeen

What happened?

atomicfile.py seems to do mostly the right thing with atomic writes via temp files and renaming, but it does not appear to be crash safe. The temp file which is written and renamed is never fsynced, so a crash after the rename may expose the written-but-not-synced new file, leading to data loss (either a partial write, or a truncated file.)

I would expect there to be a file data sync in the close() method, i.e.:

  self._fh.flush()
  os.fsync(self._fh.fileno())

before the rename to ensure crash safety and durability of the file change.

Some filesystems have a built-in heuristic to fsync-on-rename, but this is not required by POSIX and is far from universal. To make this atomic infrastructure crash-safe (you get either the old or the new file, with crash safety) I think the syncing is required.

I understand that this may be too heavy weight for some callers; I am not familiar with salt, and this report is coming from a user who had truncated files specifically in /etc and was requesting a filesystem heuristic workaround. If this write/close infrastructure is used elsewhere where the fsync activity would be too heavy weight and is not required for durability, perhaps it can be on by default (to satisfy durability for file changes in i.e. /etc) and optionally disabled for "atomic writes" that do not require crash safety (atomic old/new file in memory only.)

Thanks for your consideration!

Type of salt install

Official rpm

Major version

3006.x

What supported OS are you seeing the problem on? Can select multiple. (If bug appears on an unsupported OS, please open a GitHub Discussion instead)

rhel-10

salt --versions-report output

I do not have direct access to this as I am passing along a customer concern, but the behavior appears to be present in current git.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugbroken, incorrect, or confusing behavior

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions