feat(df): add df builtin for disk space usage reporting #205

Open
julesmcrt wants to merge 28 commits into main from jules.macret/host-remediation/df

@julesmcrt
Collaborator

Summary

Adds df as a sandboxed builtin so AI-agent scripts can inspect mounted filesystem usage without invoking the host df binary. v1 supports Linux + macOS; Windows returns "not supported" (mirroring uname).

  • Mount enumeration is delegated to a new internal package, builtins/internal/diskstats, that reads /proc/self/mountinfo on Linux and calls getfsstat(2) on macOS. The /proc read is exempt from AllowedPaths for the same reason ss and ip route are — the path is hardcoded, never derived from user input.
  • The dangerous --sync flag (which would invoke sync(2) and mutate kernel buffer state) is unregistered and rejected by pflag as unknown. GTFOBins has no df entry.
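
As a rough illustration of the Linux side, the sketch below splits one /proc/self/mountinfo line into the fields df needs, assuming the field layout documented in proc(5). The `mountEntry` type and the `parseMountInfoLine` name are hypothetical and are not taken from the diskstats package; octal-escape decoding and the size caps are left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// mountEntry is a hypothetical record used only in this sketch.
type mountEntry struct {
	DevID      string // "major:minor", field index 2
	MountPoint string // field index 4
	FSType     string // first field after the "-" separator
	Source     string // second field after the "-" separator
}

var errMalformed = errors.New("malformed mountinfo line")

// parseMountInfoLine extracts the fixed fields before the optional-fields
// section and the filesystem type and source that follow the "-" separator.
func parseMountInfoLine(line string) (mountEntry, error) {
	fields := strings.Fields(line)
	// Optional fields start at index 6; the separator may sit there directly.
	sep := -1
	for i := 6; i < len(fields); i++ {
		if fields[i] == "-" {
			sep = i
			break
		}
	}
	if sep < 0 || len(fields) < sep+3 {
		return mountEntry{}, errMalformed
	}
	return mountEntry{
		DevID:      fields[2],
		MountPoint: fields[4],
		FSType:     fields[sep+1],
		Source:     fields[sep+2],
	}, nil
}

func main() {
	line := "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue"
	m, err := parseMountInfoLine(line)
	fmt.Println(m, err)
}
```

Because the path being read is a constant, nothing in this parsing step depends on user-supplied input.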

Flag set

Implemented: -h, -H, -k, -P, -T, -i, -a, -l, -t TYPE (repeatable), -x TYPE (repeatable), --total, --no-sync, --help.

Deferred to v2: [FILE]... operands, -B/--block-size, --output[=FIELDS].

Rejected (unknown to pflag): --sync, -v, --version.

Safety bounds (per docs/RULES.md)

  • Mount table capped at 100 000 entries (ErrMaxMounts returned when truncated; both Linux and Darwin honour this for parity).
  • /proc/self/mountinfo line cap of 1 MiB (errLineTooLong).
  • Scan total cap of 1 M lines (CPU-time guard against pathological all-malformed inputs).
  • percentUsed uses paired right-shifts to avoid used*100 overflow at extreme magnitudes.
  • saturatingAdd is used for grand-total accumulation so a rogue mount cannot wrap the running totals.
  • formatCount uses floor + remainder bump to avoid wraparound when grand totals saturate to MaxUint64.
  • ctx.Err() is checked at the top of every per-mount loop on both backends.
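
The overflow guards above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the function bodies are guesses at what "saturating add" and "paired right-shifts" mean here, and the round-up of the percentage is assumed to follow GNU df.

```go
package main

import (
	"fmt"
	"math"
)

// saturatingAdd returns a+b, clamped to MaxUint64 instead of wrapping.
func saturatingAdd(a, b uint64) uint64 {
	sum := a + b
	if sum < a { // unsigned wraparound happened
		return math.MaxUint64
	}
	return sum
}

// percentUsed returns ceil(100*used/total) without computing an
// overflowing used*100. Both operands are shifted right together until the
// multiplication is safe, which keeps their ratio approximately intact.
func percentUsed(used, total uint64) uint64 {
	if total == 0 {
		return 0
	}
	if used >= total {
		return 100
	}
	for used > math.MaxUint64/100 {
		used >>= 1
		total >>= 1
	}
	pct := used * 100 / total
	if used*100%total != 0 {
		pct++ // round up (assumed GNU-style behaviour)
	}
	return pct
}

func main() {
	fmt.Println(saturatingAdd(math.MaxUint64, 1) == math.MaxUint64)
	fmt.Println(percentUsed(1, 3))
}
```

Since `used < total` holds before the loop, the shifted `total` can never reach zero, so the division is always defined.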

Test plan

  • go test ./... (full suite passes locally on darwin/arm64).
  • make fmt clean.
  • go vet ./... cross-platform (GOOS=linux, GOOS=windows) clean.
  • Internal parser tests with synthetic mountinfo: malformed lines, octal escapes (\040, \011, \012, \134), MaxMounts truncation, line-too-long, context cancellation.
  • End-to-end Go tests: every supported flag exercised; rejected flags produce exit 1 + stderr.
  • Live-host invariant tests (unix-only): POSIX rows are numeric, capacity column ends with %, --total row equals saturated sum of per-mount columns, -t filter keeps only matching rows, -x filter removes them.
  • GNU coreutils 9.10 byte-for-byte header fixtures (POSIX, default, -h, -i, -T, --total).
  • Pentest: 20+ subtests covering rejected flags, file-operand traversal, end-of-flags separator, many operands, very long type names, 500 repeated -t flags, weird type values (empty/whitespace/comma/UTF-8/non-UTF-8/Unicode NFD), shell-metacharacter values, attempted config overrides (--proc-path, --mountinfo, --root, --prefix).
  • Fuzz: FuzzParseMountInfo, FuzzUnescapeMountField, FuzzDfFlagCombinator with seed corpora drawn from implementation edge cases, CVE-class inputs (NUL bytes, CRLF, invalid UTF-8, ELF/PE/ZIP magic prefixes), and replays of every existing test input. Added to .github/workflows/fuzz.yml so they run in CI.
  • Help integration: help lists df with the right description; help df shows full usage.
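
For context on the octal-escape cases in the parser tests: the kernel writes space, tab, newline and backslash inside mountinfo fields as `\040`, `\011`, `\012` and `\134`. A minimal decoder might look like the sketch below; the name `unescapeMountField` is borrowed from the fuzz target's name, but the body is an assumption and not the PR's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

func isOctal(c byte) bool { return c >= '0' && c <= '7' }

// unescapeMountField decodes \NNN octal escapes. Incomplete or out-of-range
// sequences are copied through unchanged rather than rejected.
func unescapeMountField(s string) string {
	if !strings.Contains(s, `\`) {
		return s // fast path: nothing to decode
	}
	var b strings.Builder
	b.Grow(len(s))
	for i := 0; i < len(s); i++ {
		// Need three octal digits after the backslash; the first digit
		// must be <= 3 so the value fits in one byte.
		if s[i] == '\\' && i+3 < len(s) && s[i+1] <= '3' &&
			isOctal(s[i+1]) && isOctal(s[i+2]) && isOctal(s[i+3]) {
			b.WriteByte((s[i+1]-'0')<<6 | (s[i+2]-'0')<<3 | (s[i+3] - '0'))
			i += 3
			continue
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", unescapeMountField(`/mnt/my\040disk`))
}
```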

🤖 Generated with Claude Code

@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b2ec9136f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread builtins/internal/diskstats/diskstats_linux.go Outdated
Comment thread builtins/df/df.go Outdated
Comment thread builtins/df/df.go Outdated
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5392e09b8

Comment thread builtins/internal/diskstats/diskstats_linux.go Outdated
Comment thread builtins/df/df.go Outdated
Comment thread builtins/df/df.go
Comment thread builtins/df/df.go Outdated
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a7d813ef6

Comment thread builtins/internal/diskstats/diskstats_linux.go
Comment thread builtins/df/df.go
Comment thread builtins/df/df.go Outdated
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc77c89e8a

Comment thread builtins/internal/diskstats/diskstats_linux.go
Comment thread AGENTS.md
Comment thread builtins/df/df.go Outdated
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d94c7f3d9

Comment thread builtins/df/df.go Outdated
Comment thread builtins/df/df.go
Comment thread builtins/df/df.go Outdated
@julesmcrt
Collaborator Author

@codex review

@julesmcrt added the verified/analysis label (Human-reviewed static analysis changes) on Apr 30, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05b3504ff3

Comment thread builtins/internal/diskstats/diskstats_linux.go Outdated
Comment thread builtins/df/df.go Outdated
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc63258ac6

Comment thread builtins/df/df.go Outdated
Comment thread builtins/df/df.go
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 755c2fe837

Comment thread builtins/df/df.go Outdated
Comment thread builtins/df/df.go Outdated
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b3d483519

Comment thread builtins/df/df.go
Comment thread builtins/df/df.go Outdated
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b3d483519

Comment thread builtins/df/df.go Outdated
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5438ea4db

Comment thread builtins/df/df_fuzz_test.go
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e8940e9e9

Comment thread builtins/df/df.go
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbc8201121

Comment thread builtins/df/df.go Outdated
@julesmcrt
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 890e69547a

Comment thread builtins/df/df.go Outdated
@julesmcrt requested a review from val06 as a code owner on May 5, 2026 14:56
julesmcrt and others added 28 commits May 5, 2026 17:47
Adds df as a sandboxed builtin (Linux + macOS; Windows returns "not
supported"). Mount enumeration goes through a new internal package,
builtins/internal/diskstats, which reads /proc/self/mountinfo on Linux
(documented sandbox-bypass mirroring ss/ip route — the path is hardcoded
and never derived from user input) and calls getfsstat(2) on macOS.

Supported flags: -h, -H, -k, -P, -T, -i, -a, -l, -t TYPE (repeatable),
-x TYPE (repeatable), --total, --no-sync, --help. The dangerous --sync
flag (which would invoke sync(2) and mutate kernel buffer state) is
unregistered and rejected by pflag as unknown. -B, --output, and
positional FILE operands are deferred to a future version.

Memory bounds: mount table capped at 100k entries, mountinfo line
length capped at 1 MiB, scan total capped at 1M lines. Integer arithmetic
uses paired right-shifts in percentUsed and saturating addition in totals
to avoid overflow on extreme magnitudes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI fixes:
- Fuzz: dfRunFuzz no longer routes through testutil.RunScriptCtx, which
  fatals on shell-parse errors. The fuzzer routinely mutates inputs into
  malformed shell syntax (unclosed quotes, etc.); we now treat parse
  failures as expected and skip the iteration.
- Windows: pentest tests that exercised df's actual code path
  (TestDfPentestVeryLongTypeName, TestDfPentestManyTypeFilters,
  TestDfPentestTypeFilterEdgeValues, TestDfPentestNonUTF8FlagValue,
  TestDfPentestUnicodeNFD, TestDfPentestQuotedValues) now skip via
  requireSupported — df returns "not supported" on Windows, which made
  the asserted code==0 invariant unreachable.

Codex feedback:
- P1: Remove "overlay" from the Linux pseudo-FS table. Container hosts
  use overlay as the default root filesystem; classifying it as pseudo
  hid the real root from the default listing.
- P2: An explicit -t TYPE filter now overrides the default pseudo-FS
  suppression. `df -t tmpfs` lists tmpfs mounts without requiring -a,
  matching GNU df.
- P2: humanBytes rounds up via math.Ceil instead of fmt.Sprintf's
  round-to-nearest, matching GNU df's "never under-report" rule.
  Example: 1,576,960 bytes is now "1.6M" (was "1.5M").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Windows test failures: TestDfPentestNonUTF8FlagValue, TestDfPentestUnicodeNFD,
and TestDfPentestQuotedValues were missing requireSupported(t) so they ran on
Windows where df returns "not supported" (code 1), making the asserted code==0
unreachable. Add the skip alongside the seven other Windows-skipped tests.

Fuzz contract: the previous version asserted exit codes in {0, 1, 127} only,
which broke on legitimate runner behaviour like "df 0&" → code 2 (background
job). The fuzz target's real contract is "no panic and no hang inside df";
both are enforced by Go's testing framework and the helper's 5-second timeout
respectively. Drop the exit-code assertion entirely. Also stop fataling on
non-ExitStatus runner errors (glob expansion failures, "internal error" on
adversarial inputs) — those are runner behaviour, not df defects. Verified by
running the fuzzer locally for 20s (670k execs) without failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 scenario tests under tests/scenarios/cmd/df/ exercised df's actual
code path and asserted exit code 0 with output rows. On Windows, df
returns "not supported" with exit 1. The scenario framework has no
platform-skip mechanism (per AGENTS.md note about ip route), so happy-
path scenarios cannot be added for platform-restricted commands.

The deleted scenarios were redundant with builtins/df/df_unix_test.go,
which uses //go:build unix to exercise the same flag wiring with
structural assertions on the live mount table — a richer check than
the stdout_contains substring match the YAML scenarios provided.

Retained 6 scenarios that are platform-agnostic (they short-circuit
before diskstats.List): --help, extra operand, unknown flag,
--sync rejection, -B/--block-size rejection, --output rejection.

Deleted: basic/default_succeeds.yaml, flags/all.yaml, exclude_type.yaml,
human_readable.yaml, inodes.yaml, k_is_default.yaml, local.yaml,
no_sync.yaml, posix_format.yaml, print_type.yaml, si.yaml, total.yaml,
type_filter.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four P2 items, all verified against gdf 9.10 locally:

1. tmpfs / devtmpfs are no longer in pseudoTypes. They report real
   storage (/dev/shm, /run) and GNU lists nonzero tmpfs mounts in the
   default output. Default df under-reported them previously.

2. df -t TYPE / -x TYPE now exit 1 with "no file systems processed" on
   stderr when filtering leaves zero rows. GNU df returns 1 in this
   case and scripts test the exit status to detect missing filesystem
   types — silent exit-0 with empty body was a regression.

3. df -ih / df -iH now scales inode counts via humanBytes (e.g. 4.0G,
   381K). Previously inode mode bypassed human formatting and always
   printed raw integers, which broke a common usage even though both
   -i and -h are documented as supported.

4. df -P with -h or -H now uses the "Size" column header instead of
   "1024-blocks". GNU's -P -h emits "Size" — keeping the fixed-block
   label under human-suffixed values would mislead parsers about units.

Updated tests:
- TestDfTypeFilter_NoMatches: now asserts exit 1 + stderr message.
- TestFormatCount: covers the -ih and -iH cases.
- TestBuildHeader: pins -P -h and -P -H to "Size".
- TestDfPentest{VeryLongTypeName, ManyTypeFilters, TypeFilterEdgeValues,
  NonUTF8FlagValue, UnicodeNFD, QuotedValues}: relaxed to accept exit 0
  or 1 (no-match values are common in pentest inputs).
- TestDfPentestTypeIncludeAndExcludeSameType: now asserts the new
  exit-1 + "no file systems processed" contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-on test fixes after the round-2 GNU-compat changes (commit
7b62dfc):

1. TestParseMountInfo_HappyPath: asserted devtmpfs.Pseudo == true, but
   devtmpfs was just removed from pseudoTypes. Flip the assertion to
   match the new (correct) classification, and add an analogous check
   for the /run tmpfs entry to lock in tmpfs.Pseudo == false.

2. TestDfPentestAllFlagsAtOnce: asserted exit 0, but on Linux the -t
   apfs filter matches no rows so df now correctly emits "no file
   systems processed" and exits 1 (the new GNU-compat path). Relax to
   accept either 0 or 1; the pentest's contract is "stacking every
   flag does not crash", not "succeeds with this specific argv on
   every host".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ception

Three Codex round-3 items:

1. P1 — pre-stat filter (real bug). statfs(2) on a stale NFS / CIFS
   mount can hang indefinitely and is not interrupted by ctx
   cancellation. Previously diskstats.List statfs'd every mount in
   /proc/self/mountinfo before df.go could apply -l / -x nfs filters,
   so `df -l` could hang on a dead remote even though the user had
   explicitly asked to skip remote mounts.
   Fix: diskstats.List now takes a FilterFunc parameter. df.go's
   makePreStatFilter encodes -t/-x/-a/-l in a closure that runs
   between mountinfo parsing and the statfs syscall on Linux. Darwin
   already uses MNT_NOWAIT so the filter is cosmetic there.

2. P1 — AllowedPaths and statfs (documentation). statfs returns
   metadata only (block/inode counts, fs type, block size); no file
   content is read. The mount-point paths are kernel-controlled,
   never user-derived. This is the same exception class as ss reading
   all sockets and ip route reading the full routing table — gating
   it on AllowedPaths would produce a misleading partial listing.
   Documented under "Security Design Decisions" in AGENTS.md so the
   trade-off is explicit.

3. P2 — dedup bind-mounts. GNU df hides mounts that share a Source
   with an already-emitted mount unless -a is given. On container
   hosts with overlay bind-mounts of /etc/hosts, /etc/hostname,
   /etc/resolv.conf, default df was printing three duplicate rows
   and --total was double-counting them.
   Fix: filterMounts now keeps a seen-Source set; only the first mount
   for each Source is emitted unless -a is set. Empty Source values
   (rare; some pseudo filesystems) are not collapsed onto each other.
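
The pre-stat filter in item 1 can be pictured with the sketch below. Only the names FilterFunc and makePreStatFilter come from this commit message; the `Mount` fields, the parameter list, and the `listMounts` stand-in for diskstats.List are assumptions made for illustration. The point is that the closure runs before the potentially blocking stat call, so rejected mounts are never touched.

```go
package main

import "fmt"

// Mount carries only the fields this sketch needs (assumed shape).
type Mount struct {
	MountPoint string
	Type       string
	Pseudo     bool
	Local      bool
}

// FilterFunc reports whether a parsed mount should be stat'ed at all.
type FilterFunc func(Mount) bool

// makePreStatFilter folds the -t / -x / -a / -l flags into one closure.
func makePreStatFilter(include, exclude map[string]bool, all, localOnly bool) FilterFunc {
	return func(m Mount) bool {
		switch {
		case exclude[m.Type]: // -x wins over everything else
			return false
		case localOnly && !m.Local: // -l drops non-local mounts
			return false
		case len(include) > 0: // explicit -t overrides pseudo suppression
			return include[m.Type]
		default:
			return all || !m.Pseudo
		}
	}
}

// listMounts is a hypothetical stand-in for the listing step: stat is
// invoked only for mounts that pass the filter.
func listMounts(parsed []Mount, keep FilterFunc, stat func(Mount)) []Mount {
	var out []Mount
	for _, m := range parsed {
		if keep != nil && !keep(m) {
			continue // never reaches the blocking syscall
		}
		stat(m)
		out = append(out, m)
	}
	return out
}

func main() {
	mounts := []Mount{
		{"/", "ext4", false, true},
		{"/mnt/stale", "nfs4", false, false},
	}
	keep := makePreStatFilter(nil, nil, false, true) // df -l
	got := listMounts(mounts, keep, func(m Mount) { fmt.Println("statfs", m.MountPoint) })
	fmt.Println(len(got))
}
```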

Tests:
- New TestPreStatFilter_* coverage replaces TestFilterMounts_* for the
  type/pseudo/local logic that moved into makePreStatFilter.
- New TestFilterMounts_DedupBySourceWithoutAll, _AllPreservesDuplicates,
  _EmptySourceNotDeduped lock in the dedup behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… promotion

Three Codex round-4 items, all real:

1. P1 — FUSE remote subtypes (hang protection for sshfs etc.). Linux
   mountinfo reports FUSE mounts as fuse.<subtype>, e.g. fuse.sshfs
   or fuse.smbnetfs. The bare prefix list ("sshfs", "smb", …) did not
   match these strings, so a stale fuse.sshfs mount was still statfs'd
   and could hang despite -l / -x. Added explicit fuse.<remote-backend>
   entries: fuse.sshfs, fuse.smb, fuse.cifs, fuse.davfs, fuse.glusterfs,
   fuse.cephfs, fuse.nfs, fuse.s3, fuse.rclone. TestIsRemoteType extended
   with positive cases for each subtype and negative cases for the
   local FUSE backends (gvfsd-fuse, portal, archivemount).

2. P1 — Document df bypass in README.md. The "Security Model" note
   already covered ss and ip route but not df; operators reading the
   README would miss that AllowedPaths cannot hide mount metadata
   from df. Updated the bullet to list df alongside ss/ip route and
   to spell out that Statfs(2) returns metadata only.

3. P2 — humanBytes promotes after rounding. humanBytes(1048575, 1024)
   was returning "1024K" instead of "1.0M" because the suffix was
   chosen before the ceiling. Refactored: one rounding pass with
   granularity decided by pre-rounded magnitude (one decimal if
   scaled < 10, integer otherwise), then a single promotion step when
   the rounded value reaches base. df -h now matches gdf -h
   byte-for-byte on my host (927G/512G/415G — previously 926G off-by-one).
   Restored the 1<<20-1 → 1.0M test case I had dropped, plus 1<<30-1 →
   1.0G, 1<<40-1 → 1.0T, 10485759 → 10M.
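
The round-then-promote order described in item 3 can be sketched as below. This is a floating-point approximation written for illustration, with an assumed signature; the real humanBytes may use integer arithmetic and differ in detail, and float64 loses exactness for counts above 2^53.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// humanBytes scales n by base (1024 or 1000), rounds up once, and only
// then decides whether the value must move to the next suffix.
func humanBytes(n uint64, base uint64) string {
	const suffixes = "KMGTPE"
	if n < base {
		return strconv.FormatUint(n, 10)
	}
	b := float64(base)
	v := float64(n)
	idx := -1
	for v >= b && idx < len(suffixes)-1 {
		v /= b
		idx++
	}
	// Granularity is chosen from the value before rounding.
	if v < 10 {
		v = math.Ceil(v*10) / 10
	} else {
		v = math.Ceil(v)
	}
	// Single promotion step: 1023.99K rounds to 1024K, which becomes 1.0M.
	if v >= b && idx < len(suffixes)-1 {
		v /= b
		idx++
	}
	if v < 10 {
		return fmt.Sprintf("%.1f%c", v, suffixes[idx])
	}
	return fmt.Sprintf("%.0f%c", v, suffixes[idx])
}

func main() {
	for _, n := range []uint64{1048575, 1576960, 10485759} {
		fmt.Println(n, humanBytes(n, 1024))
	}
}
```

Choosing the suffix before rounding is what produced "1024K"; rounding first and promoting afterwards keeps the printed mantissa below the base.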

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…split -t

Three Codex round-5 items, all real:

1. P2 — pseudo + -l. Mount.Local was defined as "not pseudo and not
   remote", so pseudo mounts (proc, sysfs, cgroup2) had Local=false
   and were silently dropped by `df -al`. GNU df treats pseudo and
   remote as independent classifications: -l drops only remote
   filesystems, and pseudo passes when -a is set. Redefined
   Local := !isRemoteType(...) on both Linux and Darwin so pseudo
   mounts are local (they live in kernel memory, not on a remote
   server). New TestPreStatFilter_AllPlusLocalKeepsPseudo locks this
   in. Verified locally: `df -al` now lists more entries than `df`,
   pseudo mounts re-enabled by -a survive -l.

2. P2 — dedup by device + shortest mount point. The previous
   Source-string dedup was brittle: two unrelated overlay mounts
   sharing a literal source name were collapsed (the kataShared bug
   Codex flagged), and the chosen representative depended on input
   order rather than mount-point length.
   Added a DevID field to Mount (parsed from /proc/self/mountinfo
   field index 2 on Linux, formatted from Statfs_t.Fsid on Darwin)
   and rewrote filterMounts as a two-pass dedup: first pass picks the
   index of the shortest-mountpoint entry per DevID, second pass
   emits in original order. Distinct DevIDs are preserved even when
   Source matches.
   New tests: TestFilterMounts_DedupByDevicePicksShortestMountpoint
   (kataShared scenario verbatim) and
   TestFilterMounts_DistinctDeviceSameSourceNotDeduped.

3. P3 — drop comma-split in -t / -x. Verified empirically:
   `gdf -t apfs,ext4` exits 1 with "no file systems processed" — GNU
   treats the entire string as a single literal type. The previous
   code split on commas and matched either side, which broke scripts
   that used the no-match exit code as a presence test.
   Updated stringSet to store the verbatim argv values; removed
   strings.SplitSeq from the symbol allowlist.
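
The two-pass dedup in item 2 might look roughly like this. The `Mount` fields and the `dedupByDevice` name are assumptions; in particular, leaving entries with an empty DevID untouched is a choice made for this sketch, not something the commit states.

```go
package main

import "fmt"

// Mount carries only the fields this sketch needs (assumed shape).
type Mount struct {
	DevID      string
	Source     string
	MountPoint string
}

// dedupByDevice keeps one mount per DevID, preferring the shortest mount
// point, and preserves the original order of the survivors.
func dedupByDevice(mounts []Mount) []Mount {
	// Pass 1: remember the index of the shortest mount point per device.
	best := make(map[string]int)
	for i, m := range mounts {
		if m.DevID == "" {
			continue
		}
		j, seen := best[m.DevID]
		if !seen || len(m.MountPoint) < len(mounts[j].MountPoint) {
			best[m.DevID] = i
		}
	}
	// Pass 2: emit in input order, skipping the non-representatives.
	out := make([]Mount, 0, len(mounts))
	for i, m := range mounts {
		if m.DevID == "" || best[m.DevID] == i {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	mounts := []Mount{
		{"8:1", "overlay", "/etc/hosts"},
		{"8:1", "overlay", "/"},
		{"8:2", "overlay", "/data"},
	}
	for _, m := range dedupByDevice(mounts) {
		fmt.Println(m.DevID, m.MountPoint)
	}
}
```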

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Codex round-6 items, both real:

1. P1 — saturating block multiply. unix.Statfs_t.Blocks * bsize could
   wrap a uint64 if a buggy/malicious FUSE filesystem reports counts
   above MaxUint64/bsize, producing tiny/zero Total/Used/Free for that
   mount and corrupting the --total accumulation. Added a mulSat(a, b)
   helper to diskstats_unix.go that clamps to MaxUint64 on overflow.
   Linux and Darwin backends now use mulSat for Total, Free, and Used.
   Locked in by TestMulSat covering boundary, just-over-boundary, and
   the realistic FUSE-rogue (maxU * 4096) case.

2. P2 — last-flag-wins for -h / -H. The previous code unconditionally
   preferred -h, so `df -hH` (intended SI override) and shell aliases
   that append -H to a default got the wrong size column. pflag.Visit
   walks set flags in lexicographical order, not argv order, so it
   cannot be used to honor input order. Refactored to a custom
   unitFlag (a pflag.Value) where -h and -H share a single *unitMode
   target and each Set call overwrites it — the LAST one wins by
   construction. registerUnitFlag wraps fs.VarPF + NoOptDefVal="true"
   so the flags accept no argument, matching pflag's bool convention.
   Confirmed locally: `df -hH` prints SI (995G), `df -Hh` prints IEC
   (927G). New TestUnitFlag_LastFlagWins covers 10 argv interleavings
   including combined short flags (-hH, -Hh).
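
A saturating multiply like the one in item 1 can be written with the standard library's 128-bit product. The body below is a sketch of the technique; only the name mulSat and its clamping behaviour come from the commit message.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// mulSat returns a*b, clamped to MaxUint64 when the product overflows.
func mulSat(a, b uint64) uint64 {
	hi, lo := bits.Mul64(a, b) // full 128-bit product as (hi, lo)
	if hi != 0 {
		return math.MaxUint64
	}
	return lo
}

func main() {
	// A rogue filesystem reporting MaxUint64 blocks of 4096 bytes.
	fmt.Println(mulSat(math.MaxUint64, 4096) == math.MaxUint64)
	fmt.Println(mulSat(3, 4))
}
```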

Side effect of #2: dropped the unused `human`/`si` *bool fields from
the flags struct and the resolveUnitMode helper; mode is now read
directly from a *unitMode field.
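
The shared-target idea can be illustrated with the standard library's flag package, used here in place of pflag so the example has no external dependency. `IsBoolFlag` plays the role that NoOptDefVal plays in pflag; unlike pflag, stdlib flag does not parse combined short flags such as -hH. The `unitsDefault` constant and `parseUnitMode` helper are assumptions made for this sketch.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

type unitMode int

const (
	unitsDefault unitMode = iota
	unitsHuman1024
	unitsHuman1000
)

// unitFlag writes its fixed value into a shared target on every Set call,
// so whichever flag appears last on the command line wins.
type unitFlag struct {
	target *unitMode
	value  unitMode
}

func (f *unitFlag) String() string   { return "" }
func (f *unitFlag) Set(string) error { *f.target = f.value; return nil }
func (f *unitFlag) IsBoolFlag() bool { return true } // flag takes no argument

// parseUnitMode is a hypothetical helper that registers -h and -H against
// one shared mode variable and returns the mode left after parsing.
func parseUnitMode(args []string) (unitMode, error) {
	mode := unitsDefault
	fs := flag.NewFlagSet("df", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	fs.Var(&unitFlag{target: &mode, value: unitsHuman1024}, "h", "powers of 1024")
	fs.Var(&unitFlag{target: &mode, value: unitsHuman1000}, "H", "powers of 1000")
	err := fs.Parse(args)
	return mode, err
}

func main() {
	m, _ := parseUnitMode([]string{"-h", "-H"})
	fmt.Println(m == unitsHuman1000)
}
```

Because Set is invoked in argv order, no post-parse inspection of which flags were set is needed.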

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Codex round-7 items, both verified empirically against gdf 9.10:

1. P2 — -k participates in the unit-mode last-flag-wins group. Before,
   -k was a no-op bool flag that ran alongside -h / -H without ever
   updating the shared mode, so `df -h -k` left mode = unitsHuman1024
   even though gdf -h -k prints "1K-blocks". Promoted -k into the same
   registerUnitFlag group with unitsK value: all three flags now share
   the *unitMode target and the LAST argv entry wins.
   Locked in: TestUnitFlag_LastFlagWins gains six new cases (-h -k,
   -H -k, -k -h, -k -H, -hk, -kh). Smoke verified: df -h -k now
   prints "1K-blocks", df -k -h prints "Size".

2. P2 — reject overlapping -t / -x. gdf -t apfs -x apfs exits 1 with
   "file system type 'apfs' both selected and excluded". The previous
   code silently let exclusion win. Added an overlappingType helper
   and an early-out check in the handler that emits the GNU-format
   error before any mount listing runs. Updated
   TestDfPentestTypeIncludeAndExcludeSameType to assert the new
   error; kept TestPreStatFilter_TypeExcludeWinsOverIncludeOnPseudo
   to lock in the filter's lower-level exclude-precedence behaviour
   in isolation. Added TestOverlappingType for the helper itself.
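
A helper of this kind could be as small as the sketch below. The name overlappingType is from the commit message; the signature and return shape are assumptions, and the error text is the one quoted above.

```go
package main

import "fmt"

// overlappingType returns the first type that appears in both the -t and
// -x lists, in -t order, and whether such a type exists.
func overlappingType(include, exclude []string) (string, bool) {
	excluded := make(map[string]struct{}, len(exclude))
	for _, t := range exclude {
		excluded[t] = struct{}{}
	}
	for _, t := range include {
		if _, ok := excluded[t]; ok {
			return t, true
		}
	}
	return "", false
}

func main() {
	if t, ok := overlappingType([]string{"apfs"}, []string{"apfs"}); ok {
		fmt.Printf("df: file system type '%s' both selected and excluded\n", t)
	}
}
```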

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Codex round-8 items, both verified empirically against gdf 9.10:

1. P2 — GNU compresses "Available" to "Avail" in human modes
   (`gdf -h /` → "Filesystem Size Used Avail Use% Mounted on"). My
   buildHeader unconditionally emitted "Available", so any
   bash-comparison scenario for `df -h` / `df -H` would diverge.
   buildHeader now switches to "Avail" when the unit mode is
   unitsHuman1024 or unitsHuman1000; fixed-block modes (default, -k,
   -P) keep the full "Available" string.

2. P2 — `gdf -iP` keeps "IUse%" as the percentage header, not
   "Capacity". Only the *block* POSIX format substitutes "Capacity"
   for "Use%". My code unconditionally replaced "IUse%" with
   "Capacity" when posix was set in inode mode, which diverged from
   GNU. Removed the conditional; the inode header is now always
   "IUse%" regardless of -P.

Tests: TestBuildHeader expanded to pin both behaviours (Avail in -h /
-H, IUse% preserved with -iP, IUse% not replaced by Capacity). Also
strengthened TestGNUCompatHeaderHuman to explicitly check that "Avail"
is present and "Available" is NOT.

Smoke verified locally:
  df -h  → Filesystem ... Size Used Avail Use% Mounted on
  df -iP → Filesystem ... Inodes IUsed IFree IUse% Mounted on
  df -P  → Filesystem ... 1024-blocks Used Available Capacity Mounted on

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Codex review-4207024894 items I missed earlier; verified empirically
against gdf 9.10:

1. P2 — Limit POSIX layout to fixed-block output. GNU df only uses the
   strict POSIX single-space row layout when -P is the *sole*
   format-affecting flag. Combinations like -PT, -Pi, -hP, -HP all
   revert to the default aligned column layout, and human modes (-h /
   -H) keep "Use%" instead of "Capacity" even with -P.
   Two narrow fixes:
     * buildHeader: capacity = "Capacity" only when posix && !human.
     * writeOutput: pass posixLayout = posix && !withType && !inodeMode
       && !human to printRows. Single-space rows now apply only to
       `df -P` (with optional -k/-a/-l/-t/-x).
   Smoke verified: df -hP / df -HP → Use% + aligned; df -PT / df -Pi
   → aligned; df -P alone → single-space + Capacity.

2. P3 — Reject the nonstandard --kibibytes flag. GNU only documents
   the short -k; there is no --kibibytes long form. Previously rshell
   accepted both, which let scripts depend on rshell-only behavior.
   Re-registered -k via fs.VarPF with an empty long name so only the
   short form is recognised. df --kibibytes now exits 1 with
   "unknown flag: --kibibytes", matching gdf.
   Added "df --kibibytes" to the rejected-flag pentest list.

Tests: TestBuildHeader strengthened to assert -hP/-HP keep Use% and
do NOT contain Capacity, and that -PT does keep Capacity. Pentest's
TestDfPentestRejectedFlags now includes --kibibytes.
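
The header rules accumulated so far (Avail in human modes, Capacity only for posix && !human, IUse% always in inode mode) can be sketched roughly as below. The `headerOpts` struct and the "Type" column label are assumptions made for illustration; the real buildHeader may take different parameters.

```go
package main

import (
	"fmt"
	"strings"
)

// headerOpts is a hypothetical bundle of the flags that affect the header.
type headerOpts struct {
	human    bool // -h or -H
	posix    bool // -P
	withType bool // -T
	inodes   bool // -i
}

// buildHeader returns the header labels in column order.
func buildHeader(o headerOpts) []string {
	cols := []string{"Filesystem"}
	if o.withType {
		cols = append(cols, "Type") // label assumed, not quoted in the commits
	}
	if o.inodes {
		// The inode header keeps IUse% regardless of -P.
		return append(cols, "Inodes", "IUsed", "IFree", "IUse%", "Mounted on")
	}
	size, avail, pct := "1K-blocks", "Available", "Use%"
	switch {
	case o.human:
		// Human modes shorten Available and never use Capacity.
		size, avail = "Size", "Avail"
	case o.posix:
		size, pct = "1024-blocks", "Capacity"
	}
	return append(cols, size, "Used", avail, pct, "Mounted on")
}

func main() {
	fmt.Println(strings.Join(buildHeader(headerOpts{human: true, posix: true}), " "))
	fmt.Println(strings.Join(buildHeader(headerOpts{posix: true}), " "))
	fmt.Println(strings.Join(buildHeader(headerOpts{posix: true, inodes: true}), " "))
}
```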

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The lock file is local Claude session state and shouldn't be tracked.
Removing the accidental check-in from the previous commit and updating
.gitignore to prevent recurrence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Codex items, both verified empirically:

1. P2 — `df -t TYPE` no longer overrides pseudo suppression.
   gdf 9.4 confirms: `df -t devfs` (devfs is pseudo on macOS) exits 1
   with "no file systems processed". GNU treats -t and the pseudo
   filter as independent — only -a exposes pseudo filesystems. My
   round-2 fix (which made includeSet bypass the pseudo check) was an
   over-correction; -t tmpfs continues to work because tmpfs isn't in
   our pseudoTypes table (RAM-backed but real storage), not because
   -t skips the pseudo filter.
   Removed the `else if !all && m.Pseudo` branch from
   makePreStatFilter so the pseudo check fires regardless of
   includeSet. Replaced TestPreStatFilter_TypeIncludeOverridesPseudoSuppression
   with TestPreStatFilter_TypeIncludeRespectsPseudoSuppression
   covering both the no-`-a` (drops) and `-a -t proc` (lists) cases.

2. P2 — Fuzz runner missed AllowAllCommands. interp.New defaults to
   "no commands allowed" (per interp/api.go), so every fuzz input
   starting with "df" was rejected as "command not allowed" before
   df ever ran. The new CI fuzz job for builtins/df was effectively
   a no-op.
   Added `interpoption.AllowAllCommands().(interp.RunnerOption)` to
   dfRunFuzz's interp.New options (matches testutil.RunScriptCtx).
   Verified: `go test -fuzz=FuzzDfFlagCombinator -fuzztime=5s` now
   executes ~430k iterations vs ~0 effective coverage before.
   Documented the requirement inline so future readers don't strip
   it back out.
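
The filter behaviour from item 1 can be illustrated with a small sketch. The `mount` struct and the map-based include/exclude sets are assumptions; only the rule itself (the pseudo check runs whether or not -t is given, and only -a lifts it) comes from the commit message.

```go
package main

import "fmt"

// mount is a hypothetical, trimmed-down mount-table entry.
type mount struct {
	Type   string
	Pseudo bool
}

// makePreStatFilter returns a predicate that reports whether a mount should
// be kept. The pseudo check is independent of the include set.
func makePreStatFilter(all bool, include, exclude map[string]bool) func(mount) bool {
	return func(m mount) bool {
		if exclude[m.Type] {
			return false // exclusion takes precedence at this level
		}
		if len(include) > 0 && !include[m.Type] {
			return false
		}
		if !all && m.Pseudo {
			return false // -t alone does not expose pseudo filesystems
		}
		return true
	}
}

func main() {
	proc := mount{Type: "proc", Pseudo: true}
	onlyProc := map[string]bool{"proc": true}
	fmt.Println(makePreStatFilter(false, onlyProc, nil)(proc)) // df -t proc
	fmt.Println(makePreStatFilter(true, onlyProc, nil)(proc))  // df -a -t proc
}
```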

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex P2: pflag.PrintDefaults rendered the shorthand-only -k flag as
"-k, --                           use 1024-byte blocks (POSIX default)"
— treating the empty long name as a literal "--" string. That
advertised "--" as if it were a usable long option, which is wrong.

Fix: set Hidden = true on the -k flag so PrintDefaults skips it, and
append a manual "  -k                               use 1024-byte
blocks (POSIX default)" line in printHelp. Flag parsing is unchanged
— -k still overrides earlier -h/-H per the unitFlag last-wins logic,
and --kibibytes is still rejected as unknown.

TestDfHelp now asserts the new format: stdout must contain `-k `,
must NOT contain `-k, --`, and must NOT contain `--kibibytes`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex P2: gdf -P uses GNU's aligned column layout, not the strict
POSIX-spec single-space format. The GNU manual documents -P as
"one-line filesystem rows + POSIX header labels" — spacing stays the
same as the default format, only the column *labels* change
("1024-blocks", "Capacity"). Verified empirically: gdf -P / | od -c
shows ~5 spaces between Filesystem and 1024-blocks.

Removed the single-space branch in printRows entirely. -P now goes
through the same aligned-column path as every other format. The
posixLayout flag and its surrounding logic in writeOutput were dropped;
printRows lost a parameter.

Tests:
- TestDfPosix and TestGNUCompatHeaderPosix now assert header words
  appear in order rather than byte-equality, since column widths now
  adapt to the longest filesystem name.
- TestGNUCompatPosixSingleSpace renamed to TestGNUCompatPosixNoTabs
  and rewritten — the no-tabs invariant is the only spacing claim
  that's actually true of GNU's -P output.
- strings.Join removed from df's symbol allowlist (no longer used
  after the single-space path went away).

Smoke verified: df -P now matches gdf -P byte-for-byte:
  Filesystem     1024-blocks      Used Available Capacity Mounted on

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex P2: when --total accumulates very large filesystems, totalU and
totalA can each saturate to MaxUint64. The previous percentUsed used
saturatingAdd to compute denom, which clamped to MaxUint64 and lost
the relative magnitudes — equal totals like (MaxU, MaxU) reported
"100%" instead of the true "50%".

Two-step scaling:
  1. If used + available would wrap (used > ^uint64(0) - available),
     halve both before summing. The percentage is invariant under
     scaling both sides equally, so at most 1 bit of precision is
     lost — far below the 1% rounding tolerance.
  2. The existing inner loop still shifts used and denom together to
     keep used*100 from overflowing.

TestPercentUsed gains two cases pinning the new behaviour:
  {MaxU, MaxU}      → "50%"
  {MaxU, MaxU/2}    → "67%"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex P2: GNU df treats -h / -H / -k as no-arg flags. `gdf
--human-readable=false` errors with "option '--human-readable'
doesn't allow an argument". Previously rshell silently accepted any
value because unitFlag.Set ignored its argument.

unitFlag.Set now returns an error unless the argument equals the
NoOptDefVal sentinel ("true"). After the fix:
  df --human-readable=false  →  exit 1 ("does not allow an argument")
  df -h=false                →  exit 1
  df --human-readable / -h   →  unchanged (bare flag, success)
  df -hH (combined short)    →  unchanged (success, SI)

Documented limitation: pflag's Set receives the literal string "true"
for both the bare-flag sentinel and explicit `=true`, so the two
cases cannot be distinguished from inside Set. As a result,
`df --human-readable=true` is still silently accepted (GNU rejects
it). Working around this would require argv rescanning. The pentest
suite now covers every `=false` case; a comment notes the `=true`
limitation so future readers don't try to add coverage for it.

Added errors.New to the df symbol allowlist (used by the new
sentinel error).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex P2: GNU's lib/human.c distinguishes SI 'k' (1000) from kibi 'K'
(1024). With -H / --si, GNU df emits "25k" / "1.5k" for sub-mega
values, while my code shared the suffix table and printed "25K" / "1.5K"
in both modes. Only the kilo position differs across modes; M/G/T/P/E
stay uppercase in both.

Fix: humanBytes selects its suffix table by base. base=1000 uses
"kMGTPE"; base=1024 keeps "KMGTPE".

Tests:
- TestHumanBytes_1000 updated: 1000 → "1.0k", 1500 → "1.5k", new
  25_000 → "25k" (Codex's specific scenario), plus 1M/1G/1T cases
  to confirm those stay uppercase.
- TestFormatCount's SI assertion flipped to "1.0k".
- IEC tests unchanged.
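
The suffix selection can be sketched as below. Only the choice of table by base comes from the commit; the rounding (always up, one decimal below 10 units) and the handling of sub-base values are assumptions about GNU-style formatting, and this humanBytes is an illustration rather than the function in the PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// humanBytes renders n with a unit suffix. base must be 1000 (SI) or 1024 (IEC).
func humanBytes(n, base uint64) string {
	suffixes := "KMGTPE" // IEC: uppercase K
	if base == 1000 {
		suffixes = "kMGTPE" // SI: lowercase k, the rest unchanged
	}
	if n < base {
		return strconv.FormatUint(n, 10)
	}
	// Find the largest unit such that n/unit < base.
	unit, idx := base, 0
	for idx < len(suffixes)-1 && n/unit >= base {
		unit *= base
		idx++
	}
	whole, rem := n/unit, n%unit
	if whole < 10 {
		tenths := (rem*10 + unit - 1) / unit // ceiling to one decimal
		if tenths == 10 {
			whole, tenths = whole+1, 0
		}
		if whole < 10 {
			return fmt.Sprintf("%d.%d%c", whole, tenths, suffixes[idx])
		}
		rem = 0 // already rounded up to exactly 10 units
	}
	if rem > 0 {
		whole++ // ceiling to a whole unit
	}
	if whole >= base && idx < len(suffixes)-1 {
		return fmt.Sprintf("1.0%c", suffixes[idx+1]) // rolled over to the next unit
	}
	return fmt.Sprintf("%d%c", whole, suffixes[idx])
}

func main() {
	fmt.Println(humanBytes(25_000, 1000)) // 25k
	fmt.Println(humanBytes(1500, 1000))   // 1.5k
	fmt.Println(humanBytes(1024, 1024))   // 1.0K
}
```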

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex P2: GNU df rejects every `--name=value` form for boolean flags
(e.g. `gdf --portability=false` errors with "option '--portability'
doesn't allow an argument"). pflag.BoolP silently accepts the value,
so rshell diverged from GNU.

Generalised the explicit-value rejection from unit flags to plain
booleans. New noArgBool (a pflag.Value that wraps *bool) returns an
error from Set unless its argument equals the NoOptDefVal sentinel
"true". registerNoArgBool installs it and returns *bool so callers
keep the same access pattern as fs.Bool.

Replaced every fs.Bool / fs.BoolP in makeFlags:
  --portability / -P
  --print-type / -T
  --inodes / -i
  --all / -a
  --local / -l
  --total
  --no-sync
  --help

After the fix:
  df --portability=false / -P=false  →  exit 1
  df --all=false                     →  exit 1
  df --total=false                   →  exit 1
  ... etc
  df -P / -a / -aTl --total          →  unchanged (success)

Pentest list extended with =false for every boolean flag (long and
short). Same documented `=true` limitation as the unit flags applies
(pflag's Set can't distinguish bare-flag from explicit `=true`
because both pass the literal string "true").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… flags

Previously, flags marked NoArg via NoOptDefVal="true" silently accepted
df --all=true, df --portability=true, df --human-readable=true and similar
forms because pflag could not distinguish the sentinel default from a
user-supplied =true.

Switch the sentinel to a NUL byte (\x00). argv elements passed through
execve(2) are NUL-terminated C strings and cannot contain a NUL byte, so
the sentinel is unforgeable from a real shell
invocation. The Set methods on unitFlag and noArgBool now compare against
this sentinel and return "flag does not allow an argument" for any other
value, including =true and =false.

Pentest list extended with =true and =false variants for every no-argument
flag to lock in the rejection.
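
The sentinel check, combined with the shared last-flag-wins target used by the unit flags, might look like the sketch below. It is stdlib-only and mimics the shape of a pflag.Value's Set method without importing pflag; the constant and type names are illustrative, and the driver in main stands in for a parser that passes NoOptDefVal to Set for each bare flag.

```go
package main

import (
	"errors"
	"fmt"
)

// noArgSentinel stands in for the flag's NoOptDefVal. A NUL byte cannot
// appear inside a real argv element, so user input can never equal it.
const noArgSentinel = "\x00"

var errNoArg = errors.New("flag does not allow an argument")

type unitMode int

const (
	unitsDefault unitMode = iota
	unitsK
	unitsHuman1024
	unitsHuman1000
)

// unitFlag writes its own value into a target shared by -h, -H and -k,
// so whichever flag is parsed last wins.
type unitFlag struct {
	target *unitMode
	value  unitMode
}

// Set accepts only the sentinel, i.e. a bare flag with no "=value" part.
func (f unitFlag) Set(s string) error {
	if s != noArgSentinel {
		return errNoArg
	}
	*f.target = f.value
	return nil
}

func main() {
	var mode unitMode
	h := unitFlag{&mode, unitsHuman1024}
	k := unitFlag{&mode, unitsK}

	// Simulates `df -h -k`: Set is called once per bare flag, in argv order.
	_ = h.Set(noArgSentinel)
	_ = k.Set(noArgSentinel)
	fmt.Println(mode == unitsK)

	// Simulates `df --human-readable=true`: rejected like any other value.
	fmt.Println(h.Set("true"))
}
```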

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Linux statfs(2), f_blocks / f_bfree / f_bavail are counted in
f_frsize units (the fragment size), while f_bsize is only the optimal
transfer block size. They are usually equal, but can differ on
FUSE-backed mounts — multiplying block counts by f_bsize there would
scale the reported Size / Used / Available columns and the --total row
incorrectly by that ratio.

Match GNU coreutils df: prefer st.Frsize when non-zero, fall back to
st.Bsize otherwise.

Darwin's f_bsize is the fundamental block size (no f_frsize on macOS),
so the macOS path is unchanged.
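
As a rough illustration of the Linux rule, the selection reduces to a small helper. `blockUnit` is a hypothetical name, and it takes plain integers rather than a platform-specific statfs struct so that the sketch builds on any OS; treating negative values as unset is an added assumption.

```go
package main

import "fmt"

// blockUnit picks the multiplier for f_blocks / f_bfree / f_bavail:
// the fragment size when the kernel reports one, else the block size.
func blockUnit(frsize, bsize int64) uint64 {
	if frsize > 0 {
		return uint64(frsize)
	}
	if bsize > 0 {
		return uint64(bsize)
	}
	return 0
}

func main() {
	// A FUSE-like mount reporting 1000 blocks with frsize=512, bsize=4096.
	// Multiplying by bsize would overstate the size eightfold.
	fmt.Println(1000 * blockUnit(512, 4096)) // 512000
	// No fragment size reported: fall back to bsize.
	fmt.Println(1000 * blockUnit(0, 4096)) // 4096000
}
```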

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GNU df does not sort its output by mountpoint — it walks the mount table
as the kernel returns it (mountinfo order on Linux, getfsstat order on
macOS). The previous alphabetical sort meant /dev appeared before /proc
on Linux and /dev before /System/Volumes/* on macOS, breaking row-order
expectations for scripts that diff against /usr/bin/df.

Verified with /opt/homebrew/bin/gdf -a: kernel order is preserved.

Drop the sort.Slice call and remove sort.Slice from both the per-command
and global allowlists (no other builtin uses it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The no-argument flags (-h/-H/--all/etc.) use a NUL byte as their
NoOptDefVal sentinel so pflag can distinguish bare invocation from
explicit --flag=value (the latter is rejected to match GNU df). But
pflag.PrintDefaults rendered that sentinel verbatim into the help text
as `--all[= ]\x00include pseudo…`, producing binary garbage in the
df --help / help df output stream.

Walk the flag list before PrintDefaults and zero out NoOptDefVal on any
flag whose default is the sentinel. Parse has already run by then, so
this only changes rendered output, not parsing semantics. Add a
regression assert that the help stream contains no NUL bytes and no
"[= ]" residue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GNU coreutils' me_remote (lib/mountlist.c) treats a mount as remote
if EITHER the source carries a network signature OR the type is in
the explicit remote-type list. Previously diskstats only checked the
type, so a remote mount with a "server:/export" source that surfaced
under a type outside our prefix list ("auto", "gpfs", "acfs", or an
unlisted NFS flavor) was classified as Local.

Under `df -l` the mount would then survive the pre-stat filter and
listImpl would still call statfs(2) on it — defeating the documented
hang protection on stale network mounts.

Replace isRemoteType(fsType) with isRemoteSource(source, fsType) that:
  - flags any source starting with "//"  → SMB / CIFS UNC
  - flags any source containing ":"      → NFS / sshfs host:/export
  - falls back to the existing remote-type prefix list

Add table-driven tests for both classifications, allow strings.Contains
in the diskstats internal allowlist (with documenting comment), and
update the existing TestIsRemoteType to its new TestIsRemoteSource
counterpart.
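
A minimal sketch of the three checks, in the order listed above. The contents of `remoteTypePrefixes` are placeholders chosen for illustration; the actual prefix list lives in diskstats and is not reproduced in this commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// remoteTypePrefixes is a placeholder list, not the one used by diskstats.
var remoteTypePrefixes = []string{"nfs", "cifs", "smb"}

// isRemoteSource reports whether a mount looks remote, judged first by its
// source string and then by its filesystem type.
func isRemoteSource(source, fsType string) bool {
	if strings.HasPrefix(source, "//") {
		return true // SMB / CIFS UNC path
	}
	if strings.Contains(source, ":") {
		return true // NFS / sshfs style host:/export
	}
	for _, p := range remoteTypePrefixes {
		if strings.HasPrefix(fsType, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isRemoteSource("server:/export", "auto")) // true
	fmt.Println(isRemoteSource("//host/share", "auto"))   // true
	fmt.Println(isRemoteSource("/dev/sda1", "ext4"))      // false
}
```

Classifying on the source string first means a remote mount is filtered out under `df -l` before any statfs(2) call is attempted on it.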

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GNU coreutils df keeps a per-column minimum width (lib/df.c field_data:
SOURCE=14, FSTYPE=4, SIZE/USED/AVAIL=5, USE%=4) so the layout stays
recognisable even when every value is short. On hosts where all
filesystem names fit in fewer than 14 chars (typical containers with
sources like /dev/vda, tmpfs, shm) we were collapsing the source
column to "Filesystem 1K-blocks ..." instead of GNU's padded
"Filesystem      1K-blocks ...", breaking byte-for-byte parity.

Seed the widths array with these GNU minimums before scanning rows.
The only minimums that exceed their header label are SOURCE (14 vs
"Filesystem"=10) and USED (5 vs "Used"=4); the other minimums are
covered by the header string itself but kept here for fidelity.

Verified locally that `df` now matches `/opt/homebrew/bin/gdf` byte-
for-byte at the column-spacing layer.
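
Seeding the widths can be sketched as follows. `columnWidths` is a hypothetical helper; the minimums in main are the ones quoted above for a default (no -T) row, and measuring cells with len assumes ASCII content.

```go
package main

import "fmt"

// columnWidths starts from per-column minimums and widens each column to
// fit the longest cell seen in any row (header included).
func columnWidths(minWidths []int, rows [][]string) []int {
	widths := append([]int(nil), minWidths...) // copy so the minimums stay intact
	for _, row := range rows {
		for i, cell := range row {
			if i < len(widths) && len(cell) > widths[i] {
				widths[i] = len(cell)
			}
		}
	}
	return widths
}

func main() {
	// SOURCE=14, SIZE/USED/AVAIL=5, USE%=4
	mins := []int{14, 5, 5, 5, 4}
	rows := [][]string{
		{"Filesystem", "1K-blocks", "Used", "Available", "Use%"},
		{"tmpfs", "1024", "0", "1024", "0%"},
	}
	fmt.Println(columnWidths(mins, rows)) // [14 9 5 9 4]
}
```

With short sources such as tmpfs, the first column stays at 14 instead of collapsing to the 10 characters of "Filesystem".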

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GNU coreutils does not classify squashfs as a dummy/pseudo filesystem
(see lib/mountlist.c ME_DUMMY_0). On a typical Ubuntu desktop the
snap, AppImage, and live-image loop mounts use squashfs and report
real on-disk usage; hiding them by default makes both `df` and
`df -t squashfs` silently empty, breaking scripts and monitoring
that expect those mounts to surface.

Drop "squashfs" from pseudoTypes and document why in the comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt julesmcrt force-pushed the jules.macret/host-remediation/df branch from db791ad to 18140fa Compare May 5, 2026 15:48

Labels

verified/analysis Human-reviewed static analysis changes
