feat(read): implement read builtin with IFS field-splitting#211
feat(read): implement read builtin with IFS field-splitting#211
Conversation
Adds the `read` shell builtin, which consumes a delimited chunk from
stdin and assigns the IFS-split fields to one or more shell variables
(defaulting to REPLY when no name is given). This is the first builtin
in the codebase that mutates parent-shell state, so the change extends
the builtin contract with a minimal pair of capabilities.
Contract change (builtins.CallContext):
- SetVar(name, value) error: assign to a shell variable; routes
through the runner's setVarErr so MaxVarBytes and
MaxTotalVarsBytes are enforced.
- GetVar(name) (string, bool): read a shell variable (used to
consult IFS for field-splitting).
Wired at both CallContext construction sites in interp/runner_exec.go.
Builtin (builtins/read/read.go) supports:
-r raw mode (no backslash interpretation)
-p PROMPT print to stderr (only when stdin is a TTY, matches bash)
-d DELIM single-rune line delimiter; -d '' uses NUL
-n NCHARS read at most N characters (UTF-8 aware), capped at 1 MiB
-N NBYTES read exactly N bytes ignoring delimiters, capped at 1 MiB
-t SECONDS timeout via *os.File.SetReadDeadline (or context.WithTimeout
fallback); returns 142 on timeout to match bash
--help usage to stdout
Excluded from v1: -a (would need SetArrayVar), -s/-u/-e/-i (interactive
features that don't apply to a non-interactive shell). Documented in
SHELL_FEATURES.md.
Field-splitting follows POSIX read semantics: whitespace IFS chars
coalesce, non-whitespace IFS chars introduce separate fields, leading
and trailing IFS-whitespace is stripped, and the last variable absorbs
the remainder when there are more fields than names.
Bytes are read one at a time without buffering so subsequent reads from
the same stdin observe unconsumed bytes (a buffered reader would
swallow them past the delimiter).
Tests: 16 new scenarios under tests/scenarios/cmd/read/ covering
basic, flags, ifs, errors, and help. Three help scenarios bumped from
28 to 29 builtins; the obsolete read-is-blocked scenario removed.
Symbol allowlists updated for the new package's imports.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builtins must not call time.Now directly; the canonical pattern is callCtx.Now (script-start reference, captured once per Run for stable multi-builtin reasoning) for things like find -mtime / ls -l. For read -t SECONDS we want a true monotonic "from now" deadline, not a script-start reference (a 5s timeout 10s into the script must not fire immediately). callCtx.Now is therefore wrong here. The right answer is to read the deadline already computed by context.WithTimeout — that call is on the symbol allowlist, the stdlib does the time.Now lookup internally, and ctx.Deadline() lets us project that same instant onto *os.File.SetReadDeadline so kernel and context deadlines stay in sync. Removes time.Now from both the global builtin allowlist and the read per-command sublist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cbf514b698
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Four bash-compatibility fixes flagged by codex:
1. Preserve unsplit input for REPLY and -N
When no NAME is given, bash assigns the raw line to REPLY without
IFS trimming or splitting. With -N, bash assigns the whole read to
the first NAME and leaves remaining NAMEs empty (no IFS splitting).
Both paths now skip splitIFS entirely.
2. Count -N input as characters, not raw bytes
Bash's -N is character-based, not byte-based: read -N 1 over "éx"
stores "é" (2 bytes), not the first byte of the UTF-8 sequence.
Dropped the dedicated byte-mode branch and unified -N with the
default rune+escape path; -N differs only in that it ignores the
delimiter rune. Backslash handling in -N now matches bash too
(escapes processed unless -r).
3. Permit values exactly at the read byte cap
The cap check fired before the -n / delimiter checks, so a value of
exactly MaxReadBytes was rejected on the next iteration even though
the flag validation and MaxVarBytes both allow that size. The cap
now fires only when an append would push past it, so a 1 MiB
value/line followed by EOF or the delimiter returns successfully.
4. Handle a single trailing IFS separator
`printf 'a:b:\n' | { IFS=: read a b; }` should give b="b" because
the trailing colon represents an empty trailing field that bash
silently drops. splitIFS now strips a lone trailing non-ws IFS
character from the last absorbed field when it is the only such
character in that field. Verified across the spectrum: a:b: → b,
a:b:: → b::, a::b: → :b: (only one strip when there is exactly
one trailing IFS char in the absorbed remainder).
New scenarios:
basic/reply_preserves_padding.yaml
flags/N_counts_chars_utf8.yaml
flags/N_applies_escape.yaml
flags/N_skips_field_split.yaml
ifs/trailing_single_ifs_stripped.yaml
All 22 read scenarios pass; full suite green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bash comparison test runs in debian:bookworm-slim's default C/POSIX locale, where bash's `read -N` counts bytes (so `read -N 1` over `éx` returns the first byte 0xC3 — an invalid UTF-8 fragment). rshell always counts UTF-8 characters regardless of locale, so the test assertion `[é]` matched bash on macOS (UTF-8 locale) but not in CI. Setting LC_ALL=C.UTF-8 for the bash side puts bash in UTF-8 mode, making the comparison meaningful: both shells now treat -N as character-based on UTF-8 input. Verified locally with `LC_ALL=C bash` (returns `[\xc3]`) vs `LC_ALL=C.UTF-8 bash` (returns `[é]`). The intentional divergence in C-locale environments is documented in the scenario comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 388464302b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err != nil { | ||
| if errors.Is(err, context.DeadlineExceeded) || errors.Is(err, os.ErrDeadlineExceeded) { | ||
| // Bash convention for read -t timeout: 128 + SIGALRM (14) = 142. | ||
| return builtins.Result{Code: 142} |
There was a problem hiding this comment.
Preserve partial input on read timeouts
When read -t times out after receiving some bytes but before the delimiter, this branch returns before the assignment logic runs, so the partially read value is lost. Bash still assigns the partial line in this scenario (for example, { printf abc; sleep 0.2; printf '\n'; } | { read -t 0.05 x; ...; } leaves x=abc with status 142), so timeout handling should continue through the normal assignment path when line is non-empty.
Useful? React with 👍 / 👎.
| dur := time.Duration(secs * float64(time.Second)) | ||
| var cancel context.CancelFunc | ||
| readCtx, cancel = context.WithTimeout(ctx, dur) |
There was a problem hiding this comment.
Handle zero-second read timeouts as a poll
For read -t 0, Bash help specifies that read returns immediately without reading data and succeeds only if input is available, but context.WithTimeout(ctx, 0) creates an already-expired context so readInput returns DeadlineExceeded unconditionally. This makes even printf x | { read -t 0 x; echo $?; } report 142 instead of the expected successful poll.
Useful? React with 👍 / 👎.
| info, err := f.Stat() | ||
| if err != nil { | ||
| return false | ||
| } | ||
| return info.Mode()&os.ModeCharDevice != 0 |
There was a problem hiding this comment.
Detect actual terminals for read prompts
This treats any character device as a TTY, so redirected stdin from non-terminal devices such as /dev/null will still emit the -p prompt. Bash suppresses prompts for non-interactive redirected input (read -p PROMPT x </dev/null produces no prompt), so this should use an actual terminal check such as isatty/term.IsTerminal rather than os.ModeCharDevice alone.
Useful? React with 👍 / 👎.
What does this PR do?
Adds the
readshell builtin to the restricted shell.readconsumes a delimited chunk from stdin and assigns each IFS-split field to a shell variable (defaulting toREPLYwhen no name is given).This is the first builtin in the codebase that mutates parent-shell state, so the change extends the builtin contract with two minimal capabilities on
builtins.CallContext:SetVar(name, value) error— routes through the runner's existingsetVarErr, soMaxVarBytes(1 MiB) andMaxTotalVarsBytes(1 MiB) are enforced for free.GetVar(name) (string, bool)— needed soreadcan consultIFSfor field-splitting.These two together form the minimal complete contract for any future state-aware builtin (
export,unset,shift, …).Motivation
readis a fundamental POSIX builtin. The canonical idiomwhile IFS= read -r line; do …; done < fileis widely used in shell scripts. Until now the shell had no way to express it;readwas explicitly blocked at dispatch time. Lifting that block required granting builtins a way to mutate parent-shell variables, which was a deliberate architectural decision discussed during implementation rather than a one-line capability add.Surface
Supported flags (six +
--help):-r-p PROMPT-d DELIM-d ''means NUL-n NCHARS-N NBYTES-t SECONDS*os.File.SetReadDeadline;context.WithTimeoutfallback. Returns exit 142 on timeout (bash convention: 128 + SIGALRM)Excluded from v1 with rationale in
SHELL_FEATURES.md:-a(needs a parallelSetArrayVarcontract addition; can follow later),-s/-u/-e/-i(interactive-terminal features that don't apply in a non-interactive shell).Memory & DoS hardening
-nand-Nare rejected at flag-parse time when N > 1 MiB or N < 0. The 1 MiB anchor matches the existingMaxLineBytescap incat/headand the runner'sMaxVarBytes/MaxTotalVarsBytes.readInputenforces a separate runtime byte ceiling (also 1 MiB) so a UTF-8-amplified-ncannot exceed the buffer cap.readcalls on the same stdin observe unconsumed bytes (abufio.Readerwould swallow bytes past the delimiter — common bug in naive implementations).MaxVarByteson the assigned value →MaxTotalVarsByteson the aggregate variable store.IFS field-splitting
Implements POSIX
readsemantics:Testing
16 new YAML scenarios under
tests/scenarios/cmd/read/:basic/(7) — single var, default REPLY, multi-var, last-absorbs, fewer-fields, redirect-from-file, EOF-returns-1.flags/(6) — raw vs non-raw, line continuation, delimiter,-n,-N.ifs/(3) — default IFS (space/tab), custom IFS (:), empty IFS (no stripping).errors/(4) —-nand-Ncap rejection, invalid timeout, invalid identifier.help/(1) —--helpexits 0 with usage.The non-bash-comparison scenarios all pass locally. Bash byte-for-byte comparison (
RSHELL_BASH_TEST=1) was not run locally — Docker daemon was not running. Worth verifying in CI.Three pre-existing
helpscenarios were updated for the bumped builtin count (28 → 29). The obsoleteshell/blocked_commands/.../read.yamlscenario was deleted.Symbol allowlists in
analysis/symbols_builtins.goupdated: 10 new symbols added to the global ceiling (context.CancelFunc,context.DeadlineExceeded,fmt.Fprint,os.ErrDeadlineExceeded,os.File,os.ModeCharDevice,time.Now,unicode/utf8.{DecodeLastRuneInString,FullRune,RuneSelf}) plus a new"read":per-command sublist.Reviewer notes
CallContextis the architecturally significant change. Future state-aware builtins should reuseSetVar/GetVarrather than reaching past them.\<delim>is a line continuation,\<X>is just X). ThereadInputfunction handles this with a single helper closure for rune reads.-pis silent in this codebase because stdin is always a pipe in scripted contexts. The TTY check is there for parity in case stdin is ever wired to a terminal by a calling embedder.verified/allowed_symbolslabel automatically — that is reserved for human approval after reviewing the symbol-allowlist changes.Checklist
SHELL_FEATURES.md)make fmtclean,go vetclean, full test suite passes locally🤖 Generated with Claude Code