propose redacted R stored in KVS as R_redacted #523

Closed
grondo wants to merge 2 commits into
flux-framework:masterfrom
grondo:R_redacted

@grondo grondo commented Jun 16, 2026

Contributor

This PR updates relevant RFCs to require that an Rv1 object with any scheduling key be stored in the KVS as R_redacted co-located with any full R. This is to solve the issue of a large scheduling key being unnecessarily fetched from the KVS for consumers that only need the execution section. See flux-framework/flux-core#7615

A commit is also included that adds a missing requirement to the resource acquisition protocol: R must be sent in the alloc response (already a de facto requirement in the code). The scheduling key is made OPTIONAL in this response, in keeping with the spirit of this PR.
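
As a minimal sketch of the redaction described above, assuming R is handled as a plain JSON-style dict: the helper names `redact_R` and `kvs_entries`, the job directory string, and the sample values are hypothetical and only illustrate the idea; they are not taken from flux-core.

```python
import copy
import json


def redact_R(R):
    """Return a copy of an Rv1 dict with the optional `scheduling` key omitted."""
    redacted = copy.deepcopy(R)
    redacted.pop("scheduling", None)  # no-op when R has no scheduling key
    return redacted


def kvs_entries(jobdir, R):
    """Build the KVS key/value pairs for a job's R (hypothetical helper).

    The full R is always stored unchanged. R_redacted is written only
    when R actually carries a scheduling key.
    """
    entries = {f"{jobdir}.R": json.dumps(R)}
    if "scheduling" in R:
        entries[f"{jobdir}.R_redacted"] = json.dumps(redact_R(R))
    return entries


# Made-up Rv1 object, for illustration only.
R = {
    "version": 1,
    "execution": {
        "R_lite": [{"rank": "0-1", "children": {"core": "0-3"}}],
        "nodelist": ["node[0-1]"],
        "starttime": 0,
        "expiration": 0,
    },
    "scheduling": {"writer": "example", "graph": {"nodes": [], "edges": []}},
}

entries = kvs_entries("job.example", R)
```

A consumer that needs only the execution section would then read `R_redacted`, while the scheduler and subinstance resource module keep reading the full `R`.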

grondo and others added 2 commits June 15, 2026 17:10
Problem: The optional `scheduling` key in RFC 20 R objects can be
very large (megabytes of scheduler graph data). Storing and shipping
the full R to every consumer, including one job shell per allocated
rank, wastes overlay network bandwidth. Most consumers need only
the execution portion of R and never use `scheduling`.

Add a `R_redacted` sibling KVS key alongside the existing `job.<id>.R`
key to hold a copy of R with the optional `scheduling` key omitted.
The complete R (including `scheduling`) is preserved unchanged
in `job.<id>.R` for consumers that require it — the scheduler on
hello/replay and the subinstance resource module — while high-volume
consumers such as job shells may read the smaller `R_redacted`.

Document this convention in RFC 16 (KVS schema), RFC 20 (R format),
RFC 27 (alloc commit requirement), and RFC 28 (acquire protocol
clarification).

Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>

Problem: The RFC 27 Alloc Success section lists only the OPTIONAL
`annotations` key as an additional field of the SUCCESS response,
but the implementation requires an `R` key carrying the allocated
resource set (RFC 20) and the job manager treats its absence as an
error.

Add `R` as a REQUIRED key of the SUCCESS response and specify that
the scheduler MAY omit the OPTIONAL `scheduling` key from it, since
the job manager needs only the execution portion.

Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
@github-actions


⚠️ linkcheck failed with status code 2

garlick commented Jun 16, 2026

Member

Just brainstorming, but since not all schedulers need this, would it make sense to flip it so R (required) in the KVS is the redacted version and another key like R_scheduling (optional) is the non-redacted version?

Alternatively, and more in keeping with the original design, add a way for the scheduling.writer key to specify indirection to an alternate key containing only the scheduling object. The scheduling frameworks (Python scheduler class, libschedutil) could be put in charge of composing/decomposing R for the scheduler. Then the rest of the system could just use R with the ride-along scheduling key as designed.

grondo commented Jun 16, 2026

Contributor Author

As I commented in flux-framework/flux-core#7615, this won't work because it would break backwards compatibility: the scheduler and resource modules of subinstances fetch the R key from the parent. This change would break launching a previous version of Flux under the new version.

grondo commented Jun 16, 2026

Contributor Author

> Alternatively, and more in keeping with the original design, add a way for the scheduling.writer key to specify indirection to an alternate key containing only the scheduling object. The scheduling frameworks (Python scheduler class, libschedutil) could be put in charge of composing/decomposing R for the scheduler. Then the rest of the system could just use R with the ride-along scheduling key as designed.

I like this suggestion if we could figure out a path to avoid breaking inter-version compatibility. Maybe job-info could assemble the full R by default unless a new flag is used? Then RPCs from older versions of Flux would continue to work, and the job shell and other same-version job-info callers could add the flag.

I wonder if it would be simpler to add a new optional scheduling.location or similar key that does the redirect. The scheduling.writer key then stays the same in the separate scheduling object. Either way would be fine with me.
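
A rough sketch of how such a redirect might be resolved, purely as an illustration: the `scheduling.location` key, the `R_scheduling` key name, the `redacted_ok` flag, and the dict standing in for the KVS are all assumptions made for this example, not existing Flux interfaces.

```python
import json


def compose_R(R, lookup):
    """Return R with any `scheduling.location` redirect resolved.

    `lookup(key)` is assumed to return the JSON text stored at `key`.
    R is returned unchanged when it carries no redirect.
    """
    sched = R.get("scheduling")
    if not isinstance(sched, dict) or "location" not in sched:
        return R
    full = dict(R)  # shallow copy so the caller's R is not modified
    full["scheduling"] = json.loads(lookup(sched["location"]))
    return full


def fetch_R(kvs, key, redacted_ok=False):
    """Assemble the full R by default; skip assembly when the flag is set."""
    R = json.loads(kvs[key])
    if redacted_ok:
        return R
    return compose_R(R, kvs.__getitem__)


# A dict stands in for the KVS; keys and values are made up.
kvs = {
    "job.example.R": json.dumps({
        "version": 1,
        "execution": {"R_lite": [], "nodelist": []},
        "scheduling": {"location": "job.example.R_scheduling"},
    }),
    "job.example.R_scheduling": json.dumps({"writer": "example", "graph": {}}),
}

full = fetch_R(kvs, "job.example.R")
small = fetch_R(kvs, "job.example.R", redacted_ok=True)
```

Under these assumptions, callers from older versions that do not pass the flag still receive a complete R, while same-version callers such as the job shell can opt out of the assembly step.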
