docs(rfc): add RFC 1977 multi-player support design#1980

Open
derekwaynecarr wants to merge 1 commit into
NVIDIA:mainfrom
derekwaynecarr:decarr/multi-player-design

Conversation

@derekwaynecarr

Collaborator

Summary

This RFC introduces multi-player support for OpenShell by adding namespaces as hard isolation boundaries, expanding the role model to five roles (Platform Admin, Namespace Admin, Operator, User, Service
Account), and threading ownership through the sandbox lifecycle. The Kubernetes compute driver gains two namespace mapping modes — managed (default), which creates gateway-scoped Kubernetes namespaces
(openshell-{gateway-id}-{namespace}), and operator mode for 1:1 passthrough to pre-existing namespaces. The design preserves backwards compatibility for single-player support via a default namespace.
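A minimal sketch of the two mapping modes, in Python for illustration only. The helper name `k8s_namespace_for`, the `mode` strings, and the validation step are assumptions rather than anything the RFC specifies; the only rule taken from the summary is the `openshell-{gateway-id}-{namespace}` pattern for managed mode and the 1:1 passthrough for operator mode. The validity check reflects the Kubernetes requirement that namespace names be DNS labels of at most 63 characters.

```python
import re

# Kubernetes namespace names must be RFC 1123 DNS labels: lowercase
# alphanumerics or '-', starting and ending alphanumeric, max 63 chars.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def k8s_namespace_for(mode: str, gateway_id: str, namespace: str) -> str:
    """Hypothetical helper: map an OpenShell namespace to a Kubernetes one."""
    if mode == "managed":
        # Gateway-scoped name, so two gateways in one cluster do not collide.
        name = f"openshell-{gateway_id}-{namespace}"
    elif mode == "operator":
        # 1:1 passthrough to a pre-existing Kubernetes namespace.
        name = namespace
    else:
        raise ValueError(f"unknown mapping mode: {mode!r}")
    if len(name) > 63 or not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid Kubernetes namespace name")
    return name
```

With these assumptions, `k8s_namespace_for("managed", "gw1", "team-a")` yields `openshell-gw1-team-a`, while operator mode returns `team-a` unchanged. A long gateway ID plus a long namespace name can exceed the 63-character limit in managed mode, which is one reason to validate the composed name.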

Related Issue

#1977

Changes

  • Namespaces as first-class hard isolation boundaries for sandboxes, providers, and policies, with a default namespace for backwards compatibility
  • Expanded role model from two-tier (admin/user) to five roles: Platform Admin, Namespace Admin, Operator, User, Service Account
  • Ownership tracking via created_by on ObjectMeta, with owner-scoped access guards on all sandbox operations
  • Kubernetes namespace mapping with two modes: managed (default, creates openshell-{gateway-id}-{namespace-name}) and operator (1:1 name passthrough to pre-existing K8s namespaces)
  • Multi-gateway cluster support via gateway-identifier-scoped Kubernetes namespace naming to avoid collisions
  • Provider credential scoping to namespaces, with delegation from Namespace Admins to users/service accounts
  • Policy inheritance where Namespace Admins can tighten (but not loosen) gateway-wide defaults
  • Multi-provider OIDC with identity federation, plus API key authentication for service accounts
  • Control-plane audit trail via OCSF ApiActivity events on every mutating gRPC call, with session attribution back to the creating principal
  • Per-namespace quotas for concurrent sandboxes, GPU allocations, and sandbox lifetime
  • Cost attribution metadata tagging sandbox consumption with owner, namespace, and labels
  • Sandbox sharing within namespaces (read-only or exec access) without global visibility
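The "tighten but not loosen" rule for policy inheritance can be illustrated with a small sketch. It assumes, purely for illustration, that a policy is a set of allowed egress hosts; the RFC's actual policy schema is not shown in this PR description, and the function names below are hypothetical. Under that assumption, a namespace policy is valid only if it is a subset of the gateway-wide default.

```python
def is_tightening(gateway_allowed: frozenset, namespace_allowed: frozenset) -> bool:
    """True if the namespace policy allows nothing beyond the gateway default."""
    return namespace_allowed <= gateway_allowed


def apply_namespace_policy(gateway_allowed: frozenset, namespace_allowed: frozenset) -> frozenset:
    """Hypothetical check run when a Namespace Admin submits a policy."""
    if not is_tightening(gateway_allowed, namespace_allowed):
        extra = sorted(namespace_allowed - gateway_allowed)
        raise ValueError(f"namespace policy loosens gateway default: {extra}")
    # The namespace policy is the effective policy because it is a subset.
    return namespace_allowed
```

The same subset test generalizes to other policy dimensions (for example numeric limits, where tightening means "less than or equal to the default"), but that extension is not something the RFC text here commits to.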

Testing

  • [x] mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • [x] Follows Conventional Commits
  • [x] Commits are signed off (DCO)
  • [x] Architecture docs updated (if applicable)

derekwaynecarr requested review from a team, maxamillion and mrunalp as code owners June 23, 2026 13:37
copy-pr-bot Bot commented Jun 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

- **Phase 1: Namespace and ownership model.** Add `namespace` and `created_by`
fields to `ObjectMeta` in the proto. Implement namespace-scoped storage and
filtering in gRPC handlers. Create the `default` namespace for backwards
compatibility. Sandbox name uniqueness shifts from globally unique to

Contributor

Is it critical to implement this in a backward compatible way right now?

Collaborator

If we didn't create a default namespace what would be the single-player UX?

Collaborator Author

The spirit of the default namespace is that a user never thinks about namespaces at all when using OpenShell in a single-player setup. The `default` name (or some other token) is just there to make sure that adding multi-player support introduces no friction in the single-player experience.

Collaborator

We could also (automatically) create a namespace per user account in single player mode. This sets us up to have a single gateway for the workstation while supporting different user accounts on that workstation.
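The Phase 1 behavior discussed in this thread (a `default` namespace so single-player callers never name one, `namespace` and `created_by` on `ObjectMeta`, and sandbox names unique per namespace rather than globally) could be sketched roughly as follows. This is an in-memory illustration in Python, not the gateway's storage layer; `SandboxStore` and its methods are hypothetical names.

```python
from dataclasses import dataclass

DEFAULT_NAMESPACE = "default"


@dataclass(frozen=True)
class ObjectMeta:
    # Field names follow the RFC excerpt; everything else is illustrative.
    name: str
    namespace: str
    created_by: str


class SandboxStore:
    """Hypothetical in-memory store keyed by (namespace, name)."""

    def __init__(self) -> None:
        self._items = {}

    def create(self, name: str, created_by: str,
               namespace: str = DEFAULT_NAMESPACE) -> ObjectMeta:
        key = (namespace, name)
        if key in self._items:
            raise ValueError(
                f"sandbox {name!r} already exists in namespace {namespace!r}")
        meta = ObjectMeta(name=name, namespace=namespace, created_by=created_by)
        self._items[key] = meta
        return meta

    def list_sandboxes(self, namespace: str = DEFAULT_NAMESPACE) -> list:
        # Namespace-scoped filtering: callers only see their own namespace.
        return [m for (ns, _), m in self._items.items() if ns == namespace]
```

A single-player caller uses `store.create("dev", "alice")` and lands in `default` without ever passing a namespace, while `store.create("dev", "bob", namespace="team-a")` succeeds because uniqueness is per namespace.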


### Kubernetes Compute Driver: Namespace Mapping

OpenShell namespaces are a logical concept. When the Kubernetes compute driver

Contributor

I think it'd be useful to outline what behaviors/patterns we hope to enable and control via namespaces. I found myself inventing reasons that it'd be useful to have namespaces, but I think a list of practical applications would be helpful.

Collaborator Author

The simplest example I have is the friction we hit when doing a team-level gateway setup. Within a team, it's common for users to have their own dedicated API keys to access Claude or Codex, and these are private to the individual. That friction leads folks toward wanting a gateway per trust/security domain when a common gateway with some credential segmentation would suffice. This proposal enables that. It would also now be safe to share a sandbox (for connect/exec actions) among users in shared coding sessions, since the literal credentials are left outside the sandbox.

Collaborator

@derekwaynecarr in that scenario, as described in the Provider Credential Scoping section, would a namespace admin have to create and lifecycle-manage every user's credentials in their namespace, or would each user have their own namespace and therefore be a namespace admin?


- What is the identity mapping strategy for multi-provider OIDC? If a user
authenticates via both corporate SSO and GitHub, how are those identities
linked to a single internal principal?

Contributor

Should this even be a thing? Intuitively, this feels like two principals to me, or at least I wouldn't be surprised if it were treated that way. Granting me the same permissions twice and/or sharing with myself (my two principals) feels acceptable.
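The "two principals" reading suggested above can be sketched by keying each principal on the OIDC `iss` and `sub` claims together. OpenID Connect only guarantees that `sub` is unique within a single issuer, so the pair is the natural stable identifier. The `Principal` type and `principal_from_claims` helper are hypothetical names, and this sketch deliberately performs no identity linking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    issuer: str   # value of the ID token's `iss` claim
    subject: str  # value of the ID token's `sub` claim


def principal_from_claims(claims: dict) -> Principal:
    """Hypothetical mapping: one principal per (issuer, subject) pair."""
    return Principal(issuer=claims["iss"], subject=claims["sub"])
```

Under this model the same person signing in through corporate SSO and through GitHub appears as two distinct principals, and access is granted to each separately or shared between them.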

- Should per-namespace quota limits be hard (reject sandbox creation) or soft

Contributor

Sounds like a reasonable configuration option (to be implemented at any time)

also be namespace-scoped from the start, or should they remain global and be
extended later as the organizational model matures?

- In operator mode, should the driver validate that the target Kubernetes

Contributor

Fail seems right here. Even if you check, it could go away by the time you try to make it.
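The point about check-then-create being racy can be illustrated with a small sketch. It assumes the quoted question is about whether the target Kubernetes namespace exists, which is an inference from the truncated excerpt. `FakeCluster`, `NamespaceNotFound`, and `create_sandbox` are hypothetical stand-ins, not a real Kubernetes client API. The driver simply attempts the create and turns the failure into a clear error instead of pre-validating.

```python
class NamespaceNotFound(Exception):
    """Raised by the fake cluster when the target namespace is missing."""


class FakeCluster:
    """Hypothetical stand-in for a Kubernetes API client."""

    def __init__(self, namespaces):
        self.namespaces = set(namespaces)

    def create_pod(self, namespace: str, name: str) -> str:
        if namespace not in self.namespaces:
            raise NamespaceNotFound(namespace)
        return f"{namespace}/{name}"


def create_sandbox(cluster: FakeCluster, namespace: str, name: str) -> str:
    # No up-front existence check: the namespace could be deleted between
    # a check and the create, so the create call is the source of truth.
    try:
        return cluster.create_pod(namespace, name)
    except NamespaceNotFound as exc:
        raise RuntimeError(
            f"cannot create sandbox {name!r}: "
            f"Kubernetes namespace {namespace!r} does not exist") from exc
```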

|------|-------------|
| **Platform Admin** | Manages gateway configuration, auth providers, compute drivers, and quotas. Full visibility across all namespaces. |
| **Namespace Admin** | Manages users, providers, policies, and quotas within a single namespace. Cannot change gateway infra or access other namespaces. |
| **Operator** | Read-only view of all sandboxes and audit logs across namespaces for monitoring, incident response, and compliance. Cannot create or modify sandboxes. |

Contributor

I still like the term Auditor for this. Maybe operator means the same in other (kube?) communities?

Collaborator Author

i like auditor as well.

Collaborator

+1 auditor

currently does not have a durable store beyond configuration files.

- Which resources beyond sandboxes are namespace-scoped? Sandboxes are the
primary namespaced resource. Should settings, policies, and provider configs

Contributor

Earlier in the doc you said this:

> Providers belong to a namespace.

Should probably ask the agent to make sure the entire document agrees with itself.

Collaborator Author

haha, good catch! will update.

Signed-off-by: Derek Carr <decarr@redhat.com>
derekwaynecarr force-pushed the decarr/multi-player-design branch from 3713b9b to 85e9054 Compare June 23, 2026 14:16
drew commented Jun 23, 2026

Collaborator

Could we rename this to RFC-0011? I'll reserve the number in our tracker 😄.

- Gives a clear security boundary (namespace) without over-modeling
organizational hierarchy.
- Allows multiple overlapping groupings within a namespace via labels.
- Reuses Kubernetes-style patterns that users already understand.

Collaborator

I'm not necessarily against this, but I think assuming users of AI agents already understand Kubernetes design patterns might be a stretch.

unique-within-namespace. Existing sandboxes are backfilled into the `default`
namespace. All existing single-player behavior continues unchanged.

- **Phase 2: Kubernetes driver — managed mode (default).** The driver creates

Collaborator

I was thinking about building out a Podman driver version of this too. The scenario I have in mind is a user, student, or homelabber who wants to tinker with their team or learn on their own with minimal setup and admin overhead. They could spin up a Linux VM or cloud instance, install the openshell-gateway RPM, run the OpenShell gateway service, and create an OpenShell namespace for themselves or for each member of their team. This obviously won't scale and is a single point of failure, but it could be an interesting way to test things out, and it provides a simple path to "my agents aren't running on my laptop".

Basically: one local Linux user (openshell), one rootless Podman socket, one gateway process, many OpenShell namespaces, one Podman network per OpenShell namespace, one workspace volume per sandbox, no arbitrary bind mounts, namespace-scoped volumes/providers, gateway-enforced RBAC/quotas, required OIDC/API-key auth, OCSF attribution, and a strict shared-host driver mode that the user would have to opt into.

Or maybe that's a fool's errand and the answer is just to show people how to do this with kind, minikube, k3s, or MicroShift. Thoughts? 🤔

Projects

Status: Todo

5 participants