Add UID-based nftables firewall for NATS and monit connections #399

Open
rkoster wants to merge 7 commits into cloudfoundry:main from rkoster:feature/uid-based-firewall


@rkoster rkoster commented Feb 6, 2026

Summary

This PR implements a UID-based nftables firewall to protect NATS (mbus) and monit connections, replacing the previous cgroup-based iptables approach.

Motivation

The existing cgroup-based firewall approach has limitations in nested container environments (like Garden containers running on BOSH VMs). In these environments:

  • The cgroup hierarchy may not be accessible or reliable
  • iptables cgroup matching doesn't work consistently
  • This leaves NATS and monit ports exposed to unprivileged processes

The UID-based approach solves this by using nftables with meta skuid matching, which works reliably regardless of container nesting.

Changes

New platform/firewall Package

  • firewall.go - Defines Manager and NatsFirewallHook interfaces
  • nftables_firewall.go - Linux implementation using github.com/google/nftables (netlink-based, no CLI required)
  • nftables_firewall_other.go - Stub for non-Linux platforms

Firewall Rules

Creates an nftables table bosh_agent with two output chains:

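| Chain        | Purpose                          | Rule                          |
|--------------|----------------------------------|-------------------------------|
| monit_access | Protect monit (port 2822)        | Allow only UID 0, drop others |
| nats_access  | Protect NATS/director connection | Allow only UID 0, drop others |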
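
To make the monit_access row concrete, here is a minimal in-memory sketch of the decision that rule encodes. It is not the PR's code and does not touch nftables: the `Conn` type, the `Verdict` constants, and `monitAccess` are hypothetical names, and the sketch assumes that traffic to other destinations is simply left alone by this chain.

```go
package main

// Verdict is the outcome of evaluating the chain for one outgoing connection.
type Verdict string

const (
	Accept Verdict = "accept"
	Drop   Verdict = "drop"
)

// Conn holds the fields the rule inspects. It is a hypothetical model,
// not a type from any nftables library.
type Conn struct {
	UID     uint32 // UID owning the socket (what `meta skuid` matches on)
	DstIP   string
	DstPort uint16
}

// monitAccess mirrors the monit_access row: connections to 127.0.0.1:2822
// are accepted for UID 0 and dropped for every other UID. Connections to
// any other destination are not this chain's concern (assumption).
func monitAccess(c Conn) Verdict {
	if c.DstIP != "127.0.0.1" || c.DstPort != 2822 {
		return Accept
	}
	if c.UID == 0 {
		return Accept
	}
	return Drop
}
```

This matches the manual test described in the PR: root reaches monit and gets an HTTP response, while a non-root `curl` hangs because its packets are dropped.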

Platform Integration

  • Added GetNatsFirewallHook() to Platform interface
  • LinuxPlatform initializes firewall and implements the hook
  • Stub implementations for Windows and dummy platforms

NATS Handler Integration

  • nats_handler.go calls BeforeConnect hook before NATS connection and on each reconnection
  • Supports DNS re-resolution on reconnect for HA director failover scenarios
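
A rough sketch of how such a hook could wrap the connection step. The interface name NatsFirewallHook and the method name BeforeConnect come from the PR; the method signature, `connectWithHook`, `fakeHook`, and the `dial` callback are assumptions made for illustration. Treating a hook error as a warning rather than a failure follows the test description further down ("Warning logged but no failure when BeforeConnect errors").

```go
package main

import (
	"errors"
	"log"
)

// NatsFirewallHook is named in the PR; this signature is an assumption.
type NatsFirewallHook interface {
	BeforeConnect(mbusURL string) error
}

// connectWithHook runs the hook (when one is configured) and then dials.
// A hook error is only logged, so a firewall problem does not prevent
// the agent from reaching NATS.
func connectWithHook(hook NatsFirewallHook, mbusURL string, dial func(url string) error) error {
	if hook != nil {
		if err := hook.BeforeConnect(mbusURL); err != nil {
			log.Printf("warning: firewall hook failed: %v", err)
		}
	}
	return dial(mbusURL)
}

// fakeHook is a hypothetical test double that records the URLs it sees
// and can be told to fail.
type fakeHook struct {
	fail bool
	urls []string
}

func (h *fakeHook) BeforeConnect(mbusURL string) error {
	h.urls = append(h.urls, mbusURL)
	if h.fail {
		return errors.New("nftables unavailable")
	}
	return nil
}
```

Calling the same function from the reconnect path would give the hook a chance to re-resolve the director hostname and refresh the rules before each new connection attempt.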

Testing

  • 23 unit tests for firewall functionality (setup, cleanup, error handling)
  • 4 new tests for NATS handler firewall hook integration
  • All tests use dependency injection for nftables connection and DNS resolver

Technical Details

  • Uses github.com/google/nftables library which communicates via netlink (no nft CLI needed)
  • Works on Ubuntu Jammy and Noble without additional package installation
  • IPv4 and IPv6 support
  • Firewall rules are idempotent (chains are flushed before adding rules)
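
The flush-then-add idempotency can be illustrated with a small in-memory model. This is only a sketch: `table`, `chain`, and `setupChain` are hypothetical stand-ins, and the rule strings in the usage note are plain labels rather than anything sent to the kernel.

```go
package main

// chain is an in-memory stand-in for an nftables chain (hypothetical).
type chain struct {
	rules []string
}

// table maps chain names to chains (hypothetical).
type table map[string]*chain

// setupChain makes the named chain contain exactly the given rules.
// It creates the chain if needed, flushes whatever is there, and then
// adds the rules, so repeated calls never accumulate duplicates.
func (t table) setupChain(name string, rules []string) {
	c, ok := t[name]
	if !ok {
		c = &chain{}
		t[name] = c
	}
	c.rules = c.rules[:0]               // flush existing rules
	c.rules = append(c.rules, rules...) // add the current rule set
}
```

Running `setupChain("monit_access", ...)` twice with the same two rules leaves two rules in the chain, not four, which is the property that makes it safe to call setup on every agent start and every NATS reconnect.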

Testing Performed

  • Unit tests: ginkgo -r platform/firewall mbus
  • Manual testing on Noble stemcell in nested Garden environment:
    • Non-root user: curl 127.0.0.1:2822 hangs (blocked)
    • Root user: curl 127.0.0.1:2822 returns 401 (allowed through firewall)

Implement a firewall mechanism that restricts NATS (mbus) connections
to the bosh-agent process only, using UID-based filtering with nftables.

Key changes:
- Add platform/firewall package with Manager and NatsFirewallHook interfaces
- Implement NftablesFirewall that creates UID-based egress rules
- Add GetNatsFirewallHook() to Platform interface
- Integrate BeforeConnect hook in nats_handler.go for connection/reconnection
- Support DNS re-resolution on reconnect for HA failover scenarios
- Add stub implementations for Windows and dummy platforms

The firewall rules allow only the agent's UID to connect to NATS/director
ports while blocking other processes, improving security posture.
Add comprehensive unit tests for the new firewall functionality:

platform/firewall tests (23 tests):
- SetupMonitFirewall: table/chain/rule creation, error handling
- SetupNATSFirewall: IPv4/IPv6, DNS resolution, https/empty URL handling
- BeforeConnect: delegation to SetupNATSFirewall
- Cleanup: table deletion and error handling

mbus/nats_handler tests (4 new tests):
- Firewall hook is called on Start
- BeforeConnect receives correct mbus URL
- Handler still starts when hook returns nil
- Warning logged but no failure when BeforeConnect errors

Also:
- Add DNSResolver interface for testable DNS resolution
- Inject resolver dependency via NewNftablesFirewallWithDeps
- Configure test logging to use GinkgoWriter for visibility
- Fix ST1023 linter error: omit type from var declaration
- Add linux_header.txt for counterfeiter to add build tags to Linux-only fakes
- Regenerate fake_nftables_conn.go and fake_dnsresolver.go with //go:build linux tag
- This fixes macOS/Windows build failures due to google/nftables being Linux-only
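
For readers unfamiliar with the resolver-injection pattern mentioned above, a minimal sketch follows. Only the interface name DNSResolver is taken from the commit message; its method set, `staticResolver`, and `natsTargets` are assumptions. The sketch also assumes that "https/empty URL handling" means such URLs produce no NATS targets, which this page does not confirm.

```go
package main

import (
	"net"
	"net/url"
)

// DNSResolver is named in the PR; this method set is an assumption.
type DNSResolver interface {
	LookupIP(host string) ([]net.IP, error)
}

// staticResolver is a hypothetical fake that answers from a fixed map,
// so tests never touch real DNS.
type staticResolver map[string][]string

func (s staticResolver) LookupIP(host string) ([]net.IP, error) {
	addrs, ok := s[host]
	if !ok {
		return nil, &net.DNSError{Err: "no such host", Name: host, IsNotFound: true}
	}
	ips := make([]net.IP, 0, len(addrs))
	for _, a := range addrs {
		ips = append(ips, net.ParseIP(a))
	}
	return ips, nil
}

// natsTargets returns the IPs and port a NATS rule would need to allow.
// Empty and https URLs yield no targets (assumption). IP literals skip
// DNS; hostnames are resolved through the injected resolver and may
// return both IPv4 and IPv6 addresses.
func natsTargets(mbusURL string, r DNSResolver) ([]net.IP, string, error) {
	if mbusURL == "" {
		return nil, "", nil
	}
	u, err := url.Parse(mbusURL)
	if err != nil {
		return nil, "", err
	}
	if u.Scheme == "https" {
		return nil, "", nil
	}
	host := u.Hostname()
	if ip := net.ParseIP(host); ip != nil {
		return []net.IP{ip}, u.Port(), nil
	}
	ips, err := r.LookupIP(host)
	return ips, u.Port(), err
}
```

A production resolver would satisfy the same interface by delegating to `net.LookupIP`, and calling `natsTargets` again on reconnect is what allows a changed DNS record to be picked up after a director failover.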
@Alphasite

I'm a little worried by the general approach of teaching the agent about OS-version-specific things. Specifically, I worry that it will (further?) violate layering by pushing version-specific customisation from the stemcell into the agent.

It's probably OK here, since this sits somewhere between agent setup and stemcell config, but I wanted to at least mention it even if nothing comes of it.

- Fix nil pointer dereference in DisconnectErrHandler when err is nil
- Remove iptables-based SetupNatsFirewall code (replaced by nftables)
- Remove unused Cleanup() method from firewall interface
- Move firewall initialization from lazy getter to explicit SetupFirewall()
- Add comment explaining IPv6 loopback is intentionally not protected
  (monit only binds to 127.0.0.1:2822)
rkoster commented Feb 6, 2026

I'm a little worried by the general approach of teaching the agent about OS-version-specific things. Specifically, I worry that it will (further?) violate layering by pushing version-specific customisation from the stemcell into the agent.

Currently nothing about this feature is OS-specific: it works on both Noble and Jammy. The goal is to simplify and centralise the various NATS and monit firewall code paths in the agent, where they can be tested more easily than in the stemcell builder.

The nftables library batches operations until Flush() is called, so
AddTable/AddChain/AddRule never return errors. Removing the misleading
error return types from these internal helper methods.
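
The batching behaviour described in this commit message can be sketched with a small stand-in type. `batchConn` and its `send` callback are hypothetical and are not the google/nftables connection type; the point is only that queueing calls have nothing to report, so Flush is the single place where an error can surface.

```go
package main

import "errors"

// batchConn is a hypothetical stand-in for a netlink connection that
// queues operations and only sends them on Flush.
type batchConn struct {
	pending []string
	send    func(ops []string) error // hypothetical transport
}

// AddTable only queues the operation, so there is no error to return.
func (c *batchConn) AddTable(name string) {
	c.pending = append(c.pending, "add table "+name)
}

// AddChain only queues the operation, so there is no error to return.
func (c *batchConn) AddChain(table, chain string) {
	c.pending = append(c.pending, "add chain "+table+" "+chain)
}

// Flush sends every queued operation in order. This is the only call
// that talks to the transport and therefore the only one that can fail.
func (c *batchConn) Flush() error {
	if c.send == nil {
		return errors.New("no transport configured")
	}
	ops := c.pending
	c.pending = nil
	return c.send(ops)
}
```

Helpers that wrap the Add calls therefore gain nothing from an error return; callers need to check the result of Flush instead.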
@rkoster rkoster requested a review from mariash February 6, 2026 20:14
Implement separate chains for agent-managed and job-managed monit access rules:
- monit_access_jobs: Regular chain for job rules (never flushed by agent)
- monit_access: Base chain that jumps to jobs chain first, then applies agent rules

This allows BOSH jobs to add their own monit access rules via pre-start scripts
that persist across agent restarts, while ensuring agent rules are always
up-to-date by flushing and recreating them on each setup call.
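
A simplified model of the two-chain layout, to show why job rules survive an agent restart. Everything here is hypothetical (`firewall`, `rule`, `allowUID`, `dropAll`), and the final accept for an empty rule set is an assumed default policy; the sketch only mimics nftables semantics, where a jump to a regular chain returns to the caller when no rule there issues a verdict.

```go
package main

type verdict int

const (
	cont   verdict = iota // no rule matched, keep evaluating
	accept                // terminal: allow the packet
	drop                  // terminal: discard the packet
)

// rule inspects the socket owner's UID and returns a verdict.
type rule func(uid uint32) verdict

// allowUID accepts traffic from one UID and passes on everything else.
func allowUID(allowed uint32) rule {
	return func(uid uint32) verdict {
		if uid == allowed {
			return accept
		}
		return cont
	}
}

// dropAll drops whatever reaches it.
func dropAll() rule {
	return func(uint32) verdict { return drop }
}

type firewall struct {
	jobs  []rule // monit_access_jobs: owned by jobs, never flushed by the agent
	agent []rule // agent rules in monit_access, evaluated after the jump
}

// setupAgentRules flushes and recreates only the agent's rules.
func (f *firewall) setupAgentRules() {
	f.agent = []rule{allowUID(0), dropAll()}
}

// evaluate walks the jobs chain first and falls through to the agent
// rules when no job rule returns a verdict.
func (f *firewall) evaluate(uid uint32) verdict {
	for _, rules := range [][]rule{f.jobs, f.agent} {
		for _, r := range rules {
			if v := r(uid); v != cont {
				return v
			}
		}
	}
	return accept // assumed default policy when nothing matches
}
```

With a job rule such as `allowUID(1001)` appended to `jobs`, calling `setupAgentRules` any number of times keeps that rule in place: UID 0 and UID 1001 are accepted and every other UID is dropped.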
Move linux_header.txt to firewallfakes/linux_build_constraint.txt to make it clear this file
contains a Go build constraint for counterfeiter-generated fakes, not a C header.
rkoster added a commit to rkoster/bosh-linux-stemcell-builder that referenced this pull request Feb 9, 2026
Remove the cgroup v1 net_cls-based monit API access control mechanism
including the monit wrapper script, helper functions, and iptables rules.

The monit binary now runs directly without a wrapper. Access control
will be managed by the bosh-agent's internal firewall implementation.

Related to cloudfoundry/bosh-agent#399
rkoster added a commit to rkoster/bosh-linux-stemcell-builder that referenced this pull request Feb 9, 2026
Stop sourcing monit-access-helper.sh and calling permit_monit_access
when starting the bosh-agent. The agent will manage its own firewall
access internally instead of using the cgroup-based helper.

This completes the removal of the permit_monit_access functionality
now that pxc-release (the only consumer) no longer uses it.

Related to cloudfoundry/bosh-agent#399
Related to cloudfoundry/pxc-release#97
rkoster added a commit to rkoster/bosh-linux-stemcell-builder that referenced this pull request Feb 9, 2026
Remove the static nftables-based monit API access control mechanism.
The monit service now runs without firewall restrictions. Access control
will be managed by the bosh-agent's internal firewall implementation.

Related to cloudfoundry/bosh-agent#399
mariash previously approved these changes Feb 9, 2026
@github-project-automation github-project-automation bot moved this from Pending Review | Discussion to Pending Merge | Prioritized in Foundational Infrastructure Working Group Feb 9, 2026
@rkoster rkoster force-pushed the feature/uid-based-firewall branch from 6e235a0 to a0f7661 Compare February 10, 2026 16:14
@rkoster rkoster requested review from aramprice and mariash February 12, 2026 15:43
rkoster commented Feb 12, 2026

This PR should not be merged before cloudfoundry/pxc-release#97 has been released.
