Add bosh-monit-access helper for nftables firewall #97

Merged: abg merged 4 commits into main from add-bosh-monit-access-helper on Feb 13, 2026

@rkoster (Contributor) commented on Feb 9, 2026

Summary

Adds a new bosh-monit-access Go binary that allows BOSH jobs to add firewall rules to the new nftables-based monit firewall in bosh-agent (cloudfoundry/bosh-agent#399).

Changes

  • New package bosh-monit-access: A Go binary that:

    • Detects if the new firewall exists (--check mode returns exit 0/1)
    • Adds cgroup-based rules when possible, with UID-based fallback
    • Is idempotent (checks for existing rules before adding)
    • Includes verbose logging for debugging
  • Updated galera-agent job: Uses the new helper first, falling back to the old iptables/cgroup approach for backward compatibility
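
As a rough illustration of the cgroup-first, UID-fallback choice listed above, the shell sketch below builds the rule text in both shapes. It is not the Go implementation from this PR: the function names (`cgroup_v2_path`, `cgroup_level`, `monit_rule_for`) are hypothetical, reading `/proc/self/cgroup` is only an assumption about how the cgroup could be discovered, and the `log prefix` part of the real rules is left out for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only; not the code from this PR.

# Print the cgroup v2 path (without the leading slash) from a
# /proc/<pid>/cgroup style file. cgroup v2 entries look like
# "0::/system.slice/foo.scope". Prints an empty line for the root cgroup.
cgroup_v2_path() {
  sed -n 's|^0::/||p' "${1:-/proc/self/cgroup}"
}

# Number of path components, e.g. "system.slice/foo.scope" -> 2.
# This is the value used after "level" in the cgroup-based rule.
cgroup_level() {
  printf '%s\n' "$1" | awk -F/ '{ print NF }'
}

# Print the nft rule text for a job: cgroup-based when a cgroup path
# is known ($1 non-empty), otherwise a UID-based rule for UID $2.
monit_rule_for() {
  if [ -n "$1" ]; then
    printf 'socket cgroupv2 level %s "%s" ip daddr 127.0.0.1 tcp dport 2822 accept\n' \
      "$(cgroup_level "$1")" "$1"
  else
    printf 'meta skuid %s ip daddr 127.0.0.1 tcp dport 2822 accept\n' "$2"
  fi
}
```

Calling `monit_rule_for "$(cgroup_v2_path)" "$(id -u)"` would print whichever rule shape applies to the current process. Deriving the level from the number of path components is an assumption that matches the two-component `level 2` rules quoted later in this conversation.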

How it works

# Check if new firewall exists
bosh-monit-access --check
# Exit 0 = new firewall present, Exit 1 = not present

# Add firewall rule (tries cgroup first, falls back to UID)
bosh-monit-access
# Output: "Setting up monit firewall rule"
# Output: "Successfully added cgroup-based rule" or "Successfully added UID-based rule"
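
For reviewers who want to see the "new helper first, old mechanism second" flow in one place, here is a minimal sketch of how a job start script might use the `--check` exit code. It is an assumption-laden illustration rather than the galera-agent script from this PR: the function names and the `MONIT_ACCESS_BIN` and `LEGACY_HELPER` variables are hypothetical, and the default helper path is assumed, not taken from this conversation. Only `bosh-monit-access`, `--check`, `monit-access-helper.sh` and `permit_monit_access` come from the PR and the linked commit.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, for illustration only.

# Both values can be overridden; the helper path is an assumed default.
MONIT_ACCESS_BIN="${MONIT_ACCESS_BIN:-bosh-monit-access}"
LEGACY_HELPER="${LEGACY_HELPER:-/var/vcap/bosh/etc/monit-access-helper.sh}"

# Exit 0 when the new nftables-based firewall is present, non-zero otherwise
# (including when the binary is missing).
new_firewall_present() {
  "$MONIT_ACCESS_BIN" --check >/dev/null 2>&1
}

permit_galera_agent_monit_access() {
  if new_firewall_present; then
    # New stemcell: let the helper add the rule; keep its output on stderr
    # so that JSON logs on stdout are not polluted.
    "$MONIT_ACCESS_BIN" 1>&2
  elif [ -r "$LEGACY_HELPER" ]; then
    # Old stemcell: fall back to the iptables/cgroup helper.
    . "$LEGACY_HELPER"
    permit_monit_access
  else
    echo "no monit access mechanism available" >&2
    return 1
  fi
}
```

The check and the rule creation are kept as separate calls so that a missing or failing `--check` always ends up on the old path instead of aborting the job start.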

Testing

Tested on:

  • Stemcell 1.1044-custom-agent (new nftables firewall): Uses new helper, adds rules to monit_access_jobs chain
  • Stemcell 1.1016 (old firewall): --check returns 1, falls back to monit-access-helper.sh

Test                                New Stemcell     Old Stemcell
--check exit code                   0                1
Firewall rule added                 Yes (nftables)   No (uses fallback)
galera-agent running                Yes              Yes
Monit access from container         Works            Falls back to old mechanism
Rules persist after agent restart   Yes              N/A

@rkoster requested review from abg, aramprice and beyhan on February 9, 2026 14:41
rkoster added a commit to rkoster/bosh-linux-stemcell-builder that referenced this pull request on Feb 9, 2026:
Stop sourcing monit-access-helper.sh and calling permit_monit_access
when starting the bosh-agent. The agent will manage its own firewall
access internally instead of using the cgroup-based helper.

This completes the removal of the permit_monit_access functionality
now that pxc-release (the only consumer) no longer uses it.

Related to cloudfoundry/bosh-agent#399
Related to cloudfoundry/pxc-release#97
rkoster and others added 3 commits February 12, 2026 01:35
Adds a new bosh-monit-access Go binary that allows BOSH jobs to add
firewall rules to the new nftables-based monit firewall in bosh-agent.

The binary:
- Detects if the new firewall exists (--check mode)
- Adds cgroup-based rules when possible, with UID fallback
- Is idempotent (checks for existing rules before adding)
- Includes verbose logging for debugging

The galera-agent job is updated to use this helper first, falling back
to the old iptables/cgroup approach for backward compatibility with
older stemcells.

Tested on:
- Stemcell 1.1044-custom-agent (new nftables firewall): Uses new helper
- Stemcell 1.1016 (old firewall): Falls back to monit-access-helper.sh
...so that air-gapped compilation of the release works
Avoid polluting json logs with non-json output
@abg force-pushed the add-bosh-monit-access-helper branch from 67e8208 to 4d0a95f on February 12, 2026 07:36
@abg (Member) left a comment

I added two small changes:

  • Forcing the bosh-monit-access output to go to stderr
  • Vendoring deps to ensure compilation continues to work in air-gapped environments

Beyond this, my main concern with this PR is that the utility adds a rule for galera-agent's cgroup but never cleans it up when the cgroup is recreated (e.g. by a monit restart, or indirectly via a bosh redeploy).

I expect we would see something like this after a monit restart galera-agent that runs the new bosh-monit-access path:

		socket cgroupv2 level 2 20309 ip daddr 127.0.0.1 tcp dport 2822 log prefix "bosh-monit-access: cgroup match: " accept
		socket cgroupv2 level 2 "system.slice/runc-bpm-galera-agent.scope" ip daddr 127.0.0.1 tcp dport 2822 log prefix "bosh-monit-access: cgroup match: " accept

This would build up a set of "stale" rules over time, until the next VM recreate (stemcell upgrade, etc.) or reboot.

Should we be concerned about the security implications of a stale nft rule matching an unrelated cgroup if its inode ID is reused?

If we have a good answer to that concern, I am happy to approve.

@colins or @aramprice Do you have any additional opinions here?

@rkoster requested a review from abg on February 12, 2026 08:27
@rkoster force-pushed the add-bosh-monit-access-helper branch from 901b88e to 3393df4 on February 12, 2026 14:45
Add cleanup logic to remove stale nftables rules before adding new ones.
When a BPM container restarts, it gets a new cgroup with a new inode ID,
which previously caused rules to accumulate indefinitely.

Changes:
- Tag each cgroup rule with the job name in nftables userdata comment
- Before adding a new rule, delete all existing rules tagged with the
  same job name
- Use tag-based cleanup rather than expression parsing to avoid
  dependency on nftables library's Socket expression support
- Flush rule deletions before checking for existing rules to ensure
  idempotency works correctly

This ensures only one rule per job exists at any time, regardless of
how many times the container restarts.
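
The tag-based cleanup described in this commit message can be approximated with the `nft` CLI. The sketch below is only an illustration: the actual change is implemented in Go and tags rules through nftables userdata, whereas this version parses the text of `nft -a list chain`. The function names and the `NFT` variable are hypothetical; the table, chain and comment format are taken from the rule listings in this conversation.

```shell
#!/usr/bin/env bash
# Hypothetical CLI approximation of the tag-based cleanup; not the PR's Go code.

NFT="${NFT:-nft}"

# Read an "nft -a list chain" listing on stdin and print the handle of every
# rule whose comment is exactly "bosh-monit-access:<job>". The job name is
# used inside a sed pattern, so it must not contain regex metacharacters.
tagged_rule_handles() {
  sed -n "s/.*comment \"bosh-monit-access:$1\" # handle \([0-9][0-9]*\)\$/\1/p"
}

# Delete every rule tagged for the given job from the monit_access_jobs chain,
# so that the rule added afterwards is the only one left for that job.
delete_stale_rules() {
  "$NFT" -a list chain inet bosh_agent monit_access_jobs |
    tagged_rule_handles "$1" |
    while read -r handle; do
      "$NFT" delete rule inet bosh_agent monit_access_jobs handle "$handle"
    done
}
```

Matching on the comment tag instead of on the cgroup expression mirrors the reasoning in the commit message: the tag stays the same across container restarts, while the cgroup a rule refers to does not.
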
@rkoster force-pushed the add-bosh-monit-access-helper branch from 3393df4 to a4adf76 on February 12, 2026 15:16

@rkoster (Contributor, Author) commented on Feb 12, 2026

Manual testing with latest changes:

mysql/c836d734-46df-4eed-912b-2625760ebdda:~# nft -a list ruleset
table inet bosh_agent { # handle 1
        chain monit_access_jobs { # handle 1
                socket cgroupv2 level 2 "system.slice/runc-bpm-galera-agent.scope" ip daddr 127.0.0.1 tcp dport 2822 log prefix "bosh-monit-access: cgroup match: " accept comment "bosh-monit-access:galera-agent" # handle 29
        }

        chain monit_access { # handle 2
                type filter hook output priority filter - 1; policy accept;
                jump monit_access_jobs # handle 3
                meta skuid 0 ip daddr 127.0.0.1 tcp dport 2822 accept # handle 4
                ip daddr 127.0.0.1 tcp dport 2822 drop # handle 5
        }

        chain nats_access { # handle 6
                type filter hook output priority filter - 1; policy accept;
                meta skuid 0 ip daddr 10.246.0.10 tcp dport 4222 accept # handle 7
                ip daddr 10.246.0.10 tcp dport 4222 drop # handle 8
        }
}
mysql/c836d734-46df-4eed-912b-2625760ebdda:~# monit restart galera-agent
mysql/c836d734-46df-4eed-912b-2625760ebdda:~# monit summary
The Monit daemon 5.2.5 uptime: 3m

Process 'galera-init'               running
Process 'cluster-health-logger'     running
Process 'galera-agent'              initializing
Process 'gra-log-purger'            running
System 'system_dd59edec-3a2f-401a-a7b5-02dfe9be2550' running
mysql/c836d734-46df-4eed-912b-2625760ebdda:~# monit summary
The Monit daemon 5.2.5 uptime: 3m

Process 'galera-init'               running
Process 'cluster-health-logger'     running
Process 'galera-agent'              running
Process 'gra-log-purger'            running
System 'system_dd59edec-3a2f-401a-a7b5-02dfe9be2550' running
mysql/c836d734-46df-4eed-912b-2625760ebdda:~# nft -a list ruleset
table inet bosh_agent { # handle 1
        chain monit_access_jobs { # handle 1
                socket cgroupv2 level 2 "system.slice/runc-bpm-galera-agent.scope" ip daddr 127.0.0.1 tcp dport 2822 log prefix "bosh-monit-access: cgroup match: " accept comment "bosh-monit-access:galera-agent" # handle 30
        }

        chain monit_access { # handle 2
                type filter hook output priority filter - 1; policy accept;
                jump monit_access_jobs # handle 3
                meta skuid 0 ip daddr 127.0.0.1 tcp dport 2822 accept # handle 4
                ip daddr 127.0.0.1 tcp dport 2822 drop # handle 5
        }

        chain nats_access { # handle 6
                type filter hook output priority filter - 1; policy accept;
                meta skuid 0 ip daddr 10.246.0.10 tcp dport 4222 accept # handle 7
                ip daddr 10.246.0.10 tcp dport 4222 drop # handle 8
        }
}
mysql/c836d734-46df-4eed-912b-2625760ebdda:~# /var/vcap/packages/bpm/bin/bpm list
Name                  Pid   Status
bootstrap             -     stopped
cluster-health-logger 31332 running
galera-agent          33042 running
gra-log-purger        31426 running
pxc-mysql.galera-init 31237 running
smoke-tests           -     stopped
mysql/c836d734-46df-4eed-912b-2625760ebdda:~# /var/vcap/packages/bpm/bin/bpm shell galera-agent
root@dd59edec-3a2f-401a-a7b5-02dfe9be2550:/var/vcap/jobs/galera-agent# whoami
root
root@dd59edec-3a2f-401a-a7b5-02dfe9be2550:/var/vcap/jobs/galera-agent# su vcap
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

vcap@dd59edec-3a2f-401a-a7b5-02dfe9be2550:/var/vcap/jobs/galera-agent$ curl 127.0.0.1:2822
<html><head><title>401 Unauthorized</title></head><body bgcolor=#FFFFFF><h2>Unauthorized</h2>You are <b>not</b> authorized to access <i>monit</i>. Either you supplied the wrong credentials (e.g. bad password), or your browser doesn't understand how to supply the credentials required<p><hr><a href='http://mmonit.com/monit/'><font size=-1>monit 5.2.5</font></a></body></html>
vcap@dd59edec-3a2f-401a-a7b5-02dfe9be2550:/var/vcap/jobs/galera-agent$ exit
exit
root@dd59edec-3a2f-401a-a7b5-02dfe9be2550:/var/vcap/jobs/galera-agent# exit
exit
mysql/c836d734-46df-4eed-912b-2625760ebdda:~# /var/vcap/packages/bpm/bin/bpm shell gra-log-purger
vcap@dd59edec-3a2f-401a-a7b5-02dfe9be2550:/var/vcap/jobs/gra-log-purger$ curl 127.0.0.1:2822
^C

@abg (Member) left a comment

LGTM

@github-project-automation (bot) moved this from "Inbox" to "Pending Merge | Prioritized" in Foundational Infrastructure Working Group on Feb 12, 2026
@abg merged commit a1cc67b into main on Feb 13, 2026
2 checks passed
@github-project-automation (bot) moved this from "Pending Merge | Prioritized" to "Done" in Foundational Infrastructure Working Group on Feb 13, 2026
@colins (Member) commented on Feb 13, 2026

@rkoster there is some more work to do here, specifically around backward compatibility for some MySQL components.
We'd need something like monit-access-helper.sh in jammy so that we don't break features for customers.
