Add bosh-monit-access helper for nftables firewall #97
Stop sourcing monit-access-helper.sh and calling permit_monit_access when starting the bosh-agent. The agent will manage its own firewall access internally instead of using the cgroup-based helper. This completes the removal of the permit_monit_access functionality now that pxc-release (the only consumer) no longer uses it.

Related to cloudfoundry/bosh-agent#399
Related to cloudfoundry/pxc-release#97
Adds a new bosh-monit-access Go binary that allows BOSH jobs to add firewall rules to the new nftables-based monit firewall in bosh-agent. The binary:
- Detects if the new firewall exists (--check mode)
- Adds cgroup-based rules when possible, with UID fallback
- Is idempotent (checks for existing rules before adding)
- Includes verbose logging for debugging

The galera-agent job is updated to use this helper first, falling back to the old iptables/cgroup approach for backward compatibility with older stemcells.

Tested on:
- Stemcell 1.1044-custom-agent (new nftables firewall): Uses new helper
- Stemcell 1.1016 (old firewall): Falls back to monit-access-helper.sh
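The behaviour listed in this commit message can be illustrated with a small in-memory model. This is only a sketch and not the code from the PR: the types `rule` and `firewall`, the functions `useNewHelper`, `pickRule` and `addOnce`, and the UID 1000 are hypothetical names and values chosen for illustration, and the mapping "exit code 0 means the new firewall exists" is an assumption based on the --check description. The real binary talks to nftables; this model only tracks what a rule would match on.

```go
package main

import "fmt"

// rule is a simplified stand-in for one firewall rule. It records only
// what the rule matches on, not the actual nftables expressions.
type rule struct {
	kind string // "cgroup" or "uid"
	id   uint64 // cgroup ID or numeric UID
}

// firewall is an in-memory stand-in for the monit firewall chain.
type firewall struct {
	rules []rule
}

// useNewHelper interprets the exit code of a --check run.
// Assumption: 0 means the new nftables firewall exists.
func useNewHelper(checkExitCode int) bool {
	return checkExitCode == 0
}

// pickRule prefers a cgroup-based rule and falls back to a UID-based
// rule when no cgroup ID could be determined.
func pickRule(cgroupID uint64, haveCgroup bool, uid uint32) rule {
	if haveCgroup {
		return rule{kind: "cgroup", id: cgroupID}
	}
	return rule{kind: "uid", id: uint64(uid)}
}

// addOnce appends r only if an identical rule is not already present,
// and reports whether the rule set changed.
func (f *firewall) addOnce(r rule) bool {
	for _, existing := range f.rules {
		if existing == r {
			return false
		}
	}
	f.rules = append(f.rules, r)
	return true
}

func main() {
	fw := &firewall{}
	if !useNewHelper(0) {
		fmt.Println("fall back to monit-access-helper.sh")
		return
	}
	r := pickRule(20309, true, 1000)
	first := fw.addOnce(r)
	second := fw.addOnce(r) // same rule again: expected to be a no-op
	fmt.Println(first, second, len(fw.rules))
}
```

Running the helper twice with the same rule leaves one entry in the model, which is the idempotency property the commit message describes.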
...so that air-gapped compilation of the release works
Avoid polluting json logs with non-json output
67e8208 to 4d0a95f
abg left a comment
I added two small changes:
- Forcing the bosh-monit-access output to go to stderr
- Vendoring deps to ensure compilation continues to work in air gapped environments
Beyond this, my main concern with this PR is that the utility adds a rule for galera-agent's cgroup but never cleans it up when the cgroup is recreated (e.g. by a monit restart, or indirectly by a bosh redeploy).
I expect we would see something like this after a monit restart galera-agent that runs the new bosh-monit-access path:
socket cgroupv2 level 2 20309 ip daddr 127.0.0.1 tcp dport 2822 log prefix "bosh-monit-access: cgroup match: " accept
socket cgroupv2 level 2 "system.slice/runc-bpm-galera-agent.scope" ip daddr 127.0.0.1 tcp dport 2822 log prefix "bosh-monit-access: cgroup match: " accept
That is, we build up a set of "stale" rules over time, until the next VM recreate (stemcell upgrade, etc.) or reboot.
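To make the accumulation concrete, here is a rough in-memory sketch of what happens without cleanup. It is not taken from the PR: `cgroupRule`, `addWithoutCleanup`, `staleRules` and `accepts` are hypothetical names, and apart from 20309 (from the output above) the cgroup IDs are made up.

```go
package main

import "fmt"

// cgroupRule models an accept rule keyed on a numeric cgroup ID.
type cgroupRule struct {
	cgroupID uint64
}

// addWithoutCleanup models the behaviour under review: each start appends
// a rule for the current cgroup ID and never removes older ones.
func addWithoutCleanup(rules []cgroupRule, currentID uint64) []cgroupRule {
	return append(rules, cgroupRule{cgroupID: currentID})
}

// staleRules returns the rules that no longer refer to the live cgroup.
func staleRules(rules []cgroupRule, liveID uint64) []cgroupRule {
	var stale []cgroupRule
	for _, r := range rules {
		if r.cgroupID != liveID {
			stale = append(stale, r)
		}
	}
	return stale
}

// accepts reports whether any rule would match a process in cgroup id.
func accepts(rules []cgroupRule, id uint64) bool {
	for _, r := range rules {
		if r.cgroupID == id {
			return true
		}
	}
	return false
}

func main() {
	// Three starts of the same job, each with a fresh cgroup ID.
	ids := []uint64{20309, 20741, 21188}
	var rules []cgroupRule
	for _, id := range ids {
		rules = addWithoutCleanup(rules, id)
	}
	live := ids[len(ids)-1]
	fmt.Println("rules:", len(rules), "stale:", len(staleRules(rules, live)))

	// If ID 20309 were later assigned to an unrelated cgroup, the stale
	// rule would still accept that cgroup's traffic to monit.
	fmt.Println("reused ID accepted:", accepts(rules, 20309))
}
```

In this model every restart leaves one more rule behind, and any stale rule keeps matching whatever cgroup later carries its ID.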
Should we be concerned about the security implications of stale nft rules if a cgroup inode number is later reused?
If we have a good answer to that concern, I am happy to approve.
@colins or @aramprice Do you have any additional opinions here?
901b88e to 3393df4
Add cleanup logic to remove stale nftables rules before adding new ones. When a BPM container restarts, it gets a new cgroup with a new inode ID, which previously caused rules to accumulate indefinitely.

Changes:
- Tag each cgroup rule with the job name in nftables userdata comment
- Before adding a new rule, delete all existing rules tagged with the same job name
- Use tag-based cleanup rather than expression parsing to avoid dependency on nftables library's Socket expression support
- Flush rule deletions before checking for existing rules to ensure idempotency works correctly

This ensures only one rule per job exists at any time, regardless of how many times the container restarts.
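The tag-based cleanup described in this commit message can be sketched with an in-memory model. This is an illustration under assumptions, not the PR's implementation: `taggedRule`, `replaceJobRule` and `countFor` are hypothetical names, the `jobTag` field stands in for the nftables userdata comment, and "other-job" and all cgroup IDs except 20309 are invented. The real code deletes and adds rules through nftables and flushes between the two steps, which this model does not reproduce.

```go
package main

import "fmt"

// taggedRule models a cgroup rule that carries a job-name tag. The tag
// stands in for the nftables userdata comment mentioned in the commit.
type taggedRule struct {
	jobTag   string
	cgroupID uint64
}

// replaceJobRule drops every rule tagged with job, then appends one rule
// for the job's current cgroup ID. Rules of other jobs are kept as-is.
func replaceJobRule(rules []taggedRule, job string, cgroupID uint64) []taggedRule {
	kept := make([]taggedRule, 0, len(rules)+1)
	for _, r := range rules {
		if r.jobTag != job {
			kept = append(kept, r)
		}
	}
	return append(kept, taggedRule{jobTag: job, cgroupID: cgroupID})
}

// countFor returns how many rules carry the given job tag.
func countFor(rules []taggedRule, job string) int {
	n := 0
	for _, r := range rules {
		if r.jobTag == job {
			n++
		}
	}
	return n
}

func main() {
	// Start with a rule owned by a different (hypothetical) job.
	rules := []taggedRule{{jobTag: "other-job", cgroupID: 111}}

	// Three restarts of galera-agent, each with a new cgroup ID.
	for _, id := range []uint64{20309, 20741, 21188} {
		rules = replaceJobRule(rules, "galera-agent", id)
	}
	fmt.Println("galera-agent rules:", countFor(rules, "galera-agent"))
	fmt.Println("total rules:", len(rules))
}
```

Matching on the tag rather than on the rule's expressions means the cleanup does not need to know the old cgroup ID, which is the design choice the commit message gives for avoiding expression parsing.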
3393df4 to a4adf76
Manual testing with latest changes:
@rkoster there is some more work to do for this, specifically for backward compatibility with some MySQL components.
Summary
Adds a new bosh-monit-access Go binary that allows BOSH jobs to add firewall rules to the new nftables-based monit firewall in bosh-agent (cloudfoundry/bosh-agent#399).

Changes
New package bosh-monit-access: A Go binary that:
- Detects whether the new firewall exists (--check mode returns exit 0/1)

Updated galera-agent job: Uses the new helper first, falling back to the old iptables/cgroup approach for backward compatibility

How it works
Testing
Tested on:
monit_access_jobs chain; --check returns 1, falls back to monit-access-helper.sh; --check exit code