RANGER-5655: Dynamic unified ingestor registry for audit partition routing and service allowlists#1032
Open
ramackri wants to merge 9 commits into
…gestor: runtime Kafka partition routing and per-repo service allowlists via compacted topic + REST, without ingestor restarts. Feature flag default off. Co-authored-by: Cursor <cursoragent@cursor.com>
Use hdfs-only allowlist for dev_hdfs, remove unused dev_solr allowlist entry, fix buffer partition example math, and add detailed manual test documentation for PR apache#1032. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep dev_solr service allowlist property (remove only the stray blank line the feature commit added). Retain hdfs-only dev_hdfs allowlist and buffer partition example fix. Remove dev-support/RANGER-5655-PR-TEMPLATE.md. Co-authored-by: Cursor <cursoragent@cursor.com>
Correct import order, remove unused import, use static requireNonNull, drop duplicate test import, and align PartitionPlan imports with checkstyle rules reported on PR apache#1032. Co-authored-by: Cursor <cursoragent@cursor.com>
…n layout. Ship the standard 14-plugin lab list in ranger-audit-ingestor-site.xml with dynamic partition plan disabled by default; update buffer partition example to 14 × 3 + 9 = 51 total. Co-authored-by: Cursor <cursoragent@cursor.com>
Consolidate partition-plan mutations into three endpoints: GET plan, POST onboard plugin (mandatory non-empty services map), and PATCH update plugin. Remove PATCH /partition-plan and POST /services. Add validator and E2E coverage for mandatory services on onboard. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep REST simplification to Java sources and unit tests only. Co-authored-by: Cursor <cursoragent@cursor.com>
Drop unused PromotePlugin, OnboardService, PluginScale, and PartitionPlanReplacement after REST API consolidation. Cache partition-plan admin users and dynamic.enabled flag in PartitionPlanService constructor.
Refactor partition-plan helpers and AuditREST partition-plan paths to match Ranger review style with one return statement per method.
What changes were proposed in this pull request?
Implements RANGER-5655: a dynamic unified ingestor registry for Ranger audit-ingestor so operators can change Kafka partition routing and per-repo service allowlists at runtime — without restarting ingestor pods.
The registry is a versioned JSON document in the Kafka topic ranger_audit_partition_plan (1 partition, compacted). All ingestor replicas converge via PartitionPlanWatcher; AuditPartitioner routes on the hot path from in-memory state only.
Feature flag (default off):
ranger.audit.ingestor.kafka.partition.plan.dynamic.enabled=false
Problem
- ranger.audit.ingestor.service.*.allowed.users in site XML at startup
- kafka.configured.plugins + per-plugin overrides at startup
Solution (this PR update)
Simplified REST control plane, three endpoints only:
- GET /api/audit/partition-plan: read the plan (plugins, buffer, services, version)
- POST /api/audit/partition-plan/plugins: onboard a plugin; a non-empty services map is mandatory
- PATCH /api/audit/partition-plan/plugins/{pluginId}: update a plugin (addServices / updateServices / removeServices)
Removed (consolidated above):
PATCH /api/audit/partition-plan, POST /api/audit/partition-plan/services, separate promote-only / scale-only flows.
New request models:
OnboardPlugin, UpdatePlugin. Service entries are stored with an optional pluginId for repo→plugin ownership.
Code changes in this commit (REST simplification slice)
- AuditREST.java
- PartitionPlanService.java: onboardPlugin(), updatePlugin()
- PartitionPlanAllocator.java
- PartitionPlanRequestValidator.java: mandatory services on POST onboard
- OnboardPlugin.java, UpdatePlugin.java
- ServiceAllowlistEntry.java: pluginId for ownership
- PartitionPlanRequestValidatorTest + mutation/allocator updates (94 partition-plan tests)
How was this patch tested?
Unit tests + quality gates
mvn verify -pl audit-server/audit-ingestor -Drat.skip=true \
  -Dtest='PartitionPlan*Test,ServiceAllowlist*Test,AuthToLocalRuleComposerTest'
Focused run:
Manual testing (local Docker audit lab)
Manual validation used a local Docker Compose audit environment that mirrors a production-style Ranger audit deployment: Kerberos (KDC + plugin keytabs), Kafka with both the audit data topic (ranger_audits) and the compacted registry topic (ranger_audit_partition_plan), a running audit-ingestor instance on port 7081, Solr (with audit dispatcher), Postgres-backed Ranger Admin, and real plugin containers for HDFS and Hive. All partition-plan REST calls used SPNEGO (Kerberos) as the ingestor HTTP service principal; plugin audit posts used each plugin’s own keytab.
The ingestor was rebuilt with this branch’s code (including mandatory services validation on onboard) before running the scenarios below.
1. Environment readiness
Before exercising the new API, the lab was brought to a healthy state: ingestor health endpoint returned 200, Kafka was reachable, the plan watcher was active after enabling dynamic mode, and GET /api/audit/partition-plan returned a coherent plan JSON (version, plugins, buffer, services, topicPartitionCount matching the live ranger_audits partition count).
2. Static mode unchanged (feature flag off)
With ranger.audit.ingestor.kafka.partition.plan.dynamic.enabled=false (default):
- GET /api/audit/partition-plan returned 503 (partition-plan admin API correctly disabled).
- GET /api/audit/health still returned 200.
This confirms existing deployments are unaffected when the flag stays off.
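The flag-off behavior above can be illustrated with a minimal sketch. The class name PartitionPlanGate and its methods are hypothetical and are not taken from the PR; only the property name, the default of false, and the 503/200 status codes come from the description. Reading the flag once at construction mirrors the commit note about caching the flag in the PartitionPlanService constructor, but the real wiring may differ.

```java
import java.util.Properties;

/** Hypothetical sketch: gate the partition-plan admin API on the feature flag. */
final class PartitionPlanGate {
    static final String FLAG = "ranger.audit.ingestor.kafka.partition.plan.dynamic.enabled";

    private final boolean dynamicEnabled;

    PartitionPlanGate(Properties siteConfig) {
        // Read once at construction; a missing property means "off".
        this.dynamicEnabled = Boolean.parseBoolean(siteConfig.getProperty(FLAG, "false"));
    }

    /** Status for GET /api/audit/partition-plan: 503 while the flag is off. */
    int partitionPlanStatus() {
        return dynamicEnabled ? 200 : 503;
    }

    /** The health endpoint does not depend on the flag. */
    int healthStatus() {
        return 200;
    }
}
```

With an empty Properties object the gate reports 503 for the plan endpoint and 200 for health, which is the static-mode outcome described in this section.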
3. Enabling dynamic mode and reading the registry
Dynamic mode was turned on (dynamic.enabled=true) with a fresh or reset plan topic where appropriate. After ingestor restart:
- … ranger_audit_partition_plan with one partition and compacted cleanup policy.
- GET /api/audit/partition-plan returned 200 with version ≥ 1, populated services from XML bootstrap, and topicPartitionCount equal to the partition count reported by kafka-topics --describe ranger_audits.
4. Simplified REST API: onboard, validation, scale
All mutations used expectedVersion from the preceding GET.
Negative validation (new behavior)
- POST /api/audit/partition-plan/plugins with pluginId, partitionCount, and expectedVersion but omitting services → 400 Bad Request with a message indicating services are required. This was the primary regression guard for the API consolidation.
Successful onboard
- … (storm or ambari) in a single call with a non-empty services map (repo → allowedUsers). Response 200; plan version incremented; plugin appeared under plugins with dedicated partition IDs taken from the buffer (or tail-grown when needed); corresponding repo entries appeared under services.
Multi-repo onboard in one version bump
- … trino with two repos in one POST (dev_trino and dev_trino2, each with its own allowedUsers) → 200; both repos present in services with pluginId ownership tagged to trino.
Optimistic locking
- … expectedVersion → 409 Conflict with current plan in the response body.
- … hdfs again when it already had dedicated partitions → 400 (conflicting state).
Scale after onboard
- PATCH /api/audit/partition-plan/plugins/{pluginId} with additionalPartitions → 200; tail partition IDs appended append-only; ranger_audits grown via AdminClient when required; subsequent GET showed stable version and layout.
Idempotency check
- GET /api/audit/partition-plan without restart showed the same version and layout as the last successful write.
5. End-to-end plugin flows (allowlist + routing)
These tests prove the full path: registry onboard → allowlist enforcement → Kafka produce → correct partition assignment.
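As a rough illustration of the allowlist and routing steps in that path, here is a minimal in-memory sketch. The class AuditRoutingSketch, the hash-based choice within a plugin's partition list, and the fallback to the buffer pool for plugins without a dedicated assignment are all assumptions made for illustration; the PR's AuditPartitioner may select partitions differently.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of allowlist enforcement followed by partition selection. */
final class AuditRoutingSketch {
    private final Map<String, Set<String>> allowedUsersByRepo;   // services[repo].allowedUsers
    private final Map<String, List<Integer>> partitionsByPlugin; // dedicated IDs per plugin
    private final List<Integer> bufferPartitions;                // shared buffer pool

    AuditRoutingSketch(Map<String, Set<String>> allowedUsersByRepo,
                       Map<String, List<Integer>> partitionsByPlugin,
                       List<Integer> bufferPartitions) {
        if (bufferPartitions.isEmpty()) {
            throw new IllegalArgumentException("buffer pool must not be empty");
        }
        this.allowedUsersByRepo = allowedUsersByRepo;
        this.partitionsByPlugin = partitionsByPlugin;
        this.bufferPartitions = bufferPartitions;
    }

    /** Allowlist step: 200 if the short user name is allowed for the repo, else 403. */
    int authorize(String repo, String shortUserName) {
        Set<String> allowed = allowedUsersByRepo.get(repo);
        return allowed != null && allowed.contains(shortUserName) ? 200 : 403;
    }

    /** Routing step: choose from the plugin's dedicated list, else from the buffer. */
    int partitionFor(String pluginId, String recordKey) {
        List<Integer> dedicated = partitionsByPlugin.get(pluginId);
        List<Integer> candidates =
                (dedicated == null || dedicated.isEmpty()) ? bufferPartitions : dedicated;
        // Assumption: spread records over the candidates by key hash.
        int slot = Math.floorMod(recordKey.hashCode(), candidates.size());
        return candidates.get(slot);
    }
}
```

In this sketch an hdfs record can only land on a partition from the hdfs assignment list, which is the property the manual Kafka checks below verify against the live plan.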
HDFS
- … hdfs with repo dev_hdfs and allowlist hdfs, nn via POST .../plugins (mandatory services).
- POST /api/audit/access?serviceName=dev_hdfs&appId=hdfs using the hdfs Kerberos principal → 200; authenticatedUser mapped to short name hdfs.
- … ranger_audits and verified the partition number was in the hdfs assignment list from the plan (not the buffer pool).
- … hdfs with PATCH .../plugins/hdfs and repeated the access + Kafka partition check; routing still respected the updated plan.
Hive
- … hiveServer2 with repo dev_hive and allowlist ["hive"] in the same onboard POST.
- … [7, 8] after prior lab mutations).
HDFS already onboarded path
- When hdfs was already present in the plan from an earlier run, the lab skipped re-onboard and verified allowlist + routing still held: access accepted, partition ∈ plan.
6. Allowlist behavior (authorization layer)
Separate from partition routing:
- … services[repo].allowedUsers (after auth_to_local) → 200 on /access.
- PATCH .../plugins/{pluginId} with updateServices to remove the principal → 403 on the same POST.
This confirms the unified services map in the registry drives authorization without XML restart.
7. What did not change
- … ranger_audits grows.
8. Summary of manual test outcomes
- … services → 400
- … services
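The onboard outcomes exercised in section 4 (400 when services is missing, 409 on a version mismatch, 400 on re-onboard, 200 with a version bump and partitions taken from the buffer or tail-grown) can be summarized in a small in-memory model. PartitionPlanSketch and its members are hypothetical, and the order of the checks and the rejection of a non-positive partitionCount are assumptions; the sketch does not reproduce the PR's PartitionPlanService or PartitionPlanAllocator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of POST /api/audit/partition-plan/plugins. */
final class PartitionPlanSketch {
    private long version = 1;
    private int topicPartitionCount;
    private final Deque<Integer> buffer = new ArrayDeque<>();
    private final Map<String, List<Integer>> plugins = new LinkedHashMap<>();
    private final Map<String, List<String>> services = new LinkedHashMap<>();

    PartitionPlanSketch(int bufferSize) {
        // Start with every existing partition in the buffer pool.
        for (int p = 0; p < bufferSize; p++) {
            buffer.add(p);
        }
        topicPartitionCount = bufferSize;
    }

    /** Returns the HTTP status the request would produce in this model. */
    synchronized int onboard(String pluginId, int partitionCount, long expectedVersion,
                             Map<String, List<String>> requestedServices) {
        if (requestedServices == null || requestedServices.isEmpty() || partitionCount < 1) {
            return 400; // services map is mandatory and must be non-empty
        }
        if (expectedVersion != version) {
            return 409; // optimistic lock: caller read an older plan
        }
        if (plugins.containsKey(pluginId)) {
            return 400; // plugin already has dedicated partitions
        }
        List<Integer> assigned = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            Integer fromBuffer = buffer.pollFirst();
            // Take from the buffer first; grow the topic tail when it runs dry.
            assigned.add(fromBuffer != null ? fromBuffer : topicPartitionCount++);
        }
        plugins.put(pluginId, assigned);
        services.putAll(requestedServices);
        version++;
        return 200;
    }

    synchronized long version() {
        return version;
    }

    synchronized List<Integer> partitionsOf(String pluginId) {
        return List.copyOf(plugins.getOrDefault(pluginId, List.of()));
    }
}
```

Because the version only advances on a successful write, a client that retries with the version from its last GET either succeeds once or receives 409 and must re-read the plan.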