Feat/nemo rl task5 6 7 rlix progress hooks by TianyeGGBond · Pull Request #5 · rlops/RL

TianyeGGBond · 2026-05-05T02:22:47Z

Wire RLix Task 5/6/7 hooks into NeMo async GRPO

Summary

This PR wires the NeMo RL side of the RLix Task 5/6/7 integration into async GRPO:

- adds the RLix hook protocol and a no-op default implementation for standalone NeMo RL runs
- passes RLix hooks into AsyncTrajectoryCollector
- reports rollout progress from the collector for the active training step only
- adds catch-up counting for trajectories already buffered before a progress step becomes active
- registers the trajectory collector with the hooks so RLix can update weight versions after selective sync
- gates RLix time-sharing behavior behind RLIX_CONTROL_PLANE=rlix
- invokes the Megatron NCCL destroy/reload lifecycle owned by PR #4 ("feat: Megatron NCCL destroy/reload helpers and tests (Task 3)") when running in RLix mode
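
The progress-reporting and catch-up behavior listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `ProgressTracker`, `on_trajectory_buffered`, `activate_step`, and the `report(step, count)` callback are hypothetical names, and the real collector may count or report differently.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


class ProgressTracker:
    """Hypothetical helper: per-step trajectory counts with catch-up reporting."""

    def __init__(self, report: Callable[[int, int], None]) -> None:
        self._report = report                     # assumed to forward progress to RLix
        self._buffered: Counter = Counter()       # target step -> trajectories buffered
        self._active_step: Optional[int] = None   # step currently reporting progress

    def on_trajectory_buffered(self, target_step: int) -> None:
        # Always count, but only report for the active training step.
        self._buffered[target_step] += 1
        if target_step == self._active_step:
            self._report(target_step, self._buffered[target_step])

    def activate_step(self, step: int) -> None:
        # Catch-up: trajectories buffered before activation are reported once.
        self._active_step = step
        if self._buffered[step] > 0:
            self._report(step, self._buffered[step])


reports: List[Tuple[int, int]] = []
tracker = ProgressTracker(lambda step, count: reports.append((step, count)))
tracker.on_trajectory_buffered(3)  # step 3 is not active yet, nothing reported
tracker.on_trajectory_buffered(3)
tracker.activate_step(3)           # catch-up report for the two buffered items
tracker.on_trajectory_buffered(3)  # live report for the active step
tracker.on_trajectory_buffered(4)  # belongs to a future step, nothing reported
```

Counting before activation is what keeps the progress for a step from starting at zero when the collector has run ahead of training.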

Notes

Feature 11's NCCL offload implementation is intentionally not duplicated here. This branch expects the API from #4:

nemo_rl.models.megatron.nccl_offload.destroy_megatron_nccl_groups()
nemo_rl.models.megatron.nccl_offload.reload_megatron_nccl_groups(state_snapshot)

Until PR #4 lands or this branch is stacked on it, the RLIX_CONTROL_PLANE=rlix path depends on an API that is not yet available on this branch. Standalone NeMo RL runs use the no-op hooks and should behave as before.
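
One way to keep that dependency out of standalone runs is to resolve the module only when RLix mode is on. The sketch below is an assumption about how the gate could look, not the code in this branch: `rlix_mode_enabled` and `load_nccl_offload` are hypothetical helpers, and the exact string comparison against `rlix` is a guess at how the environment variable is interpreted. Only the module path and the two function names come from the notes above.

```python
import importlib
import os
from types import ModuleType
from typing import Optional

# Module path and function names as listed in the notes above.
NCCL_OFFLOAD_MODULE = "nemo_rl.models.megatron.nccl_offload"
REQUIRED_FUNCTIONS = ("destroy_megatron_nccl_groups", "reload_megatron_nccl_groups")


def rlix_mode_enabled() -> bool:
    """True when RLIX_CONTROL_PLANE=rlix (exact match is an assumption)."""
    return os.environ.get("RLIX_CONTROL_PLANE") == "rlix"


def load_nccl_offload() -> Optional[ModuleType]:
    """Import the PR #4 helpers lazily, and only in RLix mode."""
    if not rlix_mode_enabled():
        return None  # standalone runs never touch the module
    try:
        module = importlib.import_module(NCCL_OFFLOAD_MODULE)
    except ImportError as exc:
        raise RuntimeError(
            f"RLIX_CONTROL_PLANE=rlix requires {NCCL_OFFLOAD_MODULE} (PR #4)"
        ) from exc
    for name in REQUIRED_FUNCTIONS:
        if not hasattr(module, name):
            raise RuntimeError(f"{NCCL_OFFLOAD_MODULE} is missing {name}()")
    return module
```

Failing at load time with a message that names PR #4 makes the missing dependency easier to spot than an error raised in the middle of a training step.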

Validation

python -m py_compile nemo_rl/algorithms/async_utils.py nemo_rl/algorithms/grpo.py nemo_rl/algorithms/rlix_hooks.py
git diff --check

Could not run the RL pytest suite locally because this environment is missing ray.

Add rlix_hooks.py: RLixHooksProtocol (typing_extensions Protocol) + NoOpRLixHooks default for standalone mode. Seam file keeps NeMo RL free of direct rlix package imports.

Modify async_grpo_train:
- rlix_hooks parameter injected by NemoRLRLixHooks from pipeline actor
- DO_TIME_SHARING flag from RLIX_CONTROL_PLANE env var
- before_training(step): blocks on scheduler GPU grant before lp_inference
- after_training(step): notifies scheduler release; replaces refit in RLix mode (weight sync + version update done atomically in _expand_workers, F6)
- on_trajectory_collector_created: registers collector handle so _expand_workers can call set_weight_version before activating dp rank routing
- Initial refit and prepare_for_generation skipped when DO_TIME_SHARING=True

TODO placeholders in after_training branch:
- F4: policy.build_cpu_bucket_cache(step)
- F11: policy.offload_training_gpu() + policy.destroy_nccl_groups()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
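
The hook seam described in this commit message can be sketched as below. The hook names `before_training`, `after_training`, and `on_trajectory_collector_created` come from the message. Everything else is an assumption for illustration: the argument and return types, the use of `typing.Protocol` instead of the `typing_extensions` Protocol so the example needs only the standard library, and the `train_loop` function, which is a hypothetical stand-in for `async_grpo_train` that records events instead of training.

```python
from typing import Any, List, Protocol


class RLixHooksProtocol(Protocol):
    """Structural interface; signatures here are assumed, not taken from the PR."""

    def before_training(self, step: int) -> None: ...
    def after_training(self, step: int) -> None: ...
    def on_trajectory_collector_created(self, collector: Any) -> None: ...


class NoOpRLixHooks:
    """Default for standalone runs: every hook does nothing."""

    def before_training(self, step: int) -> None:
        pass

    def after_training(self, step: int) -> None:
        pass

    def on_trajectory_collector_created(self, collector: Any) -> None:
        pass


def train_loop(
    num_steps: int,
    hooks: RLixHooksProtocol,
    do_time_sharing: bool,
    log: List[str],
) -> None:
    """Hypothetical stand-in for async_grpo_train that only logs events."""
    collector = object()  # placeholder for the trajectory collector handle
    hooks.on_trajectory_collector_created(collector)
    if not do_time_sharing:
        log.append("initial refit")  # skipped when time sharing is on
    for step in range(num_steps):
        hooks.before_training(step)  # RLix mode: wait for the GPU grant
        log.append(f"train {step}")
        hooks.after_training(step)   # RLix mode: notify release, replaces refit
        if not do_time_sharing:
            log.append(f"refit {step}")


standalone_log: List[str] = []
train_loop(2, NoOpRLixHooks(), do_time_sharing=False, log=standalone_log)
```

Because the protocol is structural, an RLix-side implementation does not need to import or subclass anything from this file. That is consistent with the message's stated goal of keeping NeMo RL free of direct rlix imports.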

TianyeGGBond and others added 4 commits April 24, 2026 20:35

debug(grpo): add F5/F6 trace prints to verify hook wiring

9f0f23e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Expose RLix model update hooks for NeMo vLLM

34e19e7

Wire RLix progress hooks into async GRPO

02ec437

TianyeGGBond wants to merge 4 commits into rlops:main from TianyeGGBond:feat/nemo-rl-task5-6-7-rlix-progress-hooks
