Feat/nemo rl task5 6 7 rlix progress hooks #5

Open
TianyeGGBond wants to merge 4 commits into rlops:main from TianyeGGBond:feat/nemo-rl-task5-6-7-rlix-progress-hooks

@TianyeGGBond
Collaborator

Wire RLix Task 5/6/7 hooks into NeMo async GRPO

Summary

This PR wires the NeMo RL side of the RLix Task 5/6/7 integration into async GRPO:

  • adds the RLix hook protocol and no-op default implementation for standalone NeMo RL runs
  • passes RLix hooks into AsyncTrajectoryCollector
  • reports rollout progress from the collector for the active training step only
  • adds catch-up counting for trajectories already buffered before a progress step becomes active
  • registers the trajectory collector with hooks so RLix can update weight versions after selective sync
  • gates RLix time-sharing behavior behind RLIX_CONTROL_PLANE=rlix
  • invokes the Megatron NCCL destroy/reload lifecycle owned by PR #4, "feat: Megatron NCCL destroy/reload helpers and tests (Task 3)", when running in RLix mode
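
The catch-up counting described in the list above can be illustrated with a minimal sketch. Every name here (`ProgressTracker`, `activate_step`, `on_trajectory_buffered`, the `report` callback) is hypothetical and not taken from this PR; the sketch only assumes that each buffered trajectory is tagged with the training step it targets.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


class ProgressTracker:
    """Hypothetical helper: reports rollout progress for the active step only."""

    def __init__(self, report: Callable[[int, int], None]) -> None:
        self._report = report                    # called as report(step, count)
        self._buffered: Counter = Counter()      # target step -> trajectories seen
        self._active_step: Optional[int] = None

    def on_trajectory_buffered(self, target_step: int) -> None:
        self._buffered[target_step] += 1
        # Trajectories for inactive steps are counted but not reported.
        if target_step == self._active_step:
            self._report(target_step, self._buffered[target_step])

    def activate_step(self, step: int) -> None:
        self._active_step = step
        # Catch-up: account for trajectories buffered before activation.
        if self._buffered[step]:
            self._report(step, self._buffered[step])


reports: List[Tuple[int, int]] = []
tracker = ProgressTracker(lambda step, count: reports.append((step, count)))
tracker.on_trajectory_buffered(1)   # buffered early, nothing is active yet
tracker.on_trajectory_buffered(1)
tracker.activate_step(1)            # one catch-up report covering both
tracker.on_trajectory_buffered(1)   # live report for the active step
tracker.on_trajectory_buffered(2)   # step 2 is not active, so no report
```

Without the catch-up call in `activate_step`, the two early trajectories would never be reported and the scheduler would see the step as further from completion than it is.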

Notes

Feature 11's NCCL offload implementation is intentionally not duplicated here. This branch expects the API from #4:

  • nemo_rl.models.megatron.nccl_offload.destroy_megatron_nccl_groups()
  • nemo_rl.models.megatron.nccl_offload.reload_megatron_nccl_groups(state_snapshot)

Until PR #4 lands or this branch is stacked on it, the RLIX_CONTROL_PLANE=rlix path depends on these two functions and cannot run without them. Standalone NeMo RL behavior uses no-op hooks and should remain unchanged.
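
One way to keep the standalone path independent of PR #4 is to resolve the helpers lazily, only when the control plane is RLix. The sketch below is an assumption about how such gating could look, not the code in this branch: `is_rlix_mode` and `load_nccl_offload` are hypothetical names, and the exact-match comparison on the variable's value is also assumed. The module path and environment variable are the ones named above.

```python
import importlib
import os
from types import ModuleType
from typing import Mapping, Optional


def is_rlix_mode(env: Mapping[str, str] = os.environ) -> bool:
    """Assumed rule: RLix mode only when RLIX_CONTROL_PLANE is exactly 'rlix'."""
    return env.get("RLIX_CONTROL_PLANE") == "rlix"


def load_nccl_offload(env: Mapping[str, str] = os.environ) -> Optional[ModuleType]:
    """Import the PR #4 helpers lazily so standalone runs never touch them."""
    if not is_rlix_mode(env):
        return None
    try:
        return importlib.import_module("nemo_rl.models.megatron.nccl_offload")
    except ImportError as exc:
        # Fail with a message that points at the missing dependency.
        raise RuntimeError(
            "RLIX_CONTROL_PLANE=rlix requires nemo_rl.models.megatron.nccl_offload (PR #4)"
        ) from exc
```

With this shape, a standalone run gets `None` back and skips the destroy/reload calls entirely, while an RLix run on a branch without PR #4 fails early with an explicit error.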

Validation

  • python -m py_compile nemo_rl/algorithms/async_utils.py nemo_rl/algorithms/grpo.py nemo_rl/algorithms/rlix_hooks.py
  • git diff --check

Could not run the RL pytest suite locally because this environment is missing ray.

TianyeGGBond and others added 4 commits April 24, 2026 20:35
Add rlix_hooks.py: RLixHooksProtocol (typing_extensions Protocol) +
NoOpRLixHooks default for standalone mode. Seam file keeps NeMo RL free
of direct rlix package imports.
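
A minimal sketch of what such a seam file might contain. The three method names are the hooks named in this commit message; their signatures are assumptions, and the sketch uses `typing.Protocol` from the standard library instead of `typing_extensions` so that it runs without extra dependencies.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class RLixHooksProtocol(Protocol):
    # Signatures are assumed for illustration only.
    def before_training(self, step: int) -> None: ...
    def after_training(self, step: int) -> None: ...
    def on_trajectory_collector_created(self, collector: Any) -> None: ...


class NoOpRLixHooks:
    """Default for standalone runs: every hook returns immediately."""

    def before_training(self, step: int) -> None:
        return None

    def after_training(self, step: int) -> None:
        return None

    def on_trajectory_collector_created(self, collector: Any) -> None:
        return None


hooks = NoOpRLixHooks()
```

Because the protocol is structural, the RLix side can supply its own hooks class without NeMo RL importing anything from the rlix package.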

Modify async_grpo_train:
- rlix_hooks parameter injected by NemoRLRLixHooks from pipeline actor
- DO_TIME_SHARING flag from RLIX_CONTROL_PLANE env var
- before_training(step): blocks on scheduler GPU grant before lp_inference
- after_training(step): notifies scheduler release; replaces refit in RLix mode
  (weight sync + version update done atomically in _expand_workers, F6)
- on_trajectory_collector_created: registers collector handle so _expand_workers
  can call set_weight_version before activating dp rank routing
- Initial refit and prepare_for_generation skipped when DO_TIME_SHARING=True
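
The ordering described above can be sketched as a toy loop. This is not the body of async_grpo_train: `RecordingHooks`, `run_loop`, and `trace` are hypothetical, training and refit are reduced to log entries, and the point at which the collector is registered relative to the initial refit is an assumption.

```python
from typing import List


class RecordingHooks:
    """Hypothetical stand-in for the hooks RLix injects; records call order."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def on_trajectory_collector_created(self, collector: object) -> None:
        self.log.append("collector_registered")

    def before_training(self, step: int) -> None:
        # In RLix mode this is where the loop would block on a GPU grant.
        self.log.append(f"before_training:{step}")

    def after_training(self, step: int) -> None:
        # In RLix mode this notifies the scheduler and stands in for refit.
        self.log.append(f"after_training:{step}")


def run_loop(
    hooks: RecordingHooks, log: List[str], num_steps: int, do_time_sharing: bool
) -> List[str]:
    hooks.on_trajectory_collector_created(object())
    if not do_time_sharing:
        log.append("initial_refit")      # skipped when time sharing
    for step in range(num_steps):
        hooks.before_training(step)
        log.append(f"train:{step}")
        hooks.after_training(step)
        if not do_time_sharing:
            log.append(f"refit:{step}")  # standalone path keeps its own refit
    return log


def trace(do_time_sharing: bool) -> List[str]:
    log: List[str] = []
    return run_loop(RecordingHooks(log), log, num_steps=1, do_time_sharing=do_time_sharing)
```

Calling `trace(True)` and `trace(False)` shows the difference between the two modes: the time-sharing trace contains no refit entries, because weight sync is expected to happen on the RLix side.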

TODO placeholders in after_training branch:
  F4: policy.build_cpu_bucket_cache(step)
  F11: policy.offload_training_gpu() + policy.destroy_nccl_groups()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>