[fix] Fix placement group bundle ordering for inference engines#1308
Conversation
Ray placement groups don't guarantee bundle order - bundles on the same node may not have consecutive indices. Introduce SkyRLPlacementGroup wrapper that computes GPU-aware reordered bundle indices (sorted by node_id, gpu_id) as a cached property, and use them for all bundle indexing in both inference engine and trainer placement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
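The reordering described above can be sketched as follows. This is a minimal illustration, not SkyRL's actual implementation: the class shape, the constructor-supplied `bundle_placement` list, and the attribute names are assumptions; the real wrapper probes Ray for each bundle's node and GPU.

```python
from functools import cached_property


class SkyRLPlacementGroup:
    """Illustrative wrapper that resolves a GPU-aware bundle ordering."""

    def __init__(self, pg, bundle_placement):
        # bundle_placement: one (node_id, gpu_id) pair per original bundle
        # index, assumed to have been probed from Ray at wrap time.
        self.pg = pg
        self._bundle_placement = bundle_placement

    @cached_property
    def reordered_bundle_indices(self):
        # Sort the original bundle indices by (node_id, gpu_id) so that
        # bundles on the same node occupy consecutive positions, computed
        # once and cached thereafter.
        return sorted(
            range(len(self._bundle_placement)),
            key=lambda i: self._bundle_placement[i],
        )
```

Because `reordered_bundle_indices` is a `cached_property`, sharing one wrapper across actor groups reuses the same resolved ordering.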
- Update ServerGroup to accept SkyRLPlacementGroup consistently - Change VLLMServerActor to accept explicit bundle_indices list instead of computing contiguous range from start_bundle_idx - Fix type annotation in trainer.py for colocate_pg - Update tests to wrap raw PlacementGroups in SkyRLPlacementGroup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ping All callers now wrap raw PlacementGroups in SkyRLPlacementGroup at creation time. PPORayActorGroup, create_ray_wrapped_inference_engines, and flash_rl_engine assert the type instead of silently auto-wrapping, which avoids redundant reordering computation when the same PG is shared across multiple actor groups. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
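The "assert the type instead of silently auto-wrapping" pattern might look like the sketch below. The stub class and the `create_actor_group` function are hypothetical stand-ins for the callers named above, used only to show the shape of the check.

```python
class SkyRLPlacementGroup:
    """Stand-in stub for the real wrapper class."""

    def __init__(self, pg):
        self.pg = pg


def create_actor_group(placement_group):
    # Callers wrap the raw Ray PG exactly once at creation time; asserting
    # the type here (rather than auto-wrapping) avoids recomputing the
    # bundle reordering when the same PG is shared across actor groups.
    assert isinstance(placement_group, SkyRLPlacementGroup), (
        "expected SkyRLPlacementGroup; wrap the raw placement group "
        "at creation time"
    )
    return placement_group.pg
```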
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…_offload Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review
This pull request introduces a SkyRLPlacementGroup wrapper class to manage Ray placement groups, ensuring GPU-aware ordering of resource bundles. The change addresses the issue that Ray's native placement groups do not guarantee bundle order, which is crucial for consistent GPU resource allocation in distributed training and inference. The code updates replace direct PlacementGroup usage with SkyRLPlacementGroup across various modules, including inference engines, trainers, and server groups. This involves extracting the raw Ray placement group and its reordered bundle indices from the wrapper, and modifying scheduling strategies and bundle index assignments to utilize these reordered indices. Additionally, server actor initialization now accepts a list of bundle_indices instead of a starting index, providing more precise control over resource allocation.
Note: Security Review did not run due to the size of the PR.
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
@SumanthRH this pretty much lgtm, but let's merge #1300 first since there's some overlap/merge conflicts to resolve?
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
…ndant GPU probing SkyRLPlacementGroup now stores the full (bundle_idx, node_id, gpu_id) mapping via lazy _get_bundle_placement(), exposing bundle_gpu_ids and bundle_node_ids properties. This eliminates get_gpu_ids_for_pg_bundles and get_reordered_bundle_indices (dead code), and simplifies mp backend GPU ID lookup in both old (ray_wrapped_inference_engine) and new (ServerGroup) stacks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
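The lazy placement lookup described in this commit can be sketched as below. This is a hedged illustration: injecting `probe_fn` through the constructor is an assumption made for self-containment, whereas the real code presumably launches probe tasks on each bundle to read its node ID and visible GPU.

```python
class ResolvedPlacementGroup:
    """Illustrative sketch of lazy (node_id, gpu_id) resolution per bundle."""

    def __init__(self, pg, probe_fn):
        self.pg = pg
        self._probe_fn = probe_fn  # returns [(node_id, gpu_id), ...] per bundle
        self._placement = None

    def _get_bundle_placement(self):
        # Probe at most once; every derived property shares the cached result,
        # which is what eliminates the redundant GPU probing.
        if self._placement is None:
            self._placement = self._probe_fn()
        return self._placement

    @property
    def bundle_node_ids(self):
        return [node_id for node_id, _ in self._get_bundle_placement()]

    @property
    def bundle_gpu_ids(self):
        return [gpu_id for _, gpu_id in self._get_bundle_placement()]
```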
```diff
     num_servers: int,
     start_port: int = 8000,
-    placement_group: Optional[PlacementGroup] = None,
+    placement_group: Optional[SkyRLPlacementGroup] = None,
```
Can we call this OrderedPlacementGroup?
Hm I don't think that name would be right given that the underlying pg object is still "unordered" and the wrapper class here is simply resolving the physical ordering of the bundles. OrderedPlacementGroup makes it sound like we have additional guarantees on the pg object
I've renamed to ResolvedPlacementGroup - this should be better.
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Code Review
This pull request introduces a ResolvedPlacementGroup wrapper to address incorrect bundle ordering in Ray placement groups, which is a critical fix for ensuring correct GPU-aware resource allocation. The new wrapper cleverly caches bundle placement information, improving both correctness and efficiency. The changes are consistently applied across the codebase, including inference engines, trainers, and tests. The overall implementation is robust and well-designed. I have one minor suggestion to improve code clarity in server_group.py.
```python
self._external_pg = placement_group

# Extract the raw PG, reordered indices, and GPU IDs from ResolvedPlacementGroup.
if placement_group is not None:
    self._external_pg = placement_group.pg
    self._reordered_bundle_indices = placement_group.reordered_bundle_indices
    self._bundle_gpu_ids = placement_group.bundle_gpu_ids
else:
    self._external_pg = None
    self._reordered_bundle_indices = None
    self._bundle_gpu_ids = None
```
This block can be simplified to improve clarity and remove redundancy. The assignment `self._external_pg = placement_group` on line 90 is immediately overwritten within the `if` block, and the `else` block repeats the default `None` assignments. Initializing these attributes to `None` before the `if` block would make the logic more straightforward.
Suggested change:

```diff
-self._external_pg = placement_group
-# Extract the raw PG, reordered indices, and GPU IDs from ResolvedPlacementGroup.
-if placement_group is not None:
-    self._external_pg = placement_group.pg
-    self._reordered_bundle_indices = placement_group.reordered_bundle_indices
-    self._bundle_gpu_ids = placement_group.bundle_gpu_ids
-else:
-    self._external_pg = None
-    self._reordered_bundle_indices = None
-    self._bundle_gpu_ids = None
+self._external_pg = None
+self._reordered_bundle_indices = None
+self._bundle_gpu_ids = None
+if placement_group is not None:
+    self._external_pg = placement_group.pg
+    self._reordered_bundle_indices = placement_group.reordered_bundle_indices
+    self._bundle_gpu_ids = placement_group.bundle_gpu_ids
```
Fixes #1306
Ray placement groups don't guarantee bundle order - bundles on the same node may not have consecutive indices.
This PR introduces a `ResolvedPlacementGroup` wrapper that computes GPU-aware reordered bundle indices (sorted by node_id, gpu_id) as a cached property, and uses them for all bundle indexing in both inference engine and trainer placement.