[Serve] Consider custom resources in best-fit node selection for DeploymentScheduler in Ray Serve #51361
Labels: enhancement, serve, triage
Description
Currently, the DeploymentScheduler in Ray Serve does not fully consider custom resources (e.g., resources: {GRAM: 6}) when selecting the best-fit node for deployment.
The function _best_fit_node() primarily evaluates CPU, GPU, and memory, but does not include custom resources defined in ray_actor_options["resources"].
However, custom resources are already considered when filtering available nodes.
As a result, a deployment will never be placed on a node that lacks the required custom resources, but it may not be placed on the node that fits best in terms of resource utilization.
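A minimal sketch of how best-fit selection could account for custom resources. It is not Ray Serve's actual `_best_fit_node()`. The names `best_fit_node`, `fits`, and the `priority` list are hypothetical. The sketch assumes that "best fit" means "the feasible node with the least remaining capacity", with resources compared in priority order.

```python
from typing import Dict, List, Optional

# Hypothetical sketch; NOT Ray Serve's internal _best_fit_node() implementation.
Resources = Dict[str, float]


def fits(required: Resources, available: Resources) -> bool:
    """True if the node has at least the required amount of every resource."""
    return all(available.get(name, 0.0) >= amount for name, amount in required.items())


def best_fit_node(
    required: Resources,
    nodes: Dict[str, Resources],
    priority: List[str],
) -> Optional[str]:
    """Return the feasible node that leaves the least capacity behind.

    Leftover capacity is compared lexicographically in `priority` order, so
    listing a custom resource (e.g. "GRAM") first makes it the deciding factor.
    Returns None if no node can host the replica.
    """
    # Filtering step: drop nodes that cannot satisfy every requested resource.
    feasible = {node: avail for node, avail in nodes.items() if fits(required, avail)}
    if not feasible:
        return None

    def leftover(node: str):
        avail = feasible[node]
        return tuple(avail.get(r, 0.0) - required.get(r, 0.0) for r in priority)

    # min() keeps the first node on ties, i.e. the dict's insertion order.
    return min(feasible, key=leftover)
```

Take two hypothetical nodes with 12 and 6 GRAM free and a replica that needs 6 GRAM. If `"GRAM"` leads the priority list, the 6-GRAM node wins because it leaves nothing stranded. With only `["GPU", "CPU", "memory"]`, the two nodes tie and the choice falls to iteration order. The larger GRAM pool can then be split.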
Use case
These are the resource states after deployment.
ray_actor_options:
  num_cpus: 10.0
  num_gpus: 1.0
  memory: 8.0
  resources:
    GRAM: 6
One replica went to Node D (which had exactly 6 GRAM available).
This means that Node C now has 6/12 GRAM remaining, leading to potential resource fragmentation.
export RAY_SERVE_CUSTOM_RESOURCES="GRAM"
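The sketch below shows one way the `RAY_SERVE_CUSTOM_RESOURCES` setting could be turned into a resource priority list. It makes several assumptions. The value is treated as a comma-separated list of custom resource names. Those names are placed ahead of a hypothetical default order of GPU, CPU, and memory. The function name `resource_priority` is illustrative only and is not part of Ray Serve.

```python
import os
from typing import List

# Hypothetical default ordering of built-in resources (assumption).
DEFAULT_PRIORITY = ["GPU", "CPU", "memory"]


def resource_priority(env_var: str = "RAY_SERVE_CUSTOM_RESOURCES") -> List[str]:
    """Build a best-fit priority list: custom resources from the env var first.

    Assumes a comma-separated value such as "GRAM" or "GRAM,TPU_SLOT".
    """
    raw = os.environ.get(env_var, "")
    custom = [name.strip() for name in raw.split(",") if name.strip()]
    # Append the built-in resources, skipping any name already listed.
    return custom + [r for r in DEFAULT_PRIORITY if r not in custom]
```

With `export RAY_SERVE_CUSTOM_RESOURCES="GRAM"`, this would yield `["GRAM", "GPU", "CPU", "memory"]`, so GRAM leftover would decide between otherwise equivalent nodes.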
Further Improvement Suggestion
Contribution Intent
Would this be a viable improvement for future Ray Serve releases?