
[Serve] Consider custom resources in best-fit node selection for DeploymentScheduler in Ray Serve #51361

Open
ktyxx opened this issue Mar 14, 2025 · 0 comments
Labels
enhancement Request for new feature and/or capability serve Ray Serve Related Issue triage Needs triage (eg: priority, bug/not-bug, and owning component)

Comments


ktyxx commented Mar 14, 2025

Description

Currently, the DeploymentScheduler in Ray Serve does not fully consider custom resources (e.g., resources: {GRAM: 6}) when selecting the best-fit node for deployment.

The function _best_fit_node() primarily evaluates CPU, GPU, and memory, but does not include custom resources defined in ray_actor_options["resources"].
However, custom resources are already considered when filtering available nodes.
This means that a deployment will not be scheduled onto a node without the required custom resources, but it may not be scheduled onto the most optimal node in terms of resource utilization.
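To make the gap concrete, here is a simplified sketch of the described behavior (hypothetical names and structure, not the actual Ray Serve source): the feasibility filter checks every requested key, including custom ones like `GRAM`, but the best-fit score only sums over the standard resource types, so nodes that differ only in leftover custom resources tie.

```python
# Simplified sketch of the described behavior; names and structure are
# illustrative assumptions, not copied from the Ray Serve source.

def best_fit_node(nodes, required):
    """Pick the feasible node that leaves the least slack.

    `nodes` maps node_id -> available resources; `required` is what the
    replica asks for (both plain dicts).
    """
    # Filtering DOES consider every key, including custom ones like "GRAM".
    feasible = {
        node_id: avail
        for node_id, avail in nodes.items()
        if all(avail.get(k, 0) >= v for k, v in required.items())
    }

    def slack(avail):
        # Scoring only looks at standard resources, so two nodes that
        # differ only in leftover GRAM tie here.
        return sum(
            avail.get(k, 0) - required.get(k, 0)
            for k in ("CPU", "GPU", "memory")
        )

    return min(feasible, key=lambda node_id: slack(feasible[node_id]))
```

With two feasible nodes whose standard resources match the request exactly, the tie is broken arbitrarily (here, by dict order), even when one node is the exact custom-resource fit.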

Use case

[Screenshots omitted: node resource states after deployment]

These are the resource states after deployment.

  • Deployment Configuration

ray_actor_options:
  num_cpus: 10.0
  num_gpus: 1.0
  memory: 8.0
  resources:
    GRAM: 6

  1. Two replicas of this deployment were scheduled.
  2. Each replica requires 10 CPU, 1 GPU, 8GB memory, and 6 GRAM.
  • Observed Scheduling Behavior (Inefficient Allocation)
  1. Expected: Both replicas should be deployed on Node A and Node D (since they have exactly 6/6 GRAM available).
  2. Actual: One replica went to Node C (which had 12 GRAM total, but 6 available).
    One replica went to Node D (which had exactly 6 GRAM available).
    This means that Node C now has 6/12 GRAM remaining, leading to potential resource fragmentation.
  • Root Cause Analysis
  1. Ray Serve's _best_fit_node() function currently considers CPU, GPU, and memory but does not prioritize exact fits for custom resources (resources: {GRAM: 6}).
  2. The custom resource is correctly used for filtering nodes but is not included in the best-fit selection process.
  3. As a result, the scheduler sometimes places deployments on nodes with excess custom resources instead of an exact match.
  • Suggested Fix
  1. Modify _best_fit_node() to consider custom resources (resources) in best-fit selection in the same way CPU, GPU, and memory are handled.
  2. Introduce an optional flag (RAY_SERVE_CUSTOM_RESOURCES) so users can enable or disable this stricter scheduling behavior.
import os

class Resources(dict):
    # Custom resources opted into best-fit scoring via the proposed env var,
    # e.g. RAY_SERVE_CUSTOM_RESOURCES="GRAM,FPGA".
    CUSTOM_RESOURCES = {
        r for r in os.environ.get("RAY_SERVE_CUSTOM_RESOURCES", "").split(",") if r
    }

    def get(self, key: str, default=None):
        val = super().get(key)
        if val is not None:
            return val
        # Standard resources and opted-in custom resources default to 0 so
        # they always participate in best-fit scoring.
        if key in ("CPU", "GPU", "memory") or key in self.CUSTOM_RESOURCES:
            return 0
        return default
  • Example Usage of RAY_SERVE_CUSTOM_RESOURCES

export RAY_SERVE_CUSTOM_RESOURCES="GRAM"

  1. This will make the scheduler consider GRAM as a factor in best-fit node selection, alongside CPU, GPU, and memory.
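A minimal sketch of how the fix could look (hypothetical helper names; the real `_best_fit_node` signature may differ): extend the set of scored keys with the custom resources opted in via the env var.

```python
import os

# Custom resources opted in via the proposed env var, e.g. "GRAM".
# This env-var name and the function below are illustrative assumptions.
CUSTOM_RESOURCES = {
    r for r in os.environ.get("RAY_SERVE_CUSTOM_RESOURCES", "").split(",") if r
}

def best_fit_node(nodes, required):
    """Best-fit selection that also scores opted-in custom resources."""
    scored_keys = {"CPU", "GPU", "memory"} | CUSTOM_RESOURCES

    feasible = {
        node_id: avail
        for node_id, avail in nodes.items()
        if all(avail.get(k, 0) >= v for k, v in required.items())
    }

    def slack(avail):
        # Including "GRAM" here makes the exact-fit node (6/6 available)
        # beat a node with 6 of 12 GRAM free.
        return sum(avail.get(k, 0) - required.get(k, 0) for k in scored_keys)

    return min(feasible, key=lambda node_id: slack(feasible[node_id]))
```

With `RAY_SERVE_CUSTOM_RESOURCES="GRAM"` set, the exact-fit node from the use case above would be selected instead of the node with excess GRAM.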
  • Further Improvement Suggestion

    Additionally, instead of simply minimizing remaining resources, a weighted scoring method could be used.
    
    The current _best_fit_node() function selects the node that minimizes leftover resources,
    but it does not consider the balance between different resources.
    For example, if Node A has 1 CPU and 2 GB of memory left while Node B has 2 CPUs and 1 GB of memory left, both nodes are scored equally.
    A weighted scoring method that considers the proportional fit (e.g., (remaining_resource / total_resource))
    could improve scheduling efficiency, especially when deploying multiple replicas.
    This would ensure that custom resources (like GRAM) are not just considered, but also optimally utilized.
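A sketch of the proportional-fit idea (assuming each node reports both total and available capacity; the helper name and key list are hypothetical):

```python
def proportional_fit_score(avail, total, required,
                           keys=("CPU", "GPU", "memory", "GRAM")):
    """Sum of remaining/total across resources after a hypothetical
    placement; a lower score means a tighter, more balanced fit."""
    score = 0.0
    for k in keys:
        capacity = total.get(k, 0)
        if capacity <= 0:
            continue  # node does not offer this resource at all
        remaining = avail.get(k, 0) - required.get(k, 0)
        score += remaining / capacity
    return score
```

The scheduler would then pick the feasible node minimizing this score instead of the raw leftover counts, so a node leaving 25% of its CPU idle scores better than one leaving 50% idle, even if the absolute leftovers are equal.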
    
    Alternative Improvement: Hierarchical Resource Prioritization
    In addition to weighted scoring, another potential enhancement is implementing hierarchical resource prioritization.
    
    Currently, all resources are treated equally when selecting the best-fit node. Instead, allowing users to define a priority order
    (e.g., [Custom Resource > GPU > CPU > Memory]) would ensure that deployments are scheduled onto nodes where the most critical resources
    are most available first.
    
    For instance, if a deployment primarily depends on GRAM, the scheduler would first attempt to place it on nodes with the most available GRAM,
    before considering other factors like CPU or memory. This could prevent fragmentation of custom resources and improve overall resource utilization efficiency.
    
    Would this be a reasonable improvement to introduce alongside the weighted scoring method?
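As a sketch of hierarchical prioritization (hypothetical helper; the priority list would come from user configuration), feasible nodes can be compared lexicographically on the priority order, so the highest-priority resource dominates and later entries only break ties:

```python
def pick_node_by_priority(nodes, required, priority):
    """Among nodes that can fit `required`, prefer the node with the most
    of the highest-priority resource available; later entries in
    `priority` act only as tie-breakers."""
    feasible = [
        (node_id, avail)
        for node_id, avail in nodes.items()
        if all(avail.get(k, 0) >= v for k, v in required.items())
    ]
    if not feasible:
        return None
    best = max(feasible, key=lambda item: tuple(item[1].get(k, 0) for k in priority))
    return best[0]
```

For a GRAM-heavy deployment with `priority=["GRAM", "GPU", "CPU", "memory"]`, the node with the most available GRAM is tried first, regardless of its CPU headroom.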
    
  • Contribution Intent

  1. I am willing to submit a PR for this enhancement and would appreciate feedback on the approach.
    Would this be a viable improvement for future Ray Serve releases?
@ktyxx ktyxx added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 14, 2025
@jcotant1 jcotant1 added the serve Ray Serve Related Issue label Mar 18, 2025